Data and pipeline for Mortality in Puerto Rico after Hurricane Maria


Here we provide the data and pipeline for: Mortality in Puerto Rico after Hurricane Maria


Kishore N, Marqués D, Mahmud A, et al. Mortality in Puerto Rico after Hurricane Maria. N Engl J Med. DOI: 10.1056/NEJMsa1803972

Additional Resources

  1. FAQs [PDF] — (Spanish [PDF])
  2. Responses to Inquiries [PDF]
  3. Technical FAQ [PDF]

Repository at time of publication

This repository is updated regularly in response to feedback and inquiries; however, all code will remain fully reproducible at every point in the commit history.

For full transparency, we preserve the repository as it stood before any later changes were made: the paper release is the version of the repository that existed at the time of publication. You can get this release by downloading it or by running git checkout paper in your local clone.


Background: Quantifying the effect on society of natural disasters is critical for recovery of public health services and infrastructure. The death toll can be difficult to assess in the aftermath of a major disaster. In September 2017, Hurricane Maria caused massive infrastructural damage to Puerto Rico, but its effect on mortality remains contentious.

Methods: Using a representative, stratified sample, we surveyed 3299 randomly chosen households across Puerto Rico to produce an independent estimate of all-cause mortality after the hurricane. Respondents were asked about displacement, infrastructure loss, and causes of death. We calculated excess deaths by comparing our estimated post-hurricane mortality rate with official rates for the same period in 2016.

Results: From the survey data, we estimated a mortality rate of 14.3 deaths (95% confidence interval [CI], 9.8 to 18.9) per 1000 persons from September 20 through December 31, 2017. This rate yielded a total of 4645 excess deaths during this period (95% CI, 793 to 8498), equivalent to a 62% increase in the mortality rate as compared with the same period in 2016. However, this number is likely to be an underestimate because of survivor bias. The mortality rate remained high through the end of December 2017, and one third of the deaths were attributed to delayed or interrupted health care. Hurricane-related migration was substantial.
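As a rough arithmetic check on the figures above, the 2016 baseline rate can be back-calculated from the reported rate and the reported 62% increase. This is only a sketch: it assumes the 62% is a simple ratio of the two rates, the variable names are illustrative, and the result is an implied value, not the official 2016 rate or the repository's pipeline.

```python
# Figures quoted in the Results paragraph above.
survey_rate = 14.3        # deaths per 1000 persons, Sep 20 to Dec 31, 2017
relative_increase = 0.62  # "62% increase" over the same period in 2016

# Assumption: "62% increase" means survey_rate = baseline * (1 + 0.62).
implied_baseline = survey_rate / (1 + relative_increase)

# Difference between the survey rate and the implied baseline.
excess_rate = survey_rate - implied_baseline

print(f"implied 2016 baseline: {implied_baseline:.1f} per 1000")
print(f"implied excess rate:   {excess_rate:.1f} per 1000")
```

Converting an excess rate into a death count also requires the population at risk, which is not restated here, so the sketch stops at rates.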

Conclusions: The official estimate of 64 deaths attributed to Hurricane Maria in Puerto Rico is a substantial underestimate. This survey, based on community sampling, indicated that the number of excess deaths is likely to be more than 70 times the official estimate. (Funded by the Harvard T.H. Chan School of Public Health and others.)
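The "more than 70 times" comparison can be reproduced from the numbers quoted in the abstract. The sketch below divides the point estimate and both confidence bounds by the official count; the dictionary labels are illustrative and nothing here comes from the repository's own scripts.

```python
# Figures quoted in the Results and Conclusions paragraphs.
official_count = 64
excess_deaths = {
    "lower 95% bound": 793,
    "point estimate": 4645,
    "upper 95% bound": 8498,
}

# Ratio of each survey-based figure to the official count.
ratios = {label: deaths / official_count for label, deaths in excess_deaths.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f} times the official count")
```

The point estimate gives a ratio above 70, matching the abstract; the ratios at the two confidence bounds show how wide the uncertainty around that multiple is.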

Main Figure. a) A comparison of excess death estimates from official reports, press/academic reports, and our survey. b) Reported deaths per month in the survey, categorized by reported cause of death. Two individuals who died in December at the same age and of similar causes are superimposed, so 37 points represent the 38 deaths reported after the hurricane.


  • code — Scripts and output for figures included in the manuscript and supplement
  • data — Initial data resources and survey data (more information in the folder README)
  • figures — Final figures included in manuscript
  • ref — Pipelines used to clean raw information and generate the .RDS files found in /data/rdata/
  • misc — A Spanish version of the paper and an XML file containing the survey instrument used to gather data


  • master is locked
  • We have tagged the version of this repository at the time of publication under the paper release
  • Feel free to create a new branch for further incorporation and analysis
  • All geospatial data have been stripped; by using this dataset, you agree not to take any steps to identify respondents or their families
  • More information is available in the data folder


To report problems with anonymization or major problems with the functionality of the scripts, please create an issue.


The data collected and presented is licensed under the Creative Commons Attribution 3.0 license, and the underlying code used to format, analyze and display that content is licensed under the MIT license.