# Maternal Mortality Data Analysis Project

![Banner](./assets/banner.jpeg)

## Topic
*What problem are you (or your stakeholder) trying to address?*
📝 <!-- Answer Below -->

This project will focus on maternal mortality trends in the United States and globally. Maternal mortality is a critical health indicator because it reflects the quality of healthcare systems, access to care, and broader social and economic inequities. In the U.S., maternal mortality has been rising in recent years despite overall healthcare advancements, and globally, large disparities remain between high-income and low-income countries. Understanding these trends and patterns is essential for shaping public health interventions and policies that can save lives.  

## Project Question
*What specific question are you seeking to answer with this project?*
*This is not the same as the questions you ask to limit the scope of the project.*
📝 <!-- Answer Below -->


1. How have maternal mortality ratios changed in the United States from 2018 to 2023?

2. How do U.S. maternal mortality trends compare with global and regional trends reported by the World Health Organization?

3. Are there noticeable differences in maternal mortality rates across U.S. census regions, and what do those patterns suggest?

4. What socioeconomic or healthcare access factors might help explain the disparities observed in the data?


## What would an answer look like?
*What is your hypothesized answer to your question?*
📝 <!-- Answer Below -->

- **Line charts** - showing U.S. maternal mortality ratios over time.  
- **Bar charts** - comparing maternal mortality across U.S. census regions.  
- **Global heat map (choropleth)** - showing regional differences in maternal mortality ratios.  
- **Summary tables** - highlighting ratios, crude rates, and changes across years.  


## Data Sources
*What 3 data sources have you identified for this project?*
*How are you going to relate these datasets?*
📝 <!-- Answer Below -->

1. **CDC WONDER – Multiple Cause of Death (2018–2023, Excel)**  
   - Source: CDC WONDER public database.  
   - Provides U.S. maternal mortality–related deaths by census region, with crude rates per 100,000 population.  
   - Type: Database export (Excel)  

2. **NCHS – Pregnancy-Related Mortality Ratio in the U.S. (CSV)**  
   - Source: National Center for Health Statistics.  
   - Provides detailed U.S. pregnancy-related mortality ratios by year.  
   - Type: File (CSV)  

3. **WHO Global Maternal Mortality Estimates (CSV/XLSX)**  
   - Source: WHO-led interagency global estimates (WHO, UNICEF, UNFPA, World Bank Group, UNDESA/Population Division).  
   - Provides maternal mortality estimates worldwide, broken down by country, region, and income level.  
   - Type: File (CSV/XLSX)  

### How Datasets Relate
- **U.S. datasets (CDC WONDER + NCHS):** Provide national pregnancy-related mortality ratios by year (NCHS) and crude death rates by census region (CDC WONDER), allowing analysis of internal disparities.  
- **WHO dataset:** Provides global context, enabling comparison of the U.S. to other countries and regions.  
- **Shared variables:** Time (year) and geography (country or region) will allow for comparative analysis.  
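
As a rough sketch of how the shared year variable could link the sources, the snippet below joins a U.S. yearly series to WHO median estimates. The column names (`Year`, `iso_alpha_3_code`, `year_mid`, `parameter`, `0.5`) are taken from the loaded files shown later in this notebook. The helper name `align_by_year`, the output column `who_mmr_median`, the assumption that the WHO file carries a `USA` code, and the demo values are all illustrative.

```python
import pandas as pd


def align_by_year(us: pd.DataFrame, who: pd.DataFrame, iso_code: str = "USA") -> pd.DataFrame:
    """Join a U.S. yearly series to WHO median MMR estimates on calendar year."""
    # Keep only the MMR rows for the requested country.
    is_match = (who["iso_alpha_3_code"] == iso_code) & (who["parameter"] == "mmr")
    who_mmr = who.loc[is_match, ["year_mid", "0.5"]].rename(
        columns={"year_mid": "Year", "0.5": "who_mmr_median"}
    )
    # An inner join keeps only the years present in both sources.
    return us.merge(who_mmr, on="Year", how="inner")


# Placeholder values for illustration only; these are not real estimates.
us_demo = pd.DataFrame({"Year": [2000, 2001, 2002], "Mortality ratio": [1.0, 2.0, 3.0]})
who_demo = pd.DataFrame({
    "iso_alpha_3_code": ["USA", "USA", "AFG"],
    "year_mid": [2000, 2001, 2000],
    "parameter": ["mmr", "mmr", "mmr"],
    "0.5": [10.0, 11.0, 12.0],
})
aligned = align_by_year(us_demo, who_demo)
print(aligned)
```

The NCHS pregnancy-related mortality ratio and the WHO maternal mortality ratio are defined differently, so a joined table like this is best read as a side-by-side comparison of trends rather than of absolute levels.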


## Approach and Analysis
*What is your approach to answering your project question?*
*How will you use the identified data to answer your project question?*
📝 <!-- Start Discussing the project here; you can add as many code cells as you need -->

1. Import and clean all three datasets into consistent tables.  
2. Analyze U.S. trends nationally and by census region.  
3. Compare U.S. data against WHO’s global/regional estimates.  
4. Visualize using line charts, bar charts, and maps.  
5. Summarize key disparities and possible drivers like healthcare access and socioeconomic factors.  
6. Draw conclusions and suggest policy implications based on findings.

In [30]:
import pandas as pd
from pathlib import Path
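# CDC WONDER data
p = Path("data")
wonder_path = p / "Multiple Cause of Death, 2018-2023, Single Race.xls"

# The export may be tab-delimited text despite its .xls extension,
# so try that first and fall back to Excel parsing if it fails.
try:
    cdc_wonder_raw = pd.read_csv(wonder_path, sep="\t", engine="python")
except Exception:
    cdc_wonder_raw = pd.read_excel(wonder_path)

# Drop rows whose first column contains the text "Notes".
# This matches only that literal text, so other non-data rows may remain.
first_col = cdc_wonder_raw.columns[0]
cdc_wonder = cdc_wonder_raw[
    ~cdc_wonder_raw[first_col].astype(str).str.contains("Notes", na=False)
].copy()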

print("CDC WONDER loaded:", cdc_wonder.shape)
cdc_wonder.head()

CDC WONDER loaded: (65, 6)


|   | Notes | Census Region | Census Region Code | Deaths | Population | Crude Rate |
|---|---|---|---|---|---|---|
| 0 |  | Census Region 1: Northeast | CENS-R1 | 4714.0 | 173164600.0 | 2.7 |
| 1 |  | Census Region 2: Midwest | CENS-R2 | 7390.0 | 207616300.0 | 3.6 |
| 2 |  | Census Region 3: South | CENS-R3 | 15580.0 | 388209100.0 | 4.0 |
| 3 |  | Census Region 4: West | CENS-R4 | 6387.0 | 235537400.0 | 2.7 |
| 4 | Total |  |  | 34071.0 | 1004527000.0 | 3.4 |
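
The loaded frame has 65 rows, but only four of them name a census region. One possible way to isolate those rows and sanity-check the crude rate (deaths per 100,000 population) is sketched below, using the five rows printed above. The helper names `regional_rows` and `crude_rate` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rows copied from the cdc_wonder.head() output above.
sample = pd.DataFrame({
    "Notes": [None, None, None, None, "Total"],
    "Census Region": [
        "Census Region 1: Northeast",
        "Census Region 2: Midwest",
        "Census Region 3: South",
        "Census Region 4: West",
        None,
    ],
    "Deaths": [4714.0, 7390.0, 15580.0, 6387.0, 34071.0],
    "Population": [173164600.0, 207616300.0, 388209100.0, 235537400.0, 1004527000.0],
    "Crude Rate": [2.7, 3.6, 4.0, 2.7, 3.4],
})


def regional_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that name a census region (drops totals and other non-region rows)."""
    return df[df["Census Region"].notna()].copy()


def crude_rate(deaths: pd.Series, population: pd.Series) -> pd.Series:
    """Deaths per 100,000 population, rounded to one decimal place."""
    return (deaths / population * 100_000).round(1)


regions = regional_rows(sample)
regions["Recomputed Rate"] = crude_rate(regions["Deaths"], regions["Population"])
print(regions[["Census Region", "Crude Rate", "Recomputed Rate"]])
```

For these rows the recomputed rate matches the reported `Crude Rate` column, which suggests the column is deaths divided by population, scaled to 100,000. Applying the same filter to the full `cdc_wonder` frame assumes that non-region rows leave `Census Region` empty, which is worth confirming against the raw file.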


In [29]:
from pathlib import Path
import pandas as pd

p = Path("data")

# NCHS PRMR 
nchs_path = p / "Pregnancy-related mortality ratio in the United States.csv"
nchs = pd.read_csv(nchs_path)

print("NCHS PRMR loaded:", nchs.shape)
nchs.head()


NCHS PRMR loaded: (37, 3)


|   | Year | Mortality ratio | Counts |
|---|---|---|---|
| 0 | 1987 | 7.2 | 275 |
| 1 | 1988 | 9.4 | 369 |
| 2 | 1989 | 9.8 | 394 |
| 3 | 1990 | 10.0 | 415 |
| 4 | 1991 | 10.3 | 424 |
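
To look at how the ratio changes over time, a minimal sketch of a year-over-year calculation is shown here on the five rows printed above. The helper `add_yearly_change` and the output column names `abs_change` and `pct_change` are hypothetical. The same function could be applied to the full `nchs` frame, assuming it has one row per year.

```python
import pandas as pd

# First five rows of the NCHS table, as printed above.
nchs_sample = pd.DataFrame({
    "Year": [1987, 1988, 1989, 1990, 1991],
    "Mortality ratio": [7.2, 9.4, 9.8, 10.0, 10.3],
    "Counts": [275, 369, 394, 415, 424],
})


def add_yearly_change(df: pd.DataFrame, value_col: str = "Mortality ratio") -> pd.DataFrame:
    """Add absolute and percentage change from the previous year."""
    out = df.sort_values("Year").reset_index(drop=True)
    out["abs_change"] = out[value_col].diff()               # difference from the prior row
    out["pct_change"] = out[value_col].pct_change() * 100   # relative change, in percent
    return out


trend = add_yearly_change(nchs_sample)
print(trend.round(2))
```

The first row has no previous year, so both change columns are `NaN` there.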


In [31]:
!pip install openpyxl -q
import pandas as pd
from pathlib import Path

# WHO data
who_dir = Path("data/WHO")

who_estimates = pd.read_csv(who_dir / "estimates.csv")
who_gdp = pd.read_excel(who_dir / "gdp_WORLDBANK_2022.07.11.xlsx")
who_main = pd.read_csv(who_dir / "main_data.csv")

print("WHO Estimates:", who_estimates.shape)
print("WHO GDP:", who_gdp.shape)
print("WHO Main Data:", who_main.shape)

display(who_estimates.head())
display(who_gdp.head())
display(who_main.head())


WHO Estimates: (79920, 7)
WHO GDP: (221, 42)
WHO Main Data: (4044, 104)


|   | iso_alpha_3_code | year_mid | parameter | 0.1 | 0.5 | 0.9 | estimate_version |
|---|---|---|---|---|---|---|---|
| 0 | AFG | 1985 | mmr | 1033.904989 | 1910.341533 | 3041.569562 | estimates_12-19-22 |
| 1 | AFG | 1985 | pm | 0.212203 | 0.392086 | 0.624263 | estimates_12-19-22 |
| 2 | AFG | 1985 | maternal_deaths | 5552.069789 | 10258.534032 | 16333.228548 | estimates_12-19-22 |
| 3 | AFG | 1985 | mmr_rate | 0.002378 | 0.004394 | 0.006996 | estimates_12-19-22 |
| 4 | AFG | 1985 | pm_mean | 0.409517 | 0.409517 | 0.409517 | estimates_12-19-22 |
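
The estimates file is in long format, with one row per country, year, and parameter. A small sketch of pulling out just the `mmr` rows with readable column names follows. It assumes the `0.1`, `0.5`, and `0.9` columns are lower, median, and upper quantiles of each estimate, which the file itself does not state. The names `mmr_with_interval`, `iso3`, `mmr_lower`, `mmr_median`, and `mmr_upper` are illustrative.

```python
import pandas as pd

# Three rows copied from who_estimates.head() above.
est_sample = pd.DataFrame({
    "iso_alpha_3_code": ["AFG", "AFG", "AFG"],
    "year_mid": [1985, 1985, 1985],
    "parameter": ["mmr", "pm", "maternal_deaths"],
    "0.1": [1033.904989, 0.212203, 5552.069789],
    "0.5": [1910.341533, 0.392086, 10258.534032],
    "0.9": [3041.569562, 0.624263, 16333.228548],
})


def mmr_with_interval(estimates: pd.DataFrame) -> pd.DataFrame:
    """Select the MMR rows and rename the quantile columns (assumed meaning)."""
    renamed = {
        "iso_alpha_3_code": "iso3",
        "year_mid": "year",
        "0.1": "mmr_lower",   # assumed: lower quantile
        "0.5": "mmr_median",  # assumed: median
        "0.9": "mmr_upper",   # assumed: upper quantile
    }
    mmr = estimates[estimates["parameter"] == "mmr"]
    return mmr.rename(columns=renamed)[list(renamed.values())].reset_index(drop=True)


mmr_table = mmr_with_interval(est_sample)
print(mmr_table)
```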


Unnamed: 0.1,Unnamed: 0,Last Update,2022-07-05 00:00:00,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41
0,,,,,,,,,,,...,,,,,,,,,,
1,,Year End: YR2021,,,,,,,,,...,,,,,,PPP Reference Year,,,,
2,,,,,,DO NOT USE ESTIMATES,,,,,...,,,,,Section 1 Formula,Formula,Section 2 Formula,,,
3,WHO Ind,Country Name,Country Code,Series Name,Series Code,YR1985,YR1986,YR1987,YR1988,YR1989,...,YR2012,YR2013,YR2014,YR2015,YR2016,YR2017,YR2018,YR2019,YR2020,YR2021
4,WHO,Afghanistan,AFG,GDP per capita,,3401.708996,3542.885068,3318.436761,3024.420817,2742.97948,...,2075.491614,2116.465258,2102.384604,2068.265904,2057.067978,2058.400221,2033.804389,2065.036235,1970.560169,
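
In the GDP sheet, the real column labels (`Country Name`, `Country Code`, `YR1985`, ...) sit in row 3 of the loaded frame rather than in the header. A hedged sketch of promoting that row to the header and reshaping the year columns to long format follows. The small `raw` frame imitates the layout printed above with only two year columns, and the helpers `promote_header` and `gdp_to_long` are hypothetical names.

```python
import pandas as pd

# A reduced imitation of the sheet layout above: junk rows, then labels, then data.
raw = pd.DataFrame([
    [None, None, None, None, None, None, None],
    [None, "Year End: YR2021", None, None, None, None, None],
    [None, None, None, None, None, "DO NOT USE ESTIMATES", None],
    ["WHO Ind", "Country Name", "Country Code", "Series Name", "Series Code", "YR1985", "YR1986"],
    ["WHO", "Afghanistan", "AFG", "GDP per capita", None, 3401.708996, 3542.885068],
])


def promote_header(frame: pd.DataFrame, header_row: int = 3) -> pd.DataFrame:
    """Use the given row as column labels and keep only the rows below it."""
    body = frame.iloc[header_row + 1:].copy()
    body.columns = frame.iloc[header_row].to_list()
    return body.reset_index(drop=True)


def gdp_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape YR#### columns into one row per country and year."""
    year_cols = [c for c in wide.columns if str(c).startswith("YR")]
    long = wide.melt(
        id_vars=["Country Code"],
        value_vars=year_cols,
        var_name="year",
        value_name="gdp_per_capita",
    )
    long["year"] = long["year"].str[2:].astype(int)  # "YR1985" -> 1985
    long["gdp_per_capita"] = pd.to_numeric(long["gdp_per_capita"], errors="coerce")
    return long


gdp_long = gdp_to_long(promote_header(raw))
print(gdp_long)
```

The long table has `Country Code` and `year` columns that line up with `iso_alpha_3_code` and `year_mid` in the estimates file. The real sheet may contain blank or repeated labels in the header row, so the row index and column names should be checked before relying on this.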


Unnamed: 0,iso_alpha_3_code,year_start,year_end,year_mid,env_total,env_mat,truemat_vr,truemat,truemat_vr_unscaled,truemat_unscaled,...,mdeathspluslate,ramos...yes.1.,comments,env_total_lifetables,live_births_lifetables,final_mmr,final_pm_before_crisis,final_mmr_before_crisis,env_total_calculated_from_lifetables,live_births_calculated_from_birthsdata
0,AUS,1985.0,1997.0,1991.0,,250.0,250.0,,250.0,,...,,,,,,8.2e-05,0.006147,8.2e-05,40628.0,3035000.0
1,AUS,1994.0,1997.0,1995.5,,80.0,80.0,80.0,80.0,80.0,...,,,,,,0.000104,0.007782,0.000104,10280.0,766000.0
2,AUS,1997.0,2000.0,1998.5,,54.0,54.0,,54.0,,...,,,,,,7.2e-05,0.005041,7.2e-05,10721.0,747000.0
3,AUS,2000.0,2003.0,2001.5,,44.0,44.0,,44.0,,...,,,,,,5.9e-05,0.004332,5.9e-05,10163.0,745000.0
4,AUS,2006.0,2011.0,2008.5,,63.0,63.0,,63.0,,...,,,,,,4.3e-05,0.003871,4.3e-05,16191.0,1453000.0
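
The `final_mmr` values above are small fractions rather than the per-100,000 figures used elsewhere in this project. For the five rows shown, dividing `env_mat` by `live_births_calculated_from_birthsdata` reproduces `final_mmr` to display precision, which suggests (but does not prove) that it is maternal deaths per live birth. A minimal sketch of the rescaling, with the hypothetical helper `per_100k`:

```python
import pandas as pd

# Selected columns from the five who_main rows printed above.
main_sample = pd.DataFrame({
    "iso_alpha_3_code": ["AUS"] * 5,
    "year_mid": [1991.0, 1995.5, 1998.5, 2001.5, 2008.5],
    "env_mat": [250.0, 80.0, 54.0, 44.0, 63.0],
    "live_births_calculated_from_birthsdata": [3035000.0, 766000.0, 747000.0, 745000.0, 1453000.0],
    "final_mmr": [8.2e-05, 0.000104, 7.2e-05, 5.9e-05, 4.3e-05],
})


def per_100k(ratio: pd.Series) -> pd.Series:
    """Rescale a per-live-birth ratio to deaths per 100,000 live births."""
    return ratio * 100_000


main_sample["mmr_per_100k"] = per_100k(main_sample["final_mmr"])
# Cross-check against deaths divided by live births (assumed definition).
main_sample["check_per_100k"] = per_100k(
    main_sample["env_mat"] / main_sample["live_births_calculated_from_birthsdata"]
)
print(main_sample[["year_mid", "mmr_per_100k", "check_per_100k"]].round(2))
```

The two columns differ slightly because the printed `final_mmr` values are rounded for display.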


## Resources and References
*What resources and references have you used for this project?*
📝 <!-- Answer Below -->
- **CDC WONDER – Multiple Cause of Death (2018–2023)**  
  Public database providing U.S. maternal mortality–related deaths by census region, with crude rates per 100,000 population.  
  [CDC WONDER](https://wonder.cdc.gov/)  

- **National Center for Health Statistics (NCHS)**  
  Pregnancy-related mortality ratio data for the United States.  
  [NCHS Maternal Mortality Data](https://www.cdc.gov/nchs/maternal-mortality/index.htm)  

- **World Health Organization (WHO) & UNICEF**  
  Global maternal mortality estimates by country, region, and income level.  
  [WHO Maternal Mortality Data](https://www.who.int/data/gho/data/themes/maternal-and-reproductive-health)  

- **Background Reference**  
  World Health Organization (2023). *Trends in Maternal Mortality 2000–2020*. Geneva: WHO, UNICEF, UNFPA, World Bank Group, UNDESA/Population Division.  


In [34]:
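# ⚠️ Make sure you run this cell at the end of your notebook before every submission!
# This needs the nbconvert package; if the command is not found, install it first (pip install nbconvert).
!jupyter nbconvert --to python source.ipynb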

Jupyter command `jupyter-nbconvert` not found.
