# COVID-19 Effect on Mortality and Fertility in Orange County, California

### Motivation  
The datafication of the pandemic has given the public access to more granular health-related data than ever before. The county I was assigned, Orange County, California, recorded a large number of cases, which is expected given its large population. Lockdowns and mask mandates kept people at home and limited the spread of other diseases such as the flu, and they likely reduced some other causes of death as well. At the same time, hospitals were overloaded and sometimes had to turn patients away. I therefore want to know how many deaths occurred in excess of what is typical for Orange County. An excess larger than the official COVID-19 death count would suggest that COVID-related deaths were underreported, for example because tests may not have been used on persons who were already deceased. Essentially, this question asks whether the pandemic caused more deaths than the “official” count suggests, or whether it reduced other causes of death, such as accidents.  

Another angle is that COVID added a lot of uncertainty to the world. This had many adverse effects on people’s lives, and I am curious to see how it affected fertility. Perhaps during these stressful times fewer people were confident in the future and in their ability to support a child, or perhaps quarantine gave people the opportunity to move forward with their family planning because of the flexibility that working from home provided. In this analysis I examine the pandemic's effect on mortality and fertility in Orange County, CA.  

### Hypothesis  
My hypothesis related to mortality is that all-cause mortality was higher during the pandemic than pre-pandemic, and higher by more than the reported COVID-19 deaths, which would indicate that there were more deaths due to COVID-19 than were reported. My hypothesis relating to fertility is that there were fewer births during the pandemic than pre-pandemic because of the stress and uncertainty it caused.  

### Data  
- The RAW_us_confirmed_cases.csv file from the [Kaggle repository of Johns Hopkins University COVID-19 data](https://www.kaggle.com/antgoldbloom/covid19-data-from-john-hopkins-university?select=RAW_us_confirmed_cases.csv).  
- The RAW_us_deaths.csv file from the [Kaggle repository of Johns Hopkins University COVID-19 data](https://www.kaggle.com/antgoldbloom/covid19-data-from-john-hopkins-university?select=RAW_us_deaths.csv).  
- Deaths from the [California Health and Human Services Agency (CHHS)](https://data.chhs.ca.gov/dataset/death-profiles-by-county/resource/2e546f88-bba8-4d77-846a-7fb77846cac6).  
- Births from the [California Health and Human Services Agency (CHHS)](https://data.chhs.ca.gov/dataset/live-birth-profiles-by-county/resource/94a186da-7184-427b-b0b5-224bb0a05cd8).

In [1]:
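# Set up file: widen the notebook container
# (display and HTML are imported from IPython.display; the IPython.core.display path is deprecated)
from IPython.display import display, HTML
display(HTML("<style>.container { width:60% !important; }</style>"))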

# Import libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Show all columns
pd.set_option('display.max_columns', None)

# Set filepaths
RAW_DATA_PATH = '../data_raw/'
CLEAN_DATA_PATH = '../data_clean/'
RESULTS_PATH = '../results/'

In [2]:
# Import data assuming we are in src folder
cases_raw = pd.read_csv(os.path.join(RAW_DATA_PATH, 'RAW_us_confirmed_cases.csv/RAW_us_confirmed_cases.csv'))
covid_deaths_raw = pd.read_csv(os.path.join(RAW_DATA_PATH, 'RAW_us_deaths.csv'))

# Identify the Orange County, CA FIPS code
fip = cases_raw.loc[(cases_raw['Province_State'] == 'California') & (cases_raw['Admin2'] == 'Orange'), 'FIPS'].values[0]
print("Orange County FIP: ", str(fip))

# Pivot cases and deaths data to long format
cases = pd.melt(cases_raw, id_vars = cases_raw.columns[0:11], var_name = 'date', value_name = 'cases')
cases['date'] = pd.to_datetime(cases['date'], format = '%m/%d/%y')
cases = cases.loc[:, ['Province_State', 'Admin2', 'FIPS', 'date', 'cases']]
covid_deaths = pd.melt(covid_deaths_raw, id_vars = covid_deaths_raw.columns[0:12], var_name = 'date', value_name = 'covid_deaths')
covid_deaths['date'] = pd.to_datetime(covid_deaths['date'], format = '%m/%d/%y')
# Drop common columns from covid_deaths
covid_deaths = covid_deaths.loc[:, ['FIPS', 'Population', 'date', 'covid_deaths']]

# Print column names
print(cases.columns)
print(covid_deaths.columns)

Orange County FIP:  6059.0
Index(['Province_State', 'Admin2', 'FIPS', 'date', 'cases'], dtype='object')
Index(['FIPS', 'Population', 'date', 'covid_deaths'], dtype='object')
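
As a minimal sketch of the wide-to-long reshape used above, the toy table below shows how `pd.melt` turns one column per date into one row per county and date. The values are hypothetical, and the table has only two identifier columns rather than the eleven or twelve in the raw files.

```python
import pandas as pd

# Hypothetical two-county, two-day table in the same wide layout as the raw files
wide = pd.DataFrame({
    'FIPS': [6059.0, 1001.0],
    'Admin2': ['Orange', 'Autauga'],
    '1/22/20': [0, 0],
    '1/23/20': [1, 0],
})

# Identifier columns stay fixed; every remaining (date) column becomes a row
long = pd.melt(wide, id_vars=['FIPS', 'Admin2'], var_name='date', value_name='cases')

# Column headers such as '1/22/20' are parsed with the same format string as above
long['date'] = pd.to_datetime(long['date'], format='%m/%d/%y')
print(long)
```

Each of the two counties contributes one row per date column, so the toy result has four rows.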


In [3]:
# Combine cases and deaths into one df
covid_df = cases.merge(covid_deaths, how = 'outer', on = ['FIPS', 'date'])
covid_df.to_csv(os.path.join(CLEAN_DATA_PATH, 'us_covid_cases-deaths.csv'), index = False)
covid_df

Unnamed: 0,Province_State,Admin2,FIPS,date,cases,Population,covid_deaths
0,Alabama,Autauga,1001.0,2020-01-22,0.0,55869,0
1,Alabama,Baldwin,1003.0,2020-01-22,0.0,223234,0
2,Alabama,Barbour,1005.0,2020-01-22,0.0,24686,0
3,Alabama,Bibb,1007.0,2020-01-22,0.0,22394,0
4,Alabama,Blount,1009.0,2020-01-22,0.0,57826,0
...,...,...,...,...,...,...,...
2337649,,,56039.0,2021-12-03,,23464,14
2337650,,,56041.0,2021-12-03,,20226,31
2337651,,,90056.0,2021-12-03,,0,0
2337652,,,56043.0,2021-12-03,,7805,36
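
The last rows of the merged table have no `Province_State`, `Admin2`, or `cases` values, which suggests they came from the deaths table only. One way to check this might be the `indicator` option of `merge`, sketched here on hypothetical miniature tables in which the deaths data run one day longer than the cases data. That extra day is an assumption made for illustration, not a finding about the real files.

```python
import pandas as pd

# Hypothetical miniature versions of the two long tables
cases_toy = pd.DataFrame({
    'FIPS': [6059.0, 6059.0],
    'date': pd.to_datetime(['2021-12-01', '2021-12-02']),
    'cases': [10, 12],
})
deaths_toy = pd.DataFrame({
    'FIPS': [6059.0, 6059.0, 6059.0],
    'date': pd.to_datetime(['2021-12-01', '2021-12-02', '2021-12-03']),
    'covid_deaths': [1, 1, 2],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
merged = cases_toy.merge(deaths_toy, how='outer', on=['FIPS', 'date'], indicator=True)
print(merged['_merge'].value_counts())

# Rows present only in the deaths table, analogous to the NaN rows above
unmatched = merged.loc[merged['_merge'] == 'right_only']
print(unmatched)
```

Inspecting the `right_only` and `left_only` rows in the real data would show whether the gaps come from mismatched date ranges or from FIPS codes that appear in only one file.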


In [4]:
# Aggregate data to monthly level instead of daily level to match granularity from CHHS data
# Caution: if the daily counts are cumulative, 'sum' adds up running totals rather than new counts
covid_df.set_index('date', inplace = True)
covid = covid_df.groupby([pd.Grouper(freq='MS'), 'FIPS', 'Province_State', 'Admin2'])\
    .agg({'cases': 'sum', 'covid_deaths': 'sum', 'Population': 'mean'})\
    .reset_index()
covid
covid

Unnamed: 0,date,FIPS,Province_State,Admin2,cases,covid_deaths,Population
0,2020-01-01,1001.0,Alabama,Autauga,0.0,0,55869
1,2020-01-01,1003.0,Alabama,Baldwin,0.0,0,223234
2,2020-01-01,1005.0,Alabama,Barbour,0.0,0,24686
3,2020-01-01,1007.0,Alabama,Bibb,0.0,0,22394
4,2020-01-01,1009.0,Alabama,Blount,0.0,0,57826
...,...,...,...,...,...,...,...
73167,2021-10-01,90051.0,Virginia,Unassigned,0.0,0,0
73168,2021-10-01,90053.0,Washington,Unassigned,48413.0,111,0
73169,2021-10-01,90054.0,West Virginia,Unassigned,0.0,0,0
73170,2021-10-01,90055.0,Wisconsin,Unassigned,0.0,0,0
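
The JHU time-series files report cumulative totals per day, so summing the daily values within a month adds up running totals. If monthly new counts are wanted instead, one possible approach is to take the last value in each month and difference consecutive months. The sketch below uses hypothetical numbers for a single county and assumes the series starts at zero before the first month shown.

```python
import pandas as pd

# Hypothetical cumulative daily counts for a single county
daily = pd.DataFrame({
    'FIPS': [6059.0] * 4,
    'date': pd.to_datetime(['2020-03-30', '2020-03-31', '2020-04-01', '2020-04-30']),
    'cases': [90, 100, 110, 250],
})

# Month-end cumulative total: the last reported value within each month
month_end = (
    daily.sort_values('date')
         .groupby(['FIPS', pd.Grouper(key='date', freq='MS')])['cases']
         .last()
         .reset_index()
)

# New cases per month: difference of consecutive month-end totals within a county
month_end['new_cases'] = month_end.groupby('FIPS')['cases'].diff()

# The first month has no predecessor; assuming the series starts at zero,
# its cumulative total is also its new count
month_end['new_cases'] = month_end['new_cases'].fillna(month_end['cases'])
print(month_end)
```

In this toy case the month-end totals are 100 and 250, giving 100 new cases in March and 150 in April, whereas summing the daily values would give 190 and 360.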


In [38]:
# Load CHHS death counts: provisional 2021 data and final 2019-2020 data
deaths_2021 = pd.read_csv('https://data.chhs.ca.gov/dataset/58619b69-b3cb-41a7-8bfc-fc3a524a9dd4/resource/2e546f88-bba8-4d77-846a-7fb77846cac6/download/2021-11_deaths_provisional_county_month_sup.csv')
deaths_19_20 = pd.read_csv('https://data.chhs.ca.gov/dataset/58619b69-b3cb-41a7-8bfc-fc3a524a9dd4/resource/078185a9-e3a7-403f-a546-b8c582f0a9d8/download/2021-11-29_deaths_final_2019-2020_occurrence_county_month_sup.csv')
deaths_chhs = pd.concat([deaths_2021, deaths_19_20], join = 'outer')

# Keep only the rows needed for this analysis:
#   - Strata: "Total Population"
#   - County: Orange
#   - Cause: ALL = All causes (total), INJ = Accidents (unintentional injuries), SUI = Intentional self-harm (suicide)
# Then drop the columns that are no longer needed
deaths_chhs = deaths_chhs.loc[(deaths_chhs['Strata'] == 'Total Population') &
                              (deaths_chhs['County'] == 'Orange') &
                              (deaths_chhs['Cause'].isin(['ALL', 'INJ', 'SUI']))]\
    .drop(['Geography_Type', 'Strata', 'Strata_Name', 'Data_Extract_Date', 'Annotation_Code', 'Annotation_Desc', 'Cause'], axis = 1)

In [39]:
# Create new date column to join on
deaths_chhs['date'] = pd.to_datetime(deaths_chhs['Year'].astype(str) + '/' + deaths_chhs['Month'].astype(str) + '/01',
                                   format = '%Y/%m/%d')
deaths_chhs.drop(['Year', 'Month'], axis = 1, inplace = True)
deaths_chhs

Unnamed: 0,County,Cause_Desc,Count,date
5278,Orange,All causes (total),4290.0,2021-01-01
5383,Orange,Accidents (unintentional injuries),136.0,2021-01-01
5449,Orange,Intentional self-harm (suicide),24.0,2021-01-01
15834,Orange,All causes (total),2430.0,2021-02-01
15939,Orange,Accidents (unintentional injuries),115.0,2021-02-01
...,...,...,...,...
237615,Orange,Accidents (unintentional injuries),109.0,2020-11-01
237681,Orange,Intentional self-harm (suicide),28.0,2020-11-01
248066,Orange,All causes (total),3330.0,2020-12-01
248171,Orange,Accidents (unintentional injuries),107.0,2020-12-01
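
With monthly all-cause counts in this shape, the excess-mortality comparison from the hypothesis could be sketched as observed deaths minus a pre-pandemic baseline for the same calendar month. The counts below are hypothetical, not the CHHS values. Using 2019 alone as the baseline is a simplifying assumption, since only 2019 through 2021 are loaded here.

```python
import pandas as pd

# Hypothetical monthly all-cause counts (not the CHHS values)
toy = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-02-01', '2020-01-01', '2020-02-01']),
    'Count': [1800.0, 1700.0, 1850.0, 1900.0],
})
toy['year'] = toy['date'].dt.year
toy['month'] = toy['date'].dt.month

# Baseline: the 2019 count for each calendar month
baseline = (
    toy.loc[toy['year'] == 2019, ['month', 'Count']]
       .rename(columns={'Count': 'baseline'})
)

# Excess deaths: observed count from 2020 onward minus the same-month baseline
pandemic = toy.loc[toy['year'] >= 2020].merge(baseline, on='month', how='left')
pandemic['excess'] = pandemic['Count'] - pandemic['baseline']
print(pandemic[['date', 'Count', 'baseline', 'excess']])
```

Comparing the summed `excess` column against the reported COVID-19 deaths for the same months would then address the underreporting question. A single-year baseline is sensitive to an unusual year, so a multi-year average would be sturdier if earlier data were added.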


In [37]:
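# Check the annotation values (run before 'Annotation_Desc' is dropped in the filtering cell above)
deaths_chhs['Annotation_Desc'].unique()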

array([nan], dtype=object)