# Georgia LDU Closure Analysis (2012-2016)

In [None]:
import pandas as pd

## Datasets

### Regional Data (Counties and PCSAs)

- Data from [*OASIS*](https://oasis.state.ga.us) is used to obtain birth and population counts in 2001 and 2011 by county.
- Data from the [*Office of Management and Budget (OMB)*](https://obamawhitehouse.archives.gov/sites/default/files/omb/bulletins/2013/b-13-01.pdf) is used to identify counties contained in the Atlanta-Sandy Springs-Roswell Metropolitan Statistical Area (MSA) based on the 2010 Census (see page 23).
- Data from the [*U.S. Census Bureau*](https://data.census.gov/cedsci/table?q=&t=Income%20and%20Poverty&g=0400000US13%240500000&y=2011&tid=ACSST5Y2011.S1903) is used to obtain 2011 median household income by county.
- 2008 data from the [*Georgia Board of Health Care Workforce*](https://healthcareworkforce.georgia.gov/basic-physician-needs-reports-pcsa-primary-care-service-area) is used to map counties to PCSAs.

In [None]:
# Explicit column types for the county-level table.
dtypes = {'County': str, 'PCSA': int, 'In MSA (2010)': int, '# Births (2001)': int, '# Births (2011)': int,
          'Population (2001)': int, 'Population (2011)': int, 'Females 15-44 (2011)': int,
          'Black Females 15-44 (2011)': int, 'White Females 15-44 (2011)': int,
          'Median Household Income (2011)': int}
counties = pd.read_csv('data/counties.csv', dtype=dtypes)
display(counties)

### Labor & Delivery Units

- Data from *Georgia Maternal and Infant Health Research Group (GMIHRG)* (private data source) is used to identify the LDUs of interest and their birth counts in 2008, 2011, and 2012; numbers of OBs, FPs, and CNMs in 2011 and 2016; and average ages of OBs in 2011 and 2016.
- Data from the *Emory MCH Linked Vital Records Data Repository* (private data source) is used to obtain, for each LDU, the number of births in 2001 and in 2011 to residents and to non-residents of the county in which the LDU is located. It is also the source of the LDU names that we treat as standard.
- Data from the *U.S. Census Bureau* is used to identify urban areas in 2010.
- Data from [*Google Maps*](https://www.google.com/maps/d/u/0/edit?mid=1_xMZrJgPbcInCcq8CgdmwuncWMWSOoJj&usp=sharing) is used to identify, for each LDU in 2011, the closest other LDU within Georgia and the number of driving miles to it, as well as the closest urban area (in any state) and the number of driving miles to it.
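As a minimal sketch of the closest-LDU lookup described above, assume the driving distances read off the map have been recorded in a long-format table with one row per ordered pair of LDUs. The column names (`LDU`, `Other LDU`, `Driving Miles`) and all values are hypothetical placeholders, not taken from `data/ldus.csv`.

```python
import pandas as pd

# Hypothetical driving distances between pairs of LDUs (illustrative values only).
distances = pd.DataFrame({
    'LDU':           ['A', 'A', 'B', 'B', 'C', 'C'],
    'Other LDU':     ['B', 'C', 'A', 'C', 'A', 'B'],
    'Driving Miles': [12.5, 30.0, 12.5, 21.0, 30.0, 21.0],
})

# Drop self-pairs so that an LDU is never reported as its own closest LDU.
distances = distances[distances['LDU'] != distances['Other LDU']]

# For each LDU, keep the row with the smallest driving distance.
closest = distances.loc[distances.groupby('LDU')['Driving Miles'].idxmin()]
closest = closest.rename(columns={'Other LDU': 'Closest LDU'}).reset_index(drop=True)
print(closest)
```

The same pattern would apply to the closest urban area by swapping the second column for a hypothetical `Urban Area` column.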

In [None]:
ldus = pd.read_csv('data/ldus.csv')
ldus

### Patients

- Data from the *Emory MCH Linked Vital Records Data Repository* (private data source) is used to identify per-patient birth data for births in 2011 by birthing LDU, payor status, race, ethnicity, and county of residence.

In [None]:
patients = pd.read_csv('data/patients.csv')
patients
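Patient-level records of this kind can be tabulated into per-LDU counts of births to residents and non-residents of the LDU's county. The sketch below shows one way to do so on invented data; the column names (`LDU`, `County of Residence`), the hospital names, and the `ldu_county` lookup are assumptions for illustration and may not match `data/patients.csv`.

```python
import pandas as pd

# Hypothetical patient-level records; real column names in patients.csv may differ.
toy_patients = pd.DataFrame({
    'LDU':                 ['Hosp X', 'Hosp X', 'Hosp X', 'Hosp Y', 'Hosp Y'],
    'County of Residence': ['Appling', 'Bacon', 'Appling', 'Bacon', 'Bacon'],
})

# Hypothetical lookup from each LDU to the county in which it is located.
ldu_county = {'Hosp X': 'Appling', 'Hosp Y': 'Bacon'}

# Flag births to mothers who live in the same county as the birthing LDU.
is_resident = toy_patients['County of Residence'] == toy_patients['LDU'].map(ldu_county)
toy_patients['Residency'] = is_resident.map({True: 'Resident Births', False: 'Non-Resident Births'})

# Tabulate resident and non-resident births per LDU, keeping both columns
# even if one category never occurs.
births = pd.crosstab(toy_patients['LDU'], toy_patients['Residency'])
births = births.reindex(columns=['Resident Births', 'Non-Resident Births'], fill_value=0)
print(births)
```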

## Derived Columns

Based on the raw data above, we derive a series of new columns at the LDU and PCSA levels.
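To illustrate what a PCSA-level derived column might look like, the sketch below rolls county counts up to PCSAs on a toy table. It reuses column names from the `dtypes` mapping above, but the values and the `Births per 1,000 (2011)` column are invented for illustration and are not necessarily among the columns derived in this analysis.

```python
import pandas as pd

# Toy county table; column names follow the dtypes mapping, values are invented.
toy_counties = pd.DataFrame({
    'County':            ['County A', 'County B', 'County C'],
    'PCSA':              [1, 1, 2],
    '# Births (2011)':   [100, 250, 400],
    'Population (2011)': [8000, 20000, 31000],
})

# Sum county counts within each PCSA.
count_cols = ['# Births (2011)', 'Population (2011)']
toy_pcsas = toy_counties.groupby('PCSA', as_index=False)[count_cols].sum()

# Hypothetical derived column: births per 1,000 residents.
toy_pcsas['Births per 1,000 (2011)'] = (
    1000 * toy_pcsas['# Births (2011)'] / toy_pcsas['Population (2011)']
)
print(toy_pcsas)
```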

### Inclusion Criteria for Rural PCSAs

PCSAs included in the sample are *rural*, meaning that in 2011:

1. They did not contain any counties that were within the Atlanta MSA.
2. They did not contain any counties with a population of 50,000 or more.
3. They contained exactly one LDU.

In [None]:
# Construct a DataFrame of the 96 PCSAs, numbered 1 through 96.
pcsas = pd.DataFrame({'PCSA': range(1, 97)})

# Criterion 1: the PCSA contains no counties in the Atlanta MSA.
# Results are mapped by PCSA number instead of relying on positional
# index alignment, which would silently misalign if a PCSA were missing.
include1 = counties.groupby('PCSA')['In MSA (2010)'].sum()
pcsas['Inc. AMSA'] = (pcsas['PCSA'].map(include1) == 0)

# Criterion 2: every county in the PCSA has a population strictly below 50K.
include2 = counties.groupby('PCSA')['Population (2011)'].max()
pcsas['Inc. Population (2011)'] = (pcsas['PCSA'].map(include2) < 50000)

# Criterion 3: count how many LDUs are in each PCSA.
# Counties without an LDU get NaN after the join, which sum() treats as 0.
include3 = ldus.groupby('County', as_index=False).size()
include3 = counties.join(include3.set_index('County'), on='County')
include3 = include3.groupby('PCSA', as_index=False)['size'].sum()
pcsas = pcsas.join(include3.set_index('PCSA'), on='PCSA')
pcsas['Inc. Single LDU'] = (pcsas['size'] == 1)
del pcsas['size']

# Determine which PCSAs are in sample.
pcsas['In Sample'] = pcsas['Inc. AMSA'] & pcsas['Inc. Population (2011)'] & pcsas['Inc. Single LDU']
display(pcsas)

Thus, 30 of the 96 PCSAs meet all three inclusion criteria; the other 66 fail at least one.
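To see which criteria drive the exclusions, the flag columns can be tallied individually. The sketch below does this on a toy stand-in for the `pcsas` table. The flag values are invented, so the counts it prints do not reflect the real data.

```python
import pandas as pd

# Toy stand-in for the `pcsas` table built above; the flags are invented.
toy = pd.DataFrame({
    'PCSA':                   [1, 2, 3, 4],
    'Inc. AMSA':              [True, False, True, True],
    'Inc. Population (2011)': [True, False, False, True],
    'Inc. Single LDU':        [True, True, True, False],
})
criteria = ['Inc. AMSA', 'Inc. Population (2011)', 'Inc. Single LDU']

# A PCSA is in the sample only if it passes every criterion.
toy['In Sample'] = toy[criteria].all(axis=1)

# Number of PCSAs failing each criterion (a PCSA may fail more than one).
failures = (~toy[criteria]).sum()
print(failures)
print('In sample:', int(toy['In Sample'].sum()))
```

Because a PCSA can fail several criteria at once, the per-criterion failure counts need not sum to the number of excluded PCSAs.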