# Georgia LDU Closure Analysis (2012-2016)

In [None]:
import pandas as pd

## Datasets

### Regional Data (Counties and PCSAs)

- Data from [*OASIS*](https://oasis.state.ga.us) is used to obtain birth and population counts in 2001 and 2011 by county.
- Data from the [*Office of Management and Budget (OMB)*](https://obamawhitehouse.archives.gov/sites/default/files/omb/bulletins/2013/b-13-01.pdf) is used to identify counties contained in the Atlanta-Sandy Springs-Roswell Metropolitan Statistical Area (MSA) based on the 2010 Census (see page 23).
- Data from the [*U.S. Census Bureau*](https://data.census.gov/cedsci/table?q=&t=Income%20and%20Poverty&g=0400000US13%240500000&y=2011&tid=ACSST5Y2011.S1903) is used to obtain 2011 median household income by county.
- Data from the [*Georgia Board of Health Care Workforce*](https://healthcareworkforce.georgia.gov/basic-physician-needs-reports-pcsa-primary-care-service-area) (2008) is used to map counties to primary care service areas (PCSAs).

In [None]:
counties = pd.read_csv('data/counties.csv')
display(counties)

### Labor & Delivery Units

- Data from the *Georgia Maternal and Infant Health Research Group (GMIHRG)* (private data source) is used to identify the labor and delivery units (LDUs) of interest and their birth counts in 2008, 2011, and 2012; the numbers of obstetricians (OBs), family physicians (FPs), and certified nurse midwives (CNMs) in 2011 and 2016; and the average ages of OBs in 2011 and 2016.
- Data from the *Emory MCH Linked Vital Records Data Repository* (private data source) is used to obtain, for 2001 and 2011, the number of births at each LDU to residents and non-residents of the county in which the LDU is located. It is also the source of the LDU names that we treat as standard.
- Data from the [*U.S. Census Bureau*]() is used to identify urban areas in 2010.
- Data from [*Google Maps*](https://www.google.com/maps/d/u/0/edit?mid=1_xMZrJgPbcInCcq8CgdmwuncWMWSOoJj&usp=sharing) is used to identify, for each LDU, the closest (other) LDU (within Georgia), the number of driving miles to the closest LDU, the closest urban area (in any state), and the number of driving miles to the closest urban area in 2011.

In [None]:
ldus = pd.read_csv('data/ldus.csv')
display(ldus)

### Patients

- Data from the *Emory MCH Linked Vital Records Data Repository* (private data source) provides per-patient records for births in 2011, including the birthing LDU, payor status, race, ethnicity, and county of residence.

In [None]:
patients = pd.read_csv('data/patients.csv')
display(patients)

## Derived Columns

Based on the raw data above, we derive a series of new columns at the PCSA, LDU, and patient levels.

### Inclusion Criteria for Rural PCSAs

PCSAs included in the sample are *rural*, meaning that in 2011:

1. They did not contain any counties that were within the Atlanta MSA.
2. They did not contain any counties with a population of at least 50,000.
3. They contained exactly one LDU.

In [None]:
# Construct a DataFrame of 96 PCSAs and a groupby of counties by PCSA.
pcsas = pd.DataFrame({'PCSA': list(range(1, 97))})
countybypcsa = counties.groupby('PCSA', as_index=False)

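# Note: the grouped results below are sorted by PCSA and indexed 0..95, so they
# align row-by-row with pcsas only if every PCSA from 1 to 96 appears in counties.

# Identify PCSAs that have no counties in the Atlanta MSA.
pcsas['Inc. AMSA'] = (countybypcsa['In MSA (2010)'].sum()['In MSA (2010)'] == 0)

# Identify PCSAs whose counties all have population strictly less than 50K.
pcsas['Inc. Pop.'] = (countybypcsa['Population (2011)'].max()['Population (2011)'] < 50000)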

# Identify PCSAs containing exactly one LDU.
df = ldus.groupby('County', as_index=False).size()
df = counties.join(df.set_index('County'), on='County')
df = df.groupby('PCSA', as_index=False)['size'].sum()
pcsas = pcsas.join(df.set_index('PCSA'), on='PCSA')
pcsas['Inc. Single LDU'] = (pcsas['size'] == 1)
del pcsas['size']

# Determine which PCSAs are in sample.
pcsas['In Sample'] = pcsas['Inc. AMSA'] & pcsas['Inc. Pop.'] & pcsas['Inc. Single LDU']
display(pcsas)
print(pcsas.groupby('In Sample').size())

Thus, we determine that there are 30 PCSAs that meet our inclusion criteria; the other 66 do not.
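
To make the three criteria concrete, here is a minimal sketch that applies them to a small, entirely hypothetical dataset (the county names, LDU names, populations, and the intermediate column names `msa_counties`, `max_population`, and `n_ldus` are illustrative assumptions, not values from the study data). It mirrors the logic above, but aggregates by PCSA in a single step.

```python
import pandas as pd

# Hypothetical toy data: five PCSAs, seven counties, five LDUs.
toy_counties = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'PCSA': [1, 1, 2, 3, 4, 4, 5],
    'In MSA (2010)': [0, 0, 1, 0, 0, 0, 0],
    'Population (2011)': [12000, 30000, 20000, 75000, 8000, 9000, 15000],
})
toy_ldus = pd.DataFrame({
    'LDU': ['X', 'Y', 'Z', 'W', 'V'],
    'County': ['A', 'C', 'D', 'E', 'F'],
})

# Count LDUs per county; counties without an LDU get a count of 0.
ldu_counts = toy_ldus.groupby('County').size().rename('# LDUs')
per_county = toy_counties.join(ldu_counts, on='County').fillna({'# LDUs': 0})

# Aggregate the three quantities needed by the criteria, per PCSA.
toy_pcsas = per_county.groupby('PCSA').agg(
    msa_counties=('In MSA (2010)', 'sum'),
    max_population=('Population (2011)', 'max'),
    n_ldus=('# LDUs', 'sum'),
)

# A PCSA is in the sample only if it satisfies all three criteria.
toy_pcsas['In Sample'] = (
    (toy_pcsas['msa_counties'] == 0)
    & (toy_pcsas['max_population'] < 50000)
    & (toy_pcsas['n_ldus'] == 1)
)
print(toy_pcsas)
```

In this toy example only PCSA 1 qualifies: PCSA 2 contains an MSA county, PCSA 3 contains a county with 75,000 residents, PCSA 4 has two LDUs, and PCSA 5 has none.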

### Aggregate Birth Volume and Population Demographics by PCSA

We additionally calculate, per PCSA, the aggregate number of births (2001 and 2011), total population (2001 and 2011), female population (2011), Black female population (2011), White female population (2011), and household income (2011). Median household income is available only on a per-county basis, so we approximate a PCSA's household income by averaging its counties' median household incomes, weighting each county by its share of the PCSA's 2011 population. Mathematically, for a PCSA $p$ containing counties $c_1, \ldots, c_k$, we have:
$$
income(p) = \sum_{i=1}^k \left(\frac{population(c_i)}{population(p)}\right) \cdot income(c_i)
$$
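
As a small worked example of the formula above, the sketch below uses hypothetical populations and incomes for three counties in two PCSAs (the numbers are made up for illustration). It multiplies each county's income by its population, sums within the PCSA, and divides by the PCSA's total population.

```python
import pandas as pd

# Hypothetical counties: two in PCSA 1, one in PCSA 2.
toy = pd.DataFrame({
    'PCSA': [1, 1, 2],
    'Population (2011)': [10000, 30000, 20000],
    'Median Household Income (2011)': [40000, 50000, 35000],
})

# Numerator: sum over counties of population(c_i) * income(c_i), per PCSA.
weighted = (toy['Population (2011)']
            * toy['Median Household Income (2011)']).groupby(toy['PCSA']).sum()

# Denominator: population(p), the total population of the PCSA.
pcsa_population = toy.groupby('PCSA')['Population (2011)'].sum()

pcsa_income = weighted / pcsa_population
print(pcsa_income)
```

For PCSA 1 this gives (10,000 × 40,000 + 30,000 × 50,000) / 40,000 = 47,500. A PCSA with a single county, such as PCSA 2, simply keeps that county's median income.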

In [None]:
# Calculate the number of births per-PCSA in 2001 and 2011, the total population
# per-PCSA in 2001 and 2011, and the female, Black female, and White female
# populations per-PCSA in 2011.
for count in ['# Births (2001)', '# Births (2011)', 'Population (2001)',
              'Population (2011)', 'Females 15-44 (2011)',
              'Black Females 14-45 (2011)', 'White Females 14-45 (2011)']:
    pcsas = pcsas.join(countybypcsa[count].sum().set_index('PCSA'), on='PCSA')

# Calculate the median household income per PCSA using population-weighted
# proportions by county.
incprod = counties.groupby('PCSA')\
                  .apply(lambda x: (x['Population (2011)'] * \
                                    x['Median Household Income (2011)']).sum())\
                  .to_frame('incprod').reset_index()
pcsas = pcsas.join(incprod.set_index('PCSA'), on='PCSA')
pcsas['Household Income (2011)'] = pcsas['incprod'] / pcsas['Population (2011)']
del pcsas['incprod']

display(pcsas)

### Provider Count and Load by LDU

In addition to the raw LDU data, we calculate the number of OB equivalents per LDU in 2011 and the number of births per provider (i.e., per OB equivalent) in 2011. The number of OB equivalents at an LDU is calculated as:
$$
(\#OBs) + \frac{1}{1.55} \cdot (\#CNMs) + \frac{0.7}{1.55} \cdot (\#FPs)
$$
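
To illustrate the formula, the following sketch computes OB equivalents and births per provider for a single hypothetical LDU (the helper name `ob_equivalents` and the staffing and birth numbers are assumptions made for this example only).

```python
def ob_equivalents(n_obs, n_cnms, n_fps):
    """Return the number of OB equivalents using the weights in the formula above."""
    return n_obs + (1 / 1.55) * n_cnms + (0.7 / 1.55) * n_fps

# Hypothetical LDU: 2 OBs, 1 CNM, 1 FP, and 300 births in the year.
equiv = ob_equivalents(n_obs=2, n_cnms=1, n_fps=1)
births_per_provider = 300 / equiv

print(round(equiv, 2))                # about 3.10 OB equivalents
print(round(births_per_provider, 1))  # about 96.9 births per OB equivalent
```

Note that in the pandas version, an LDU with no recorded providers has zero OB equivalents, and the division yields `inf` or `NaN` instead of raising an error, so such rows may need to be checked separately.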

In [None]:
ldus['OB Equiv. (2011)'] = ldus['# OBs (2011)'] + (1/1.55) * ldus['# CNMs (2011)']\
                           + (0.7/1.55) * ldus['# FPs (2011)']
ldus['# Births per Provider (2011)'] = ldus['# Births (2011)'] / ldus['OB Equiv. (2011)']
display(ldus)

### Patient Payor Types and Groups

Finally, we map each raw payor status to a payor type, and each payor type to a broader payor group, according to the dictionaries below.

In [None]:
payortypes = {'Unknown': 'Other/Unknown',
              'Champus': 'Commercial/Employer-Based',
              'Medicaid': 'Medicaid',
              'Commercial Insurance': 'Commercial/Employer-Based',
              'Other Government Assistance': 'Other Govt.',
              'Other': 'Other/Unknown',
              'Self Pay': 'Self Pay'}

payorgroups = {'Commercial/Employer-Based': 'Commercial/Employer-Based',
               'Medicaid': 'Assistance/Self Pay',
               'Other Govt.': 'Assistance/Self Pay',
               'Self Pay': 'Assistance/Self Pay',
               'Other/Unknown': 'Other/Unknown'}

patients['Payor Type'] = patients['Payor'].map(lambda x: payortypes[x])
patients['Payor Group'] = patients['Payor Type'].map(lambda x: payorgroups[x])
display(patients)
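
The lambda-based lookups above raise a `KeyError` if the data contain a payor status missing from `payortypes`. As an alternative sketch, passing the dictionaries directly to `Series.map` leaves unmapped values as `NaN`, which makes them easy to list. The toy `Payor` values below, including the unmapped status `'Tricare'`, are hypothetical.

```python
import pandas as pd

payortypes = {'Unknown': 'Other/Unknown',
              'Champus': 'Commercial/Employer-Based',
              'Medicaid': 'Medicaid',
              'Commercial Insurance': 'Commercial/Employer-Based',
              'Other Government Assistance': 'Other Govt.',
              'Other': 'Other/Unknown',
              'Self Pay': 'Self Pay'}

payorgroups = {'Commercial/Employer-Based': 'Commercial/Employer-Based',
               'Medicaid': 'Assistance/Self Pay',
               'Other Govt.': 'Assistance/Self Pay',
               'Self Pay': 'Assistance/Self Pay',
               'Other/Unknown': 'Other/Unknown'}

# Hypothetical patient records; 'Tricare' is deliberately absent from payortypes.
toy_patients = pd.DataFrame({'Payor': ['Medicaid', 'Champus', 'Self Pay', 'Tricare']})

# Mapping with a dict leaves unknown keys as NaN instead of raising KeyError.
toy_patients['Payor Type'] = toy_patients['Payor'].map(payortypes)
toy_patients['Payor Group'] = toy_patients['Payor Type'].map(payorgroups)

# List any raw payor statuses that could not be mapped.
unmapped = toy_patients.loc[toy_patients['Payor Type'].isna(), 'Payor'].unique().tolist()
print(toy_patients)
print('Unmapped payor statuses:', unmapped)
```

Whether unmapped statuses should fail loudly (as in the lambda version) or be reported and reviewed (as here) is a design choice; the strict behavior is the safer default when the set of payor statuses is expected to be fixed.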