# Mental Health HPSA data analysis

Introduction to Data Analysis, Reporting 2  
Group 7: Julia Ingram, Chuqin Jiang, Richard Abbey  

This workbook examines a dataset of federally designated mental health Health Professional Shortage Areas (HPSAs), downloaded on 10/18/21 from the Health Resources and Services Administration (HRSA) website (https://data.hrsa.gov/data/download). An explanation of the shortage areas and how they are designated can be found in the Code of Federal Regulations (42 CFR Part 5), here: https://www.ecfr.gov/current/title-42/chapter-I/subchapter-A/part-5

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_rows = 100
pd.options.display.max_columns = 100

In [5]:
#Reading in the HPSA file
HPSAs = pd.read_csv('BCD_HPSA_FCT_DET_MH.csv')

## Data cleaning

In [None]:
HPSAs.dtypes  # check the data type of each column

We'll want the dates as datetime64 values so we can compare and filter on them.

In [6]:
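# pd.to_datetime parses the date strings; missing withdrawal dates become NaT
# (recent pandas versions reject .astype(np.datetime64) without a unit)
HPSAs['HPSA Designation Date'] = pd.to_datetime(HPSAs['HPSA Designation Date'])
HPSAs['Withdrawn Date'] = pd.to_datetime(HPSAs['Withdrawn Date'])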
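Alternatively, the dates can be parsed while the file is read, using the `parse_dates` argument of `pd.read_csv`. A minimal sketch on an in-memory CSV: only the column names come from the HPSA file; the two rows and their ISO date format are invented for illustration.

```python
import io
import pandas as pd

# Invented two-row sample; only the column names come from the HPSA file.
sample_csv = io.StringIO(
    "HPSA ID,HPSA Designation Date,Withdrawn Date\n"
    "A1,2019-05-01,\n"
    "A2,2016-02-15,2020-07-30\n"
)

# parse_dates converts the listed columns to datetime64 while reading;
# empty cells become NaT (missing), which .isnull() still detects.
df = pd.read_csv(sample_csv, parse_dates=['HPSA Designation Date', 'Withdrawn Date'])
print(df.dtypes)
```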

We'll also want the corresponding year column, 'HPSA Designation Year', as an integer rather than a string.

In [7]:
HPSAs['HPSA Designation Year'] = HPSAs['HPSA Designation Year'].astype(int)
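A quick consistency check is to compare the converted year with the year of the designation date. This sketch assumes the two should always agree (an assumption, not something stated in the HRSA documentation cited above) and uses an invented two-row frame with the file's column names.

```python
import pandas as pd

# Invented sample: a parsed designation date plus a year column stored as text.
df = pd.DataFrame({
    'HPSA Designation Date': pd.to_datetime(['2019-05-01', '2016-02-15']),
    'HPSA Designation Year': ['2019', '2016'],
})

# Convert the text year to integers, as the notebook does.
df['HPSA Designation Year'] = df['HPSA Designation Year'].astype(int)

# Rows where the stored year differs from the year of the designation date.
mismatches = df[df['HPSA Designation Year'] != df['HPSA Designation Date'].dt.year]
print(len(mismatches))  # 0 for this sample
```

On the full file, a non-empty `mismatches` frame would be worth inspecting before relying on the year column in the filters below.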

### Separating different HPSA Types
Subset the data by designation type: geographic, population, and facility HPSAs.

In [8]:
facilityTypes = ['Correctional Facility', 'Federally Qualified Health Center', 'Federally Qualified Health Center Look A Like',
                 'Indian Health Service, Tribal Health, and Urban Indian Health Organizations', 'Rural Health Clinic',
                 'Other Facility', 'State Mental Hospital']
geoTypes = ['Geographic HPSA', 'High Needs Geographic HPSA']

In [9]:
facilityHPSAs = HPSAs[HPSAs['Designation Type'].isin(facilityTypes)]
geoHPSAs = HPSAs[HPSAs['Designation Type'].isin(geoTypes)]
popHPSAs = HPSAs[HPSAs['Designation Type'] == 'HPSA Population']
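To confirm that the three subsets together cover every row, each row can be labeled by category, with anything outside the three lists falling into 'other'. A minimal sketch: `label_category` is a hypothetical helper, the type lists are copied from the cell above, and the sample values are invented.

```python
import pandas as pd

# Type lists copied from the notebook.
facilityTypes = ['Correctional Facility', 'Federally Qualified Health Center',
                 'Federally Qualified Health Center Look A Like',
                 'Indian Health Service, Tribal Health, and Urban Indian Health Organizations',
                 'Rural Health Clinic', 'Other Facility', 'State Mental Hospital']
geoTypes = ['Geographic HPSA', 'High Needs Geographic HPSA']

def label_category(designation_type: pd.Series) -> pd.Series:
    """Map each 'Designation Type' value to 'facility', 'geographic',
    'population', or 'other' (anything not in the three lists)."""
    lookup = {t: 'facility' for t in facilityTypes}
    lookup.update({t: 'geographic' for t in geoTypes})
    lookup['HPSA Population'] = 'population'
    # Values missing from the lookup map to NaN, which we relabel as 'other'.
    return designation_type.map(lookup).fillna('other')

# Invented sample values.
types = pd.Series(['Geographic HPSA', 'Rural Health Clinic', 'HPSA Population', 'Unknown Type'])
print(label_category(types).tolist())
# ['geographic', 'facility', 'population', 'other']
```

On the real data, `label_category(HPSAs['Designation Type']).value_counts()` would show whether any rows land in 'other'.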

## Answering specific questions

### What is the total estimated underserved population at present?

This uses both geographic and population HPSA designations, which don't overlap geographically, but excludes facility designations, which do.

In [10]:
# Keep one row per HPSA ID so each designation's population is counted once
HPSAsClean = HPSAs.drop_duplicates(subset='HPSA ID')
HPSAsClean = HPSAsClean[HPSAsClean['Designation Type'].isin(geoTypes)
                        | (HPSAsClean['Designation Type'] == 'HPSA Population')]
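Why deduplicate first: the sketch below assumes, as the `drop_duplicates` call implies, that one HPSA can appear on several rows (for example, one per component area) that each repeat the same designation-level population. The data are invented.

```python
import pandas as pd

# Invented sample: HPSA 'A1' appears on three rows, each repeating
# the same designation-level population figure.
rows = pd.DataFrame({
    'HPSA ID': ['A1', 'A1', 'A1', 'B2'],
    'Designation Type': ['Geographic HPSA'] * 3 + ['HPSA Population'],
    'HPSA Designation Population': [50000.0, 50000.0, 50000.0, 12000.0],
})

# Summing every row counts A1's population three times.
naive_total = rows['HPSA Designation Population'].sum()
# Keeping one row per HPSA ID counts it once.
dedup_total = rows.drop_duplicates(subset='HPSA ID')['HPSA Designation Population'].sum()
print(naive_total, dedup_total)  # 162000.0 62000.0
```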

In [11]:
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021]

In [12]:
populationFigs = []
for year in years:
    yearEnd = np.datetime64(f'{year}-12-31')
    # HPSAs designated by the end of `year` that were not withdrawn before Dec 31
    active = HPSAsClean[(HPSAsClean['HPSA Designation Year'] <= year)
                        & ((HPSAsClean['Withdrawn Date'] >= yearEnd)
                           | HPSAsClean['Withdrawn Date'].isnull())]
    populationFigs.append(active['HPSA Designation Population'].sum())
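The "active at year end" rule can be wrapped in a small function and checked on a toy frame where the right answer is easy to verify by hand. A minimal sketch: `active_population` is a hypothetical helper and the sample rows are invented.

```python
import pandas as pd

def active_population(df: pd.DataFrame, year: int) -> float:
    """Sum 'HPSA Designation Population' over HPSAs designated in or before
    `year` and either never withdrawn or withdrawn on/after Dec 31 of `year`."""
    year_end = pd.Timestamp(f'{year}-12-31')
    withdrawn = df['Withdrawn Date']
    active = (df['HPSA Designation Year'] <= year) & (withdrawn.isnull() | (withdrawn >= year_end))
    return float(df.loc[active, 'HPSA Designation Population'].sum())

# Invented sample: B2 is withdrawn mid-2020, C3 is designated in 2021.
sample = pd.DataFrame({
    'HPSA ID': ['A1', 'B2', 'C3'],
    'HPSA Designation Year': [2015, 2016, 2021],
    'Withdrawn Date': pd.to_datetime([None, '2020-07-30', None]),
    'HPSA Designation Population': [1000.0, 200.0, 30.0],
})

print([active_population(sample, y) for y in [2019, 2020, 2021]])
# [1200.0, 1000.0, 1030.0]
```

Under this rule, a designation withdrawn partway through a year is excluded from that whole year, which matches the loop above.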

In [13]:
populationFigs

[96923783.0,
 105031580.0,
 120675494.0,
 112816930.0,
 114907038.0,
 119746344.0,
 128691227.0]

The share of the total population represented by `populationFigs` was calculated in Excel using population data from World Population Review, then copied into Datawrapper.
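The same percentage could also be computed in pandas. A minimal sketch: `percent_underserved` is a hypothetical helper, and the denominators would have to come from an outside population source (the figures below are invented round numbers that only show the arithmetic).

```python
import pandas as pd

def percent_underserved(underserved, total_population) -> pd.Series:
    """Divide underserved counts by total population for the same years
    and return percentages rounded to one decimal, aligned on year."""
    underserved = pd.Series(underserved, dtype=float)
    total_population = pd.Series(total_population, dtype=float)
    return (underserved / total_population * 100).round(1)

# Invented figures purely to show the arithmetic.
print(percent_underserved({2020: 100.0, 2021: 120.0}, {2020: 400.0, 2021: 400.0}).to_dict())
# {2020: 25.0, 2021: 30.0}
```

In the notebook this would be called as `percent_underserved(dict(zip(years, populationFigs)), totals)`, where `totals` is a hypothetical dict of year to total population.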

### How many of the non-rural, low-income HPSAs were designated during the pandemic?

In [14]:
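# Currently designated low-income population HPSAs designated on or after March 1, 2020
# (na=False treats a missing population type as "not low income")
newLIPops = HPSAs[(HPSAs['Designation Type'] == 'HPSA Population')
                  & (HPSAs['HPSA Designation Date'] >= np.datetime64('2020-03-01'))
                  & (HPSAs['HPSA Status'] == 'Designated')
                  & (HPSAs['HPSA Population Type'].str.contains('Low Income', na=False))]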

In [15]:
len(newLIPops)

1436

In [16]:
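# All currently designated low-income population HPSAs, regardless of designation date
allLIPops = HPSAs[(HPSAs['Designation Type'] == 'HPSA Population')
                  & (HPSAs['HPSA Status'] == 'Designated')
                  & (HPSAs['HPSA Population Type'].str.contains('Low Income', na=False))]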

In [17]:
len(allLIPops)

2869

In [18]:
len(newLIPops)/len(allLIPops)

0.5005228302544441
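Note that `len()` counts rows. If the file lists some HPSAs on more than one row (the reason `HPSA ID` was deduplicated in the population calculation), 1,436 and 2,869 may be row counts rather than counts of distinct designations. A sketch of the distinct-HPSA version using `nunique()`, on invented data with the same pandemic cutoff:

```python
import pandas as pd

# Invented sample: HPSA 'P1' appears on two rows, 'P2' and 'P3' on one each.
sample = pd.DataFrame({
    'HPSA ID': ['P1', 'P1', 'P2', 'P3'],
    'HPSA Designation Date': pd.to_datetime(['2020-06-01', '2020-06-01',
                                             '2019-01-10', '2021-02-01']),
})

pandemic_start = pd.Timestamp('2020-03-01')  # same cutoff as the notebook
is_new = sample['HPSA Designation Date'] >= pandemic_start

# Share of rows vs. share of distinct HPSA IDs designated since the cutoff.
rows_share = is_new.sum() / len(sample)
hpsa_share = sample.loc[is_new, 'HPSA ID'].nunique() / sample['HPSA ID'].nunique()
print(rows_share, hpsa_share)
```

Here the row-based share is 3 of 4 rows while the HPSA-based share is 2 of 3 designations, so the two measures can diverge whenever IDs repeat.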