## Week 3 Coding Tasks

This week, you will build on your work from last week.
1. First, if you haven't already done so, create a new column indicating the SHADAC classification for each Tennessee county. You can use the following code:
```
physicians.loc[physicians['residents_per_pcp'] < 1500, 'shadac_category'] = 'adequate'
physicians.loc[(physicians['residents_per_pcp'] >= 1500) & 
          (physicians['residents_per_pcp'] < 3500), 'shadac_category'] = 'moderately inadequate'
physicians.loc[(physicians['residents_per_pcp'] >= 3500), 'shadac_category'] = 'low inadequate'
```
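
To check how the boundary values fall, here is a plain-Python version of the same rule. This is a minimal sketch: the function name `shadac_category` and the sample ratios are hypothetical, chosen to sit on and around the cutoffs. It mirrors the `.loc` code above (`< 1500`, `1500` to `< 3500`, `>= 3500`) and, like that code, leaves a missing ratio unclassified.

```python
import math

import pandas as pd


def shadac_category(ratio):
    """Classify one residents-per-PCP ratio with the cutoffs used above (hypothetical helper)."""
    if ratio is None or math.isnan(ratio):
        return None  # missing ratio: left unclassified, as with the .loc version
    if ratio < 1500:
        return 'adequate'
    if ratio < 3500:
        return 'moderately inadequate'
    return 'low inadequate'  # also catches inf (a county with zero PCPs)


# Made-up ratios on and around each cutoff
ratios = pd.Series([1499.9, 1500.0, 3499.9, 3500.0, float('inf')])
categories = ratios.map(shadac_category)
print(categories.tolist())
# ['adequate', 'moderately inadequate', 'moderately inadequate', 'low inadequate', 'low inadequate']
```

On the real data the equivalent call would be `physicians['residents_per_pcp'].map(shadac_category)`.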

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
physicians = pd.read_csv('../data/primary_care_physicians.csv')

# Keep only Tennessee counties
physicians = physicians[physicians['state'] == 'Tennessee']

population = pd.read_csv('../data/population_by_county.csv')

# Attach population and urban/rural status to each county; FIPS is the shared key
physicians = pd.merge(left=physicians[['FIPS', 'county', 'primary_care_physicians']],
                      right=population[['FIPS', 'population', 'urban']],
                      on='FIPS')

# A county with zero primary care physicians would get inf here
physicians['residents_per_pcp'] = physicians['population'] / physicians['primary_care_physicians']
```
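
Because `pd.merge` defaults to an inner join, a county whose `FIPS` appears in only one file is dropped without warning. One way to check, sketched here on small made-up frames (`left`, `right`, and all their values are hypothetical stand-ins for the two CSVs), is to rerun the merge as an outer join with `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only`, or `right_only`.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSVs; FIPS 3 and 4 each appear in only one frame
left = pd.DataFrame({'FIPS': [1, 2, 3],
                     'county': ['A', 'B', 'C'],
                     'primary_care_physicians': [40, 5, 12]})
right = pd.DataFrame({'FIPS': [1, 2, 4],
                      'population': [76000, 49000, 16000],
                      'urban': ['Urban', 'Rural', 'Rural']})

# Outer join keeps every row; indicator=True records which side(s) it came from
check = pd.merge(left, right, on='FIPS', how='outer', indicator=True)
print(check['_merge'].value_counts())

# Rows the default inner merge would silently drop
unmatched = check.loc[check['_merge'] != 'both', ['FIPS', '_merge']]
print(unmatched)
```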

```python
# SHADAC categories: < 1500 adequate, 1500 to < 3500 moderately inadequate, >= 3500 low inadequate
physicians.loc[physicians['residents_per_pcp'] < 1500, 'shadac_category'] = 'adequate'
physicians.loc[(physicians['residents_per_pcp'] >= 1500) &
               (physicians['residents_per_pcp'] < 3500), 'shadac_category'] = 'moderately inadequate'
physicians.loc[physicians['residents_per_pcp'] >= 3500, 'shadac_category'] = 'low inadequate'
```

2. Use this new column to investigate whether a county's status as urban or rural is related to its SHADAC classification. Create a plot showing what you find.

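```python
# normalize='index' turns each row into proportions (the urban/rural mix within a category);
# .loc puts the categories in order from adequate to low inadequate
shadac_order = ['adequate', 'moderately inadequate', 'low inadequate']

(pd.crosstab(physicians['shadac_category'],
             physicians['urban'],
             normalize='index')
   .loc[shadac_order]
   .plot(kind='bar', stacked=True, edgecolor='black'))

plt.legend(bbox_to_anchor=(1, 0.75));
```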
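
`normalize='index'` is what makes the stacked bars comparable: each row is divided by its own total, so every bar reaches 1 and shows the urban/rural mix within that SHADAC category rather than raw counts. The tiny frame below is invented purely to show the effect; `normalize='columns'` would instead give, for urban and for rural counties separately, the share falling in each category.

```python
import pandas as pd

# Invented counties: four 'adequate' (3 urban, 1 rural), two 'low inadequate' (both rural)
toy = pd.DataFrame({
    'shadac_category': ['adequate'] * 4 + ['low inadequate'] * 2,
    'urban': ['Urban', 'Urban', 'Urban', 'Rural', 'Rural', 'Rural'],
})

# Each row is divided by its row total, so each category's shares sum to 1
shares = pd.crosstab(toy['shadac_category'], toy['urban'], normalize='index')
print(shares)
```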

3. Merge the unemployment data (contained in tn_unemployment.csv) into the physicians DataFrame.

```python
unemployment = pd.read_csv('../data/tn_unemployment.csv')
unemployment.head(2)
```

```python
# Keep only the text before ' County' so names match physicians['county']
unemployment['Name'] = unemployment['Name'].str.split(' County', expand=True).loc[:, 0]
unemployment.head(2)
```
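
The split keeps everything before ` County`, and a name without that substring passes through unchanged. The sample names below are an assumption about the file's format, not taken from it (check `unemployment.head()` for the real values). `.str[0]` is shorthand for `expand=True` followed by `.loc[:, 0]`.

```python
import pandas as pd

# Assumed format of the 'Name' column, for illustration only
names = pd.Series(['Anderson County, TN', 'Van Buren County, TN', 'Davidson'])

# Split on ' County' and keep the first piece
cleaned = names.str.split(' County').str[0]
print(cleaned.tolist())
# ['Anderson', 'Van Buren', 'Davidson']
```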

```python
# Inner join on county name; the rename gives both frames a shared 'county' column
physicians = pd.merge(left=physicians,
                      right=unemployment[['Name', 'unemployment_rate']].rename(columns={'Name': 'county'}),
                      on='county')
```
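
Joining on county names is more fragile than joining on `FIPS`: any spelling or spacing difference drops that county from the inner join. A quick set comparison can flag such cases before merging. The names below are invented (the `DeKalb` / `De Kalb` mismatch is hypothetical); on the real data the two inputs would be `physicians['county']` and the cleaned `unemployment['Name']`.

```python
import pandas as pd

# Invented county lists with one deliberate spelling mismatch
physician_counties = pd.Series(['Anderson', 'DeKalb', 'Shelby'])
unemployment_counties = pd.Series(['Anderson', 'De Kalb', 'Shelby'])

# Names present on only one side would be dropped by an inner merge
only_physicians = sorted(set(physician_counties) - set(unemployment_counties))
only_unemployment = sorted(set(unemployment_counties) - set(physician_counties))
print(only_physicians, only_unemployment)
# ['DeKalb'] ['De Kalb']
```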

4. How do unemployment rates compare for urban counties versus rural counties?

```python
physicians.groupby('urban')['unemployment_rate'].describe()
```

```python
sns.boxplot(data=physicians, x='urban', y='unemployment_rate');
```

5. Create a new column, `pcp_per_100k`, containing the number of primary care physicians per 100,000 residents. Investigate the relationship between this new measure and each county's unemployment rate. What do you find?

```python
physicians['pcp_per_100k'] = physicians['primary_care_physicians'] / physicians['population'] * 100000
```
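
Since `pcp_per_100k` is `primary_care_physicians / population * 100000` and `residents_per_pcp` is `population / primary_care_physicians`, the two measures are reciprocals up to a factor of 100,000, with higher values now meaning better coverage. The short calculation below, which uses no data, restates the SHADAC cutoffs on the new scale.

```python
# pcp_per_100k = 100,000 / residents_per_pcp, so each cutoff converts directly
cutoffs = {residents: 100_000 / residents for residents in (1500, 3500)}

for residents, per_100k in cutoffs.items():
    print(f'{residents} residents per PCP -> {per_100k:.2f} PCPs per 100k')
# 1500 residents per PCP -> 66.67 PCPs per 100k
# 3500 residents per PCP -> 28.57 PCPs per 100k
```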

```python
physicians.plot(kind='scatter', x='unemployment_rate', y='pcp_per_100k');
```
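
To put a number on what the scatter plot shows, one option is a correlation coefficient. The values below are invented only to demonstrate the calls; on the real data you would use `physicians[['unemployment_rate', 'pcp_per_100k']]`. Pearson measures linear association, while Spearman works on ranks and is less sensitive to a few extreme counties; either describes association, not causation.

```python
import pandas as pd

# Invented values, only to show the calls
toy = pd.DataFrame({
    'unemployment_rate': [3.1, 3.8, 4.5, 5.2, 6.0],
    'pcp_per_100k': [80.0, 72.0, 55.0, 60.0, 40.0],
})

# DataFrame.corr returns a matrix; pick out the off-diagonal entry
pearson = toy.corr().loc['unemployment_rate', 'pcp_per_100k']
spearman = toy.corr(method='spearman').loc['unemployment_rate', 'pcp_per_100k']
print(round(pearson, 3), round(spearman, 3))
```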