First, you'll work with a dataset containing the number of primary care physicians per county for each county in the United States. It was obtained from the Area Health Resources File, published by the [Health Resources and Services Administration](https://data.hrsa.gov/topics/health-workforce/ahrf). This data is contained in the file `primary_care_physicians.csv`.

Second, the file `population_by_county.csv` contains the Census Bureau's 2019 population estimates for each U.S. county. It also contains a column `urban`, which uses data from the National Bureau of Economic Research to classify each county as either urban or rural. The U.S. Office of Management and Budget designates counties as metropolitan (a core urban area of 50,000 or more population), micropolitan (an urban core of at least 10,000 but fewer than 50,000 population), or neither. Here, a county is considered "urban" if it is part of a metropolitan or micropolitan area and "rural" if it is not.

In [1]:
import pandas as pd

1. First, import the primary care physicians dataset (`primary_care_physicians.csv`) into a DataFrame named `physicians`.
2. Filter `physicians` down to just the counties in Tennessee. Save the filtered DataFrame back to `physicians`. Verify that the resulting DataFrame has 95 rows.

In [2]:
physicians = pd.read_csv('../data/primary_care_physicians.csv')

In [3]:
physicians.head()

Unnamed: 0,FIPS,state,county,primary_care_physicians
0,1001,Alabama,Autauga,26.0
1,1003,Alabama,Baldwin,153.0
2,1005,Alabama,Barbour,8.0
3,1007,Alabama,Bibb,12.0
4,1009,Alabama,Blount,12.0


In [4]:
physicians = physicians.loc[physicians['state'] == 'Tennessee']

In [5]:
physicians = physicians[['county', 'primary_care_physicians']]

In [6]:
physicians

Unnamed: 0,county,primary_care_physicians
2432,Anderson,39.0
2433,Bedford,15.0
2434,Benton,3.0
2435,Bledsoe,1.0
2436,Blount,90.0
...,...,...
2522,Wayne,5.0
2523,Weakley,18.0
2524,White,9.0
2525,Williamson,338.0
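
The exercise asks for a check that the filtered frame has 95 rows, which the cells above only show indirectly through the displayed index. A minimal sketch of an explicit check follows. It uses a small made-up frame in place of the real CSV, so the names `toy`, `toy_tn`, and `expected_rows` are illustrative assumptions; with the real data the expected value would be 95.

```python
import pandas as pd

# Toy stand-in for the physicians data (a few rows copied from the outputs above).
toy = pd.DataFrame({
    'state': ['Alabama', 'Alabama', 'Tennessee', 'Tennessee', 'Tennessee'],
    'county': ['Autauga', 'Baldwin', 'Anderson', 'Bedford', 'Benton'],
    'primary_care_physicians': [26.0, 153.0, 39.0, 15.0, 3.0],
})

# Same filter as in the notebook.
toy_tn = toy.loc[toy['state'] == 'Tennessee']

# With the real data this would be 95; the toy frame has 3 Tennessee rows.
expected_rows = 3
assert toy_tn.shape[0] == expected_rows, (
    f"expected {expected_rows} rows, got {toy_tn.shape[0]}"
)
print(toy_tn.shape)
```

An `assert` fails loudly if the filter ever returns a different number of rows, for example because the state name is spelled differently in the file.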


3. Look at the distribution of the number of primary care physicians. What do you notice?



In [7]:
physicians['primary_care_physicians'].value_counts()

2.0      8
5.0      7
9.0      5
0.0      4
12.0     4
4.0      4
18.0     4
1.0      4
3.0      4
21.0     3
6.0      3
15.0     3
19.0     2
14.0     2
26.0     2
23.0     2
38.0     2
8.0      2
39.0     2
22.0     2
55.0     1
806.0    1
13.0     1
91.0     1
520.0    1
27.0     1
53.0     1
11.0     1
137.0    1
82.0     1
199.0    1
52.0     1
90.0     1
403.0    1
16.0     1
40.0     1
37.0     1
7.0      1
30.0     1
226.0    1
338.0    1
10.0     1
43.0     1
129.0    1
665.0    1
17.0     1
Name: primary_care_physicians, dtype: int64

In [8]:
physicians['primary_care_physicians'].nunique()

46

In [9]:
physicians['primary_care_physicians'].unique()

array([ 39.,  15.,   3.,   1.,  90.,  55.,  19.,  12.,  22.,  23.,   4.,
         2.,  16.,  37.,   0.,  40., 665.,   9.,  14.,  10.,  27.,  21.,
        11.,   5.,  38., 403.,  13.,   7.,   6., 520.,  18.,  30.,  26.,
       129.,  53.,  82.,   8.,  52., 137., 806., 199.,  91.,  17., 226.,
       338.,  43.])

In [10]:
physicians.describe()

Unnamed: 0,primary_care_physicians
count,95.0
mean,51.042105
std,129.311426
min,0.0
25%,4.5
50%,12.0
75%,26.5
max,806.0


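Most Tennessee counties have few primary care physicians: the median is 12 and the 75th percentile is 26.5, while the maximum is 806. The mean (about 51) sits well above the median, so the distribution is strongly right-skewed.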
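
To make the skew concrete, one could compare the mean with the median and count how many values fall below the mean. The sketch below uses only ten physician counts taken from the outputs above, so its numbers are illustrative and will not match the full 95-county summary. The variable names are assumptions.

```python
import pandas as pd

# Ten counts copied from the displayed rows (nine listed counties plus the maximum, 806).
counts = pd.Series(
    [39, 15, 3, 1, 90, 5, 18, 9, 338, 806],
    name='primary_care_physicians',
    dtype='float64',
)

mean = counts.mean()
median = counts.median()
below_mean = int((counts < mean).sum())  # how many values sit under the mean

print(f"mean={mean:.1f}, median={median:.1f}")
print(f"{below_mean} of {len(counts)} values are below the mean")
print(f"skewness={counts.skew():.2f}")  # positive for a right-skewed sample
```

A mean far above the median, with most values below the mean, is the typical signature of a few very large counties pulling the average up.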

4. Now, import the population by county dataset (`population_by_county.csv`) into a DataFrame named `population`.
5. Merge the `physicians` DataFrame with the `population` DataFrame. Keep only the values for Tennessee. When you merge, be sure to include both the `population` and `urban` columns in the merged results. Save the result of the merge back to `physicians`.
6. How many Tennessee counties are considered urban?

In [11]:
population = pd.read_csv('../data/population_by_county.csv')

In [12]:
population

Unnamed: 0,FIPS,population,county,state,urban
0,17051,21565,Fayette County,ILLINOIS,Rural
1,17107,29003,Logan County,ILLINOIS,Rural
2,17165,23994,Saline County,ILLINOIS,Rural
3,17097,701473,Lake County,ILLINOIS,Urban
4,17127,14219,Massac County,ILLINOIS,Rural
...,...,...,...,...,...
3197,47033,14399,Crockett County,TENNESSEE,Rural
3198,47095,7401,Lake County,TENNESSEE,Rural
3199,47093,461104,Knox County,TENNESSEE,Urban
3200,53005,197518,Benton County,WASHINGTON,Urban


In [13]:
population['county'].str.split(' County', expand = True)

Unnamed: 0,0,1
0,Fayette,
1,Logan,
2,Saline,
3,Lake,
4,Massac,
...,...,...
3197,Crockett,
3198,Lake,
3199,Knox,
3200,Benton,


In [14]:
population['county']=population['county'].str.split(' County', expand = True)[0]

In [15]:
population

Unnamed: 0,FIPS,population,county,state,urban
0,17051,21565,Fayette,ILLINOIS,Rural
1,17107,29003,Logan,ILLINOIS,Rural
2,17165,23994,Saline,ILLINOIS,Rural
3,17097,701473,Lake,ILLINOIS,Urban
4,17127,14219,Massac,ILLINOIS,Rural
...,...,...,...,...,...
3197,47033,14399,Crockett,TENNESSEE,Rural
3198,47095,7401,Lake,TENNESSEE,Rural
3199,47093,461104,Knox,TENNESSEE,Urban
3200,53005,197518,Benton,WASHINGTON,Urban
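
Splitting on `' County'` and keeping the first piece works for the names shown here. An alternative sketch removes the suffix only when it appears at the very end of the string, which avoids touching a name that happens to contain the word elsewhere. The sample names are copied from the output above; whether the real file contains other suffixes is not checked here.

```python
import pandas as pd

# A few county names as they appear in population_by_county.csv.
names = pd.Series(['Fayette County', 'Lake County', 'Knox County'])

# Strip a trailing ' County' only; `$` anchors the pattern to the end of the string.
stripped = names.str.replace(r' County$', '', regex=True)

print(stripped.tolist())
```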


In [16]:
population = population.loc[population['state'] == 'TENNESSEE']

In [17]:
population

Unnamed: 0,FIPS,population,county,state,urban
283,47165,183437,Sumner,TENNESSEE,Urban
284,47169,10231,Trousdale,TENNESSEE,Urban
285,47027,7654,Clay,TENNESSEE,Rural
405,47157,936374,Shelby,TENNESSEE,Urban
406,47077,27977,Henderson,TENNESSEE,Rural
...,...,...,...,...,...
3195,47123,46064,Monroe,TENNESSEE,Rural
3196,47079,32284,Henry,TENNESSEE,Rural
3197,47033,14399,Crockett,TENNESSEE,Rural
3198,47095,7401,Lake,TENNESSEE,Rural


In [18]:
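# Displays the replaced values only: the result is not assigned,
# so population['state'] still holds 'TENNESSEE' afterwards.
population['state'].str.replace('TENNESSEE', 'Tennessee')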

283     Tennessee
284     Tennessee
285     Tennessee
405     Tennessee
406     Tennessee
          ...    
3195    Tennessee
3196    Tennessee
3197    Tennessee
3198    Tennessee
3199    Tennessee
Name: state, Length: 95, dtype: object
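
If the title-cased state name should be stored rather than only displayed, the result has to be assigned back to the column. A hedged sketch on a toy frame follows; the `.copy()` after filtering and the names `pop` and `pop_tn` are assumptions made to keep the example self-contained and to avoid assigning into a filtered view.

```python
import pandas as pd

# Toy stand-in for the population data.
pop = pd.DataFrame({
    'county': ['Sumner', 'Clay'],
    'state': ['TENNESSEE', 'TENNESSEE'],
})

# Filter, then take an explicit copy so the assignment below is unambiguous.
pop_tn = pop.loc[pop['state'] == 'TENNESSEE'].copy()

# Assign the transformed values back to the column.
pop_tn['state'] = pop_tn['state'].str.title()

print(pop_tn['state'].unique())
```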

In [19]:
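# 'county' is the only column the two frames share, so it is the merge key;
# naming it explicitly makes that visible. Both frames hold Tennessee rows only.
merge = pd.merge(left = population[['population', 'county', 'state', 'urban']],
         right = physicians[['county', 'primary_care_physicians']],
         on = 'county')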

In [20]:
merge

Unnamed: 0,population,county,state,urban,primary_care_physicians
0,183437,Sumner,TENNESSEE,Urban,91.0
1,10231,Trousdale,TENNESSEE,Urban,2.0
2,7654,Clay,TENNESSEE,Rural,2.0
3,936374,Shelby,TENNESSEE,Urban,806.0
4,27977,Henderson,TENNESSEE,Rural,7.0
...,...,...,...,...,...
90,46064,Monroe,TENNESSEE,Rural,9.0
91,32284,Henry,TENNESSEE,Rural,22.0
92,14399,Crockett,TENNESSEE,Rural,0.0
93,7401,Lake,TENNESSEE,Rural,0.0
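
Because the merge relies on county names matching exactly, it may be worth checking that no county was dropped or duplicated. The sketch below uses the `how`, `validate`, and `indicator` parameters of `pd.merge` on toy frames. The 'De Kalb' spelling is invented here purely to show what a mismatch looks like; it is not taken from the data.

```python
import pandas as pd

# Toy frames; 'De Kalb' vs 'DeKalb' is a made-up mismatch for illustration.
pop = pd.DataFrame({
    'county': ['Sumner', 'Clay', 'De Kalb'],
    'population': [183437, 7654, 19847],
})
docs = pd.DataFrame({
    'county': ['Sumner', 'Clay', 'DeKalb'],
    'primary_care_physicians': [91.0, 2.0, 9.0],
})

# An outer merge keeps unmatched rows; `indicator=True` adds a `_merge` column
# saying where each row came from; `validate` raises if a key is duplicated.
checked = pd.merge(pop, docs, on='county', how='outer',
                   validate='one_to_one', indicator=True)

unmatched = checked.loc[checked['_merge'] != 'both', 'county'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would suggest that every county name lined up between the two files.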


In [21]:
physicians = merge

In [22]:
physicians

Unnamed: 0,population,county,state,urban,primary_care_physicians
0,183437,Sumner,TENNESSEE,Urban,91.0
1,10231,Trousdale,TENNESSEE,Urban,2.0
2,7654,Clay,TENNESSEE,Rural,2.0
3,936374,Shelby,TENNESSEE,Urban,806.0
4,27977,Henderson,TENNESSEE,Rural,7.0
...,...,...,...,...,...
90,46064,Monroe,TENNESSEE,Rural,9.0
91,32284,Henry,TENNESSEE,Rural,22.0
92,14399,Crockett,TENNESSEE,Rural,0.0
93,7401,Lake,TENNESSEE,Rural,0.0


In [23]:
urban_count = physicians.loc[physicians['urban'] == 'Urban']

In [24]:
urban_count['urban'].value_counts()

Urban    38
Name: urban, dtype: int64

How many Tennessee counties are considered urban?

38 of Tennessee's 95 counties are considered urban.

In [25]:
rural_count = physicians.loc[physicians['urban'] == 'Rural']

In [26]:
rural_count['urban'].value_counts()

Rural    57
Name: urban, dtype: int64
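
Both counts can also be read from a single `value_counts()` call on the unfiltered column, without building separate urban and rural frames first. A small sketch with toy rows copied from the merged output:

```python
import pandas as pd

# Five rows from the merged output above.
toy = pd.DataFrame({
    'county': ['Sumner', 'Trousdale', 'Clay', 'Shelby', 'Henderson'],
    'urban': ['Urban', 'Urban', 'Rural', 'Urban', 'Rural'],
})

# One call returns the count for every category in the column.
counts = toy['urban'].value_counts()
print(counts)
```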

7. The [State Health Access Data Assistance Center (SHADAC)](https://www.shadac.org/) classifies counties into three groups based on the number of residents per primary care physician. First, counties with fewer than 1500 residents per primary care physician are considered to have an "adequate" supply. Counties with at least 1500 residents but fewer than 3500 residents per primary care physician are considered to have a "moderately inadequate" supply, and counties with at least 3500 residents per primary care physician are considered to have a "low inadequate" supply. How many counties in Tennessee are in each group?
8. Does there appear to be any detectable relationship between whether a county is urban or rural and its supply of primary care physicians?

In [27]:
ratio = physicians['population'] / physicians['primary_care_physicians']

In [28]:
ratio

0     2015.791209
1     5115.500000
2     3827.000000
3     1161.754342
4     3996.714286
         ...     
90    5118.222222
91    1467.454545
92            inf
93            inf
94     886.738462
Length: 95, dtype: float64

In [29]:
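# "Adequate" means fewer than 1500 residents per physician, so use a strict <;
# a ratio of exactly 1500 belongs to the "moderately inadequate" group.
adequate = physicians.loc[physicians['population'] / physicians['primary_care_physicians'] < 1500]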

In [30]:
adequate.shape

(14, 5)

In [31]:
moderately_inadequate = physicians.loc[(physicians['population'] / physicians['primary_care_physicians'] >= 1500) &
                                       (physicians['population'] / physicians['primary_care_physicians'] < 3500)]

In [32]:
moderately_inadequate.shape

(50, 5)

In [33]:
low_inadequate = physicians.loc[physicians['population'] / physicians['primary_care_physicians'] >= 3500] 

In [34]:
low_inadequate.shape

(31, 5)

How many counties in Tennessee are in each group? 
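There are 14 counties with an adequate supply, 50 with a moderately inadequate supply, and 31 with a low inadequate supply. The four counties with no primary care physicians have an infinite ratio and fall into the last group.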
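
The three filters can also be expressed as a single labelled column, which makes the counts available in one step. The sketch below is one possible approach using `np.select`, not the method used in the cells above. The five rows are copied from the merged output, and the column name `supply` is an assumption.

```python
import numpy as np
import pandas as pd

# Five rows from the merged output above.
toy = pd.DataFrame({
    'county': ['Sumner', 'Trousdale', 'Shelby', 'Henry', 'Crockett'],
    'population': [183437, 10231, 936374, 32284, 14399],
    'primary_care_physicians': [91.0, 2.0, 806.0, 22.0, 0.0],
})

# Residents per physician; a county with 0 physicians gets inf.
ratio = toy['population'] / toy['primary_care_physicians']

# Conditions are checked in order, so the second one only applies to ratios >= 1500.
# Anything left over (including inf) falls through to the default label.
toy['supply'] = np.select(
    [ratio < 1500, ratio < 3500],
    ['adequate', 'moderately inadequate'],
    default='low inadequate',
)

print(toy['supply'].value_counts())
```

Because each county receives exactly one label, no county can be counted in two groups, whatever its ratio.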

Does there appear to be any detectable relationship between whether a county is urban or rural and its supply of primary care physicians?

In [35]:
urban_count

Unnamed: 0,population,county,state,urban,primary_care_physicians
0,183437,Sumner,TENNESSEE,Urban,91.0
1,10231,Trousdale,TENNESSEE,Urban,2.0
3,936374,Shelby,TENNESSEE,Urban,806.0
5,53679,Jefferson,TENNESSEE,Urban,21.0
6,14816,Sequatchie,TENNESSEE,Urban,5.0
8,687488,Davidson,TENNESSEE,Urban,665.0
12,200180,Montgomery,TENNESSEE,Urban,82.0
18,61447,Tipton,TENNESSEE,Urban,17.0
20,360919,Hamilton,TENNESSEE,Urban,403.0
21,16814,Polk,TENNESSEE,Urban,8.0


In [36]:
rural_count

Unnamed: 0,population,county,state,urban,primary_care_physicians
2,7654,Clay,TENNESSEE,Rural,2.0
4,27977,Henderson,TENNESSEE,Rural,7.0
7,27886,Carroll,TENNESSEE,Rural,12.0
9,7962,Perry,TENNESSEE,Rural,2.0
10,12104,Meigs,TENNESSEE,Rural,6.0
11,17623,Haywood,TENNESSEE,Rural,5.0
13,13344,Grundy,TENNESSEE,Rural,0.0
14,35552,Cocke,TENNESSEE,Rural,16.0
15,19847,DeKalb,TENNESSEE,Rural,9.0
16,12027,Lewis,TENNESSEE,Rural,2.0


Does there appear to be any detectable relationship between whether a county is urban or rural and its supply of primary care physicians?

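Based on these data, the clearest signal is the raw physician count: counties with more than 53 primary care physicians appear to be urban. At lower counts, urban and rural counties overlap (Trousdale is urban and Clay is rural, and each has 2 physicians), so the count alone does not separate them.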
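
A more direct way to examine the question is to cross-tabulate the urban/rural flag against the supply category, which compares residents per physician rather than raw counts. The sketch below does this with `pd.crosstab` on eight rows copied from the merged output. The `supply` column and the use of `np.select` are assumptions, and eight rows say nothing about the pattern in the full 95-county table.

```python
import numpy as np
import pandas as pd

# Eight rows from the merged output above.
toy = pd.DataFrame({
    'county': ['Sumner', 'Trousdale', 'Clay', 'Shelby',
               'Henderson', 'Henry', 'Crockett', 'Lake'],
    'urban': ['Urban', 'Urban', 'Rural', 'Urban',
              'Rural', 'Rural', 'Rural', 'Rural'],
    'population': [183437, 10231, 7654, 936374, 27977, 32284, 14399, 7401],
    'primary_care_physicians': [91.0, 2.0, 2.0, 806.0, 7.0, 22.0, 0.0, 0.0],
})

# Residents per physician (inf where there are no physicians).
ratio = toy['population'] / toy['primary_care_physicians']
toy['supply'] = np.select(
    [ratio < 1500, ratio < 3500],
    ['adequate', 'moderately inadequate'],
    default='low inadequate',
)

# Rows: urban/rural flag; columns: supply category; cells: number of counties.
table = pd.crosstab(toy['urban'], toy['supply'])
print(table)
```

Run on the full merged DataFrame, a table like this would show whether rural counties are concentrated in the inadequate categories.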