# Homicide Rates per County

In [1]:
import pandas as pd 

In [2]:
hom_by_cty = pd.read_csv('cty_homicides_2017.txt', delimiter='\t')
hom_by_cty.head(1).T

Unnamed: 0,0
Notes,
County,"Autauga County, AL"
County Code,1001
Deaths,Suppressed
Population,55504
Crude Rate,Suppressed


In [3]:
hom_by_cty.drop(['Notes'], inplace=True, axis=1)

In [4]:
#this is homicide rate per county where there is enough data 
hom_by_cty.head()

Unnamed: 0,County,County Code,Deaths,Population,Crude Rate
0,"Autauga County, AL",1001.0,Suppressed,55504,Suppressed
1,"Baldwin County, AL",1003.0,Suppressed,212628,Suppressed
2,"Barbour County, AL",1005.0,Suppressed,25270,Suppressed
3,"Bibb County, AL",1007.0,Suppressed,22668,Suppressed
4,"Blount County, AL",1009.0,Suppressed,58013,Suppressed


# County Features 

We tried using the ACS 1-year data through cenpy and the API, and consistently got only ~820 to 840 counties. 

I tried it directly from the API and through cenpy, with different years for each, and it always returned a county count in that range. Not every state had counties in the results, so it wasn't a random sample (remember, Arizona had zero...). 
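One likely explanation for that range: the Census Bureau publishes ACS 1-year estimates only for geographic areas with populations of 65,000 or more, so smaller counties never show up in that product. Here is a minimal sketch of checking a population column against that threshold. The `ACS1_MIN_POP` name and the toy frame are illustrative assumptions (the three populations are copied from the homicide table above).

```python
import pandas as pd

# Assumption: ACS 1-year estimates cover only areas with at least 65,000 residents.
ACS1_MIN_POP = 65_000

# Toy frame; populations are the 2017 values shown in the homicide table above.
toy = pd.DataFrame({
    'County': ['Autauga County, AL', 'Baldwin County, AL', 'Barbour County, AL'],
    'Population': [55504, 212628, 25270],
})

# Keep only the counties large enough to appear in the 1-year product.
covered = toy[toy['Population'] >= ACS1_MIN_POP]
print(covered['County'].tolist())
```

Running the same filter on the full population column would give a rough count to compare against the ~820 to 840 counties we saw.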

#### However, using the 5-year data does get us more county information! 

In [5]:
!pip install cenpy



In [6]:
import cenpy as c
#find the table that we want to query 
available = c.explorer.available()
#keep ACS tables only; na=False treats missing titles as non-matches
acs_df = available[available['title'].str.contains('ACS', na=False)]
acs_df = acs_df[acs_df['vintage'] == 2017]
acs_df

  warn('geopandas not available. Some functionality will be disabled.')


Unnamed: 0,title,temporal,spatial,publisher,programCode,modified,keyword,distribution,description,contactPoint,...,c_isTimeseries,c_isCube,c_isAvailable,c_isAggregate,c_groupsLink,c_geographyLink,c_examplesLink,c_dataset,bureauCode,accessLevel
ACSDP5Y2017,ACS 5-Year Data Profiles,unidentified,,U.S. Census Bureau,006:004,2018-10-19 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs5/prof...,https://api.census.gov/data/2017/acs/acs5/prof...,https://api.census.gov/data/2017/acs/acs5/prof...,"(acs, acs5, profile)",,
ACSCP5Y2017,ACS 5-Year Comparison Profiles,unidentified,,U.S. Census Bureau,006:004,2018-10-19 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs5/cpro...,https://api.census.gov/data/2017/acs/acs5/cpro...,https://api.census.gov/data/2017/acs/acs5/cpro...,"(acs, acs5, cprofile)",,
ACSDP1Y2017,ACS 1-Year Data Profiles,unidentified,,U.S. Census Bureau,006:004,2018-09-13 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs1/prof...,https://api.census.gov/data/2017/acs/acs1/prof...,https://api.census.gov/data/2017/acs/acs1/prof...,"(acs, acs1, profile)",,
ACSCP1Y2017,ACS 1-Year Comparison Profiles,unidentified,,U.S. Census Bureau,006:004,2018-09-13 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs1/cpro...,https://api.census.gov/data/2017/acs/acs1/cpro...,https://api.census.gov/data/2017/acs/acs1/cpro...,"(acs, acs1, cprofile)",,
ACSST5Y2017,ACS 5-Year Subject Tables,unidentified,,U.S. Census Bureau,006:004,2018-10-19 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs5/subj...,https://api.census.gov/data/2017/acs/acs5/subj...,https://api.census.gov/data/2017/acs/acs5/subj...,"(acs, acs5, subject)",,
ACSDT1Y2017,ACS 1-Year Detailed Tables,unidentified,,U.S. Census Bureau,006:004,2018-09-13 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs1/grou...,https://api.census.gov/data/2017/acs/acs1/geog...,https://api.census.gov/data/2017/acs/acs1/exam...,"(acs, acs1)",,
ACSSE2017,ACS 1-Year Supplemental Estimates,unidentified,,U.S. Census Bureau,006:004,2018-10-18 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is a natio...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acsse/gro...,https://api.census.gov/data/2017/acs/acsse/geo...,https://api.census.gov/data/2017/acs/acsse/exa...,"(acs, acsse)",,
ACSDT5Y2017,ACS 5-Year Detailed Tables,unidentified,,U.S. Census Bureau,006:004,2018-08-21 07:11:43.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs5/grou...,https://api.census.gov/data/2017/acs/acs5/geog...,https://api.census.gov/data/2017/acs/acs5/exam...,"(acs, acs5)",,
ACSSPP1Y2017,ACS 1-Year Selected Population Profiles,unidentified,,U.S. Census Bureau,006:004,2018-09-17 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",Selected Population Profiles provide broad soc...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs1/spp/...,https://api.census.gov/data/2017/acs/acs1/spp/...,https://api.census.gov/data/2017/acs/acs1/spp/...,"(acs, acs1, spp)",,
ACSST1Y2017,ACS 1-Year Subject Tables,unidentified,,U.S. Census Bureau,006:004,2018-09-13 00:00:00.0,,"{'@type': 'dcat:Distribution', 'accessURL': 'h...",The American Community Survey (ACS) is an ongo...,"{'fn': 'American Community Survey Office', 'ha...",...,,True,True,True,https://api.census.gov/data/2017/acs/acs1/subj...,https://api.census.gov/data/2017/acs/acs1/subj...,https://api.census.gov/data/2017/acs/acs1/subj...,"(acs, acs1, subject)",,


Based on this website and other research, we want to use ACSDP5Y2017.

In [7]:
c.explorer.explain('ACSDP5Y2017')

{'ACS 5-Year Data Profiles': 'The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. The data profiles include the following geographies: nation, all states (including DC and Puerto Rico), all metropolitan areas, all congressional districts, all counties, all places and all tracts. Data profiles contain broad social, economic, housing, and demographic information. The data are presented as both counts and percentages. There are over 2,400 variables in this dataset.'}

In [8]:
con = c.base.Connection('ACSDP5Y2017')
g_unit = 'county:*'

In [9]:
#create education features list from https://api.census.gov/data/2017/acs/acs5/profile/variables.html
cols_edu = [] 
for n in range(52,68): 
    var_name = 'DP02_00'+str(n)+'PE'
    cols_edu.append(var_name)

#create internet features list
cols_internet = []
for n in range(150,153):
    var_name = 'DP02_0'+str(n)+'PE'
    cols_internet.append(var_name)

#create row names for joining and EDA
cols_req = ['NAME']


In [10]:
cols = cols_req + cols_edu + cols_internet 
cols

['NAME',
 'DP02_0052PE',
 'DP02_0053PE',
 'DP02_0054PE',
 'DP02_0055PE',
 'DP02_0056PE',
 'DP02_0057PE',
 'DP02_0058PE',
 'DP02_0059PE',
 'DP02_0060PE',
 'DP02_0061PE',
 'DP02_0062PE',
 'DP02_0063PE',
 'DP02_0064PE',
 'DP02_0065PE',
 'DP02_0066PE',
 'DP02_0067PE',
 'DP02_0150PE',
 'DP02_0151PE',
 'DP02_0152PE']
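The two loops above hard-code the number of leading zeros, which only works while every number in a range has the same digit count. A possible alternative, sketched here with a hypothetical helper name `dp02_percent_vars`, lets an f-string pad the number to four digits:

```python
def dp02_percent_vars(start, stop):
    """Return DP02 percent-estimate variable names for start..stop-1 (hypothetical helper)."""
    # :04d pads the number with zeros to four digits, e.g. 52 -> '0052', 150 -> '0150'
    return [f'DP02_{n:04d}PE' for n in range(start, stop)]

cols_edu = dp02_percent_vars(52, 68)        # education variables
cols_internet = dp02_percent_vars(150, 153)  # computer/internet variables
cols = ['NAME'] + cols_edu + cols_internet
print(cols)
```

This should produce the same 20 names as the list shown above.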

In [11]:
counties_df = con.query(cols=cols, geo_unit=g_unit)
counties_df.head()

Unnamed: 0,NAME,DP02_0052PE,DP02_0053PE,DP02_0054PE,DP02_0055PE,DP02_0056PE,DP02_0057PE,DP02_0058PE,DP02_0059PE,DP02_0060PE,...,DP02_0063PE,DP02_0064PE,DP02_0065PE,DP02_0066PE,DP02_0067PE,DP02_0150PE,DP02_0151PE,DP02_0152PE,state,county
0,"Pickens County, Alabama",4416,11.8,5.7,39.2,26.8,16.5,14241,6.2,13.9,...,7.5,8.8,3.0,79.8,11.8,7620,71.0,60.9,1,107
1,"Sumter County, Alabama",4106,4.4,5.9,26.9,18.8,44.1,8244,4.7,12.6,...,7.0,10.6,7.6,82.7,18.2,5073,64.8,50.4,1,119
2,"Jefferson County, Alabama",165739,6.8,5.7,39.9,20.5,27.1,447048,3.0,7.6,...,8.1,19.4,12.5,89.4,31.9,261390,84.4,73.0,1,73
3,"Choctaw County, Alabama",2718,3.9,4.3,48.0,24.4,19.4,9449,6.6,13.3,...,9.1,7.9,3.8,80.1,11.6,5463,70.4,52.3,1,23
4,"Franklin County, Alabama",7426,3.3,6.8,49.2,23.3,17.4,20734,11.8,11.9,...,7.5,8.3,5.1,76.4,13.4,11533,74.2,60.3,1,59


In [24]:
counties_df = counties_df.rename(index=str, columns={'NAME': 'county_name', 
                                       'DP02_0052PE': '%_inschool_3+',
                                       'DP02_0053PE': '%_preschool_3+',
                                       'DP02_0054PE': '%_kinderg_3+',
                                       'DP02_0055PE': '%_elementary_3+',
                                       'DP02_0056PE': '%_highschool_3+',
                                       'DP02_0057PE': '%_college_3+',
                                       'DP02_0058PE': '%_25+',
                                       'DP02_0059PE': '%_below9th_25+',
                                       'DP02_0060PE': '%_9th-12th_25+',
                                       'DP02_0061PE': '%_hsgrad_25+',
                                       'DP02_0062PE': '%_somecollege_25+',
                                       'DP02_0063PE': '%_associates_25+',
                                       'DP02_0064PE': '%_bachelors_b25+',
                                       'DP02_0065PE': '%_gradschool_25+',
                                       'DP02_0066PE': '%_hsgrad_or+_25+',
                                       'DP02_0067PE': '%_bachelors_or+_25+',
                                       'DP02_0150PE': '%_useinternet_total_households',
                                       'DP02_0151PE': '%_havecomp_total_households',
                                       'DP02_0152PE': '%_broadband_total_households',
                                      })
counties_df.tail(90)

Unnamed: 0,county_name,%_inschool_3+,%_preschool_3+,%_kinderg_3+,%_elementary_3+,%_highschool_3+,%_college_3+,%_25+,%_below9th_25+,%_9th-12th_25+,...,%_associates_25+,%_bachelors_b25+,%_gradschool_25+,%_hsgrad_or+_25+,%_bachelors_or+_25+,%_useinternet_total_households,%_havecomp_total_households,%_broadband_total_households,state,county
3130,"Ozaukee County, Wisconsin",22813,5.8,5.2,39.7,23.4,26.0,60769,1.2,2.7,...,8.3,29.9,17.8,96.2,47.7,35044,91.0,85.4,55,089
3131,"Marinette County, Wisconsin",8605,4.5,4.7,43.9,23.8,23.1,29941,2.5,6.3,...,9.5,10.5,4.3,91.2,14.8,18548,82.1,72.5,55,075
3132,"Milwaukee County, Wisconsin",257495,3.5,8.2,39.9,19.4,29.0,627652,4.4,8.1,...,7.7,19.3,10.8,87.4,30.1,382027,83.7,73.7,55,079
3133,"Polk County, Wisconsin",9038,5.2,4.9,47.8,25.6,16.5,30872,1.7,5.2,...,12.1,13.8,6.1,93.1,19.8,18189,84.4,72.2,55,095
3134,"Rock County, Wisconsin",39954,6.2,5.8,42.6,23.7,21.7,108792,3.0,6.9,...,10.7,14.2,7.3,90.2,21.4,64482,86.8,74.4,55,105
3135,"Walworth County, Wisconsin",28324,4.7,5.0,35.8,19.5,35.1,67369,3.3,5.8,...,8.8,18.1,10.4,91.0,28.5,40246,89.5,79.5,55,127
3136,"Bayfield County, Wisconsin",2681,6.0,6.6,44.2,25.9,17.3,11439,1.1,4.4,...,12.0,19.7,11.4,94.4,31.0,6859,85.4,76.1,55,007
3137,"Rusk County, Wisconsin",2690,6.4,3.8,53.5,25.1,11.1,10268,4.8,8.2,...,10.7,10.0,4.8,87.0,14.8,6294,78.9,66.0,55,107
3138,"Vilas County, Wisconsin",3495,6.7,5.8,46.2,23.1,18.1,16692,1.0,6.4,...,9.5,17.8,9.3,92.6,27.1,10758,81.7,68.8,55,125
3139,"Shawano County, Wisconsin",8769,6.5,5.7,46.4,24.3,17.0,29299,2.5,6.5,...,11.5,10.1,5.7,91.0,15.8,17024,80.3,72.8,55,115


In [28]:
counties_df[counties_df['county_name'].str.contains('Wyoming')]

Unnamed: 0,county_name,%_inschool_3+,%_preschool_3+,%_kinderg_3+,%_elementary_3+,%_highschool_3+,%_college_3+,%_25+,%_below9th_25+,%_9th-12th_25+,...,%_associates_25+,%_bachelors_b25+,%_gradschool_25+,%_hsgrad_or+_25+,%_bachelors_or+_25+,%_useinternet_total_households,%_havecomp_total_households,%_broadband_total_households,state,county
1857,"Wyoming County, New York",7920,5.7,4.3,46.4,25.3,18.3,29320,3.5,8.2,...,13.4,9.7,5.8,88.3,15.4,15686,83.8,73.1,36,121
2267,"Wyoming County, Pennsylvania",5697,5.8,4.8,42.4,25.4,21.6,19657,1.8,6.1,...,8.5,12.4,6.5,92.1,18.8,10801,84.7,78.6,42,131
2966,"Wyoming County, West Virginia",4207,6.6,7.6,49.3,24.6,11.9,15859,8.5,13.1,...,4.1,5.4,3.9,78.4,9.3,9169,71.8,63.6,54,109
3047,"Goshen County, Wyoming",2938,5.4,3.2,42.1,24.1,25.3,9449,2.0,6.6,...,11.0,15.3,9.1,91.4,24.4,5328,81.9,71.3,56,15
3048,"Uinta County, Wyoming",5557,6.8,6.3,53.9,21.6,11.5,12978,2.3,5.9,...,10.3,11.9,5.5,91.8,17.4,7705,91.2,83.8,56,41
3049,"Washakie County, Wyoming",1783,4.7,2.5,52.7,33.5,6.7,5705,2.9,8.6,...,11.3,15.4,5.6,88.5,21.0,3490,87.7,76.4,56,43
3050,"Hot Springs County, Wyoming",943,10.0,6.0,49.9,19.3,14.7,3604,1.2,6.2,...,14.1,13.3,8.3,92.5,21.6,2246,85.1,74.4,56,17
3051,"Fremont County, Wyoming",10169,9.1,5.7,45.1,21.1,19.0,26781,2.1,6.8,...,10.6,15.5,7.7,91.1,23.3,15167,86.1,71.9,56,13
3052,"Sublette County, Wyoming",2482,6.8,7.7,49.4,21.5,14.6,7051,1.6,2.1,...,9.2,19.2,6.2,96.2,25.4,3197,93.9,82.9,56,35
3053,"Weston County, Wyoming",1441,13.4,2.7,46.2,19.4,18.3,5074,1.5,6.6,...,9.1,14.1,5.7,91.9,19.8,3182,82.9,71.1,56,45


In [21]:
counties_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3220 entries, 0 to 3219
Data columns (total 22 columns):
county_name                       3220 non-null object
%_inschool_3+                     3142 non-null object
%_preschool_3+                    3142 non-null object
%_kinderg_3+                      3142 non-null object
%_elementary_3+                   3142 non-null object
%_highschool_3+                   3142 non-null object
%_college_3+                      3142 non-null object
%_25+                             3142 non-null object
%_below9th_25+                    3142 non-null object
%_9th-12th_25+                    3142 non-null object
%_hsgrad_25+                      3142 non-null object
%_somecollege_25+                 3142 non-null object
%_associates_25+                  3142 non-null object
%_bachelors_b25+                  3142 non-null object
%_gradschool_25+                  3142 non-null object
%_hsgrad_or+_25+                  3142 non-null object
%_bachelors_or+_25

In [32]:
counties_df['county_code'] = counties_df['state'] + counties_df['county']

counties_df.head()

Unnamed: 0,county_name,%_inschool_3+,%_preschool_3+,%_kinderg_3+,%_elementary_3+,%_highschool_3+,%_college_3+,%_25+,%_below9th_25+,%_9th-12th_25+,...,%_bachelors_b25+,%_gradschool_25+,%_hsgrad_or+_25+,%_bachelors_or+_25+,%_useinternet_total_households,%_havecomp_total_households,%_broadband_total_households,state,county,county_code
0,"Pickens County, Alabama",4416,11.8,5.7,39.2,26.8,16.5,14241,6.2,13.9,...,8.8,3.0,79.8,11.8,7620,71.0,60.9,1,107,1107
1,"Sumter County, Alabama",4106,4.4,5.9,26.9,18.8,44.1,8244,4.7,12.6,...,10.6,7.6,82.7,18.2,5073,64.8,50.4,1,119,1119
2,"Jefferson County, Alabama",165739,6.8,5.7,39.9,20.5,27.1,447048,3.0,7.6,...,19.4,12.5,89.4,31.9,261390,84.4,73.0,1,73,1073
3,"Choctaw County, Alabama",2718,3.9,4.3,48.0,24.4,19.4,9449,6.6,13.3,...,7.9,3.8,80.1,11.6,5463,70.4,52.3,1,23,1023
4,"Franklin County, Alabama",7426,3.3,6.8,49.2,23.3,17.4,20734,11.8,11.9,...,8.3,5.1,76.4,13.4,11533,74.2,60.3,1,59,1059
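The concatenation above relies on `state` and `county` both being strings, with the county part already padded to three digits. A more defensive sketch, assuming the usual FIPS layout of a 2-digit state plus a 3-digit county, pads both parts explicitly. The toy frame and the `county_code_int` column name are made up for illustration.

```python
import pandas as pd

# Toy frame with deliberately inconsistent padding.
toy = pd.DataFrame({'state': ['1', '55', '56'], 'county': ['73', '089', '15']})

# Pad state to 2 digits and county to 3 digits, then concatenate into a 5-character code.
toy['county_code'] = (
    toy['state'].astype(str).str.zfill(2) + toy['county'].astype(str).str.zfill(3)
)

# Integer version, for matching against a numeric county code column.
toy['county_code_int'] = toy['county_code'].astype(int)
print(toy)
```

Keeping the padded string version around is handy because the leading zero is lost as soon as the code becomes an integer.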


In [31]:
counties_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3220 entries, 0 to 3219
Data columns (total 23 columns):
county_name                       3220 non-null object
%_inschool_3+                     3142 non-null object
%_preschool_3+                    3142 non-null object
%_kinderg_3+                      3142 non-null object
%_elementary_3+                   3142 non-null object
%_highschool_3+                   3142 non-null object
%_college_3+                      3142 non-null object
%_25+                             3142 non-null object
%_below9th_25+                    3142 non-null object
%_9th-12th_25+                    3142 non-null object
%_hsgrad_25+                      3142 non-null object
%_somecollege_25+                 3142 non-null object
%_associates_25+                  3142 non-null object
%_bachelors_b25+                  3142 non-null object
%_gradschool_25+                  3142 non-null object
%_hsgrad_or+_25+                  3142 non-null object
%_bachelors_or+_25

In [50]:
counties_df['county_code'] = counties_df['county_code'].astype(int)
counties_df['state'] = counties_df['state'].astype(int)
counties_df['county'] = counties_df['county'].astype(int)
counties_df.iloc[:,1:-3] = counties_df.iloc[:,1:-3].astype(float)
counties_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3220 entries, 0 to 3219
Data columns (total 23 columns):
county_name                       3220 non-null object
%_inschool_3+                     3142 non-null float64
%_preschool_3+                    3142 non-null float64
%_kinderg_3+                      3142 non-null float64
%_elementary_3+                   3142 non-null float64
%_highschool_3+                   3142 non-null float64
%_college_3+                      3142 non-null float64
%_25+                             3142 non-null float64
%_below9th_25+                    3142 non-null float64
%_9th-12th_25+                    3142 non-null float64
%_hsgrad_25+                      3142 non-null float64
%_somecollege_25+                 3142 non-null float64
%_associates_25+                  3142 non-null float64
%_bachelors_b25+                  3142 non-null float64
%_gradschool_25+                  3142 non-null float64
%_hsgrad_or+_25+                  3142 non-null float64
%_b

In [52]:
counties_df[pd.isnull(counties_df).any(axis=1)]

Unnamed: 0,county_name,%_inschool_3+,%_preschool_3+,%_kinderg_3+,%_elementary_3+,%_highschool_3+,%_college_3+,%_25+,%_below9th_25+,%_9th-12th_25+,...,%_bachelors_b25+,%_gradschool_25+,%_hsgrad_or+_25+,%_bachelors_or+_25+,%_useinternet_total_households,%_havecomp_total_households,%_broadband_total_households,state,county,county_code
3142,"Jayuya Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,73,72073
3143,"Quebradillas Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,115,72115
3144,"Guayama Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,57,72057
3145,"Guánica Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,55,72055
3146,"Rincón Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,117,72117
3147,"Villalba Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,149,72149
3148,"Aguas Buenas Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,7,72007
3149,"Bayamón Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,21,72021
3150,"Hormigueros Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,67,72067
3151,"Manatí Municipio, Puerto Rico",,,,,,,,,,...,,,,,,,,72,91,72091
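Only the first ten null rows are displayed, so it may be worth confirming where all of them come from before dropping anything. A minimal sketch on a toy frame (the Puerto Rico rows mimic the all-NaN pattern above; the frame itself is an illustration, not the real query result):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'county_name': ['Pickens County, Alabama',
                    'Jayuya Municipio, Puerto Rico',
                    'Rincón Municipio, Puerto Rico'],
    '%_below9th_25+': [6.2, np.nan, np.nan],
    'state': [1, 72, 72],
})

# Rows with at least one missing value, counted per state code.
null_rows = toy[toy.isnull().any(axis=1)]
null_by_state = null_rows['state'].value_counts()
print(null_by_state)
```

If the real frame shows a single state code (72, Puerto Rico) here, then `dropna()` removes Puerto Rico and nothing else.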


In [55]:
counties_df = counties_df.dropna()
counties_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3142 entries, 0 to 3141
Data columns (total 23 columns):
county_name                       3142 non-null object
%_inschool_3+                     3142 non-null float64
%_preschool_3+                    3142 non-null float64
%_kinderg_3+                      3142 non-null float64
%_elementary_3+                   3142 non-null float64
%_highschool_3+                   3142 non-null float64
%_college_3+                      3142 non-null float64
%_25+                             3142 non-null float64
%_below9th_25+                    3142 non-null float64
%_9th-12th_25+                    3142 non-null float64
%_hsgrad_25+                      3142 non-null float64
%_somecollege_25+                 3142 non-null float64
%_associates_25+                  3142 non-null float64
%_bachelors_b25+                  3142 non-null float64
%_gradschool_25+                  3142 non-null float64
%_hsgrad_or+_25+                  3142 non-null float64
%_b

In [57]:
counties_df.tail().T

Unnamed: 0,3137,3138,3139,3140,3141
county_name,"Rusk County, Wisconsin","Vilas County, Wisconsin","Shawano County, Wisconsin","Juneau County, Wisconsin","Washington County, Wisconsin"
%_inschool_3+,2690,3495,8769,5183,31410
%_preschool_3+,6.4,6.7,6.5,4.8,5.8
%_kinderg_3+,3.8,5.8,5.7,7.2,5.5
%_elementary_3+,53.5,46.2,46.4,46.5,44.6
%_highschool_3+,25.1,23.1,24.3,25.9,24.4
%_college_3+,11.1,18.1,17,15.6,19.8
%_25+,10268,16692,29299,19276,93734
%_below9th_25+,4.8,1,2.5,3.6,1.8
%_9th-12th_25+,8.2,6.4,6.5,9.4,3.8


P.S. I got the API to return the same information directly, but it wasn't as easy to work with: the query strings are long, and the column names don't autopopulate with the state and county names. Also of note: when I was getting errors in the wrapper, they didn't have details, but the same errors through the browser version of the API did.
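For reference, here is a rough sketch of what the direct route looks like. It builds the query URL for the same 2017 ACS 5-year profile endpoint and turns a response (a JSON list of rows whose first row is the header) into a DataFrame. The helper names `build_query_url` and `rows_to_frame` are hypothetical, and `sample` is a hand-written stand-in for a real response so the block runs without a network call.

```python
import pandas as pd

BASE_URL = 'https://api.census.gov/data/2017/acs/acs5/profile'

def build_query_url(variables, geo='county:*'):
    """Build a Census API query URL (hypothetical helper)."""
    return f"{BASE_URL}?get={','.join(variables)}&for={geo}"

def rows_to_frame(payload):
    """Convert a list of rows, header first, into a DataFrame (hypothetical helper)."""
    header, *rows = payload
    return pd.DataFrame(rows, columns=header)

url = build_query_url(['NAME', 'DP02_0066PE'])
print(url)

# Hand-written sample shaped like an API response; the values arrive as strings.
sample = [
    ['NAME', 'DP02_0066PE', 'state', 'county'],
    ['Pickens County, Alabama', '79.8', '01', '107'],
]
df = rows_to_frame(sample)
print(df)
```

Pasting the printed URL into a browser is also the quickest way to see the detailed error messages mentioned above.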

# Compare the target to the features


In [None]:
#how many counties are in the target set? 
hom_by_cty.County.nunique()
#note that some are suppressed 

In [None]:
#how many counties are in the feature set? 
print(len(counties_df))
#note that the Puerto Rico rows were all NaN and were removed by dropna() above

Pretty close! Maybe we can limit by population first (on homicides df) then join the data on the county name. 
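One thing to watch for in that join: the two sources format county names differently ("Autauga County, AL" in the CDC file vs. "Pickens County, Alabama" from the Census), so joining on the numeric county code may be more reliable than joining on the name. A minimal sketch with toy frames (the frame names and contents are illustrative, and it assumes rows without a county code can be dropped first):

```python
import pandas as pd

hom_toy = pd.DataFrame({
    'County': ['Autauga County, AL', 'Baldwin County, AL', None],
    'County Code': [1001.0, 1003.0, None],
})
features_toy = pd.DataFrame({
    'county_name': ['Autauga County, Alabama', 'Pickens County, Alabama'],
    'county_code': [1001, 1107],
})

# Drop rows with no code, then cast the float codes to int so the key types match.
hom_toy = hom_toy.dropna(subset=['County Code']).copy()
hom_toy['County Code'] = hom_toy['County Code'].astype(int)

# The inner join keeps only counties present in both frames.
merged = hom_toy.merge(features_toy, left_on='County Code',
                       right_on='county_code', how='inner')
print(merged)
```

Switching to `how='left'` would instead keep every homicide row and show which counties have no matching features.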

In [None]:
import numpy as np
hom_by_cty.info()

In [None]:
#work on a copy so the raw frame is left untouched; Deaths and Crude Rate are still strings at this point
hom = hom_by_cty.copy()
hom.head()

In [None]:
len(hom[hom['Crude Rate']!='Suppressed'])

Only 414 rows have usable data. The project technically calls for at least 1,000 rows; we start out with that many, but not once the suppressed rows are removed. What if we looked at suicides instead? 
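To actually use the unsuppressed rows, the `Deaths` and `Crude Rate` columns need to become numeric. One possible approach, sketched on made-up toy values, is `pd.to_numeric` with `errors='coerce'`, which turns any non-numeric flag such as 'Suppressed' into NaN:

```python
import pandas as pd

# Toy values for illustration only.
toy = pd.DataFrame({
    'County': ['A', 'B', 'C'],
    'Deaths': ['Suppressed', '12', '30'],
    'Crude Rate': ['Suppressed', '5.6', 'Suppressed'],
})

# Non-numeric entries become NaN instead of raising an error.
for col in ['Deaths', 'Crude Rate']:
    toy[col] = pd.to_numeric(toy[col], errors='coerce')

# Rows that still have a usable rate.
usable = toy.dropna(subset=['Crude Rate'])
print(len(usable))
```

On the real frame, `len(usable)` should agree with the count from the cell above if 'Suppressed' is the only non-numeric value in `Crude Rate`.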

# Suicide Rates by County
Just out of curiosity, is there more data to work with around suicides?  

In [None]:
suicides = pd.read_csv('suicides.txt', delimiter='\t').drop(['Notes'], axis=1)

print(len(suicides))
suicides.head()
#we don't have all counties here

In [None]:
suicides.drop(suicides[suicides['Crude Rate'] == 'Unreliable'].index, inplace=True)

In [None]:
print(len(suicides))
suicides.head()

In [None]:
suicides.drop(suicides[suicides['Crude Rate'] == 'Suppressed'].index, inplace=True)
suicides.info()
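The two `drop` calls above could also be collapsed into a single filter. A minimal sketch with made-up toy values; `FLAGS` is a hypothetical name for the list of placeholder strings:

```python
import pandas as pd

# Placeholder strings that mark rows without a usable rate.
FLAGS = ['Unreliable', 'Suppressed']

# Toy values for illustration only.
toy = pd.DataFrame({
    'County': ['A', 'B', 'C', 'D'],
    'Crude Rate': ['Unreliable', '14.2', 'Suppressed', '9.8'],
})

# Keep rows whose rate is not a flag, then convert the rate to float.
clean = toy[~toy['Crude Rate'].isin(FLAGS)].copy()
clean['Crude Rate'] = clean['Crude Rate'].astype(float)
print(clean)
```

This leaves the original frame unchanged, which makes it easier to re-run the cell while exploring.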

# To Alex from Mia - where do we go from here? 
So we know how to get the data from the CDC on deaths/other health things, and we know how to get data from the Census. That's pretty powerful, even if we aren't sure if we have "enough" data for the mod 3 project. I don't want to scrap our idea entirely and use a different data set, because I think this is really cool and just worked on it a bunch, lol. What I would consider doing from here is: 
1. Get permission from an instructor to target homicide rates OR suicide rates even though there aren't technically over 1000 of them. 
2. Target both of those and compare if we have time. 
3. Choose a new death or injury variable that is more "common" than suicide and homicide, like cardiac disease, and use that if it returns more than 1000 rows. 
4. Choose a new target variable from the actual Census data itself, now that we know how to query it.

Let me know your thoughts! Text me when/if you work on this at all. :) 