# Tim's Data Analysis for Congressional District Mapping

This notebook is Tim Romer's initial data exploration. It makes HTTP requests to APIs and explores the resulting data set.

In [2]:
# Meta-information for requesting data

CENSUS_API_KEY = '535963d4f87c22491889d73b8cc7889ee7d36465'
CURL_REQUEST = 'https://api.census.gov/data/2014/pep/natstprc.json?get=STNAME,POP&for=state:*&DATE_=7&key='+ CENSUS_API_KEY

To run this notebook, install the open-source US Census Python wrapper; installation instructions are at https://github.com/datamade/census. The code below also imports the `us` package to look up state FIPS codes.
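
The wrapper is a convenience rather than a requirement. As a minimal sketch, the `CURL_REQUEST` URL defined above could also be fetched with only the standard library. The Census API returns JSON as a list of rows whose first row holds the column names. The helper names `census_rows_to_records` and `fetch_census` and the sample payload are made up for illustration.

```python
import json
from urllib.request import urlopen


def census_rows_to_records(rows):
    """Turn the API's list-of-rows payload (header row first) into dicts."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch_census(url, timeout=30):
    """Fetch a Census API URL and return one dict per data row."""
    with urlopen(url, timeout=timeout) as resp:
        return census_rows_to_records(json.load(resp))


# Offline illustration: placeholder values shaped like an API response,
# not real Census figures.
sample_payload = [["STNAME", "POP", "state"],
                  ["Example State", "123", "00"]]
records = census_rows_to_records(sample_payload)
print(records)
```

Calling `fetch_census(CURL_REQUEST)` would perform the live request; it needs network access and a valid key.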


This call to the API pulls the population for all unified school districts in Ohio.

In [3]:
import census 
from us import states

c = census.Census(CENSUS_API_KEY)
oh_school_pop = c.acs5.get(('NAME', 'B01001_001E'),
                              {'for': 'school district (unified):*',
                               'in': 'state:{}'.format(states.OH.fips)})

Now, let's load the response into a pandas DataFrame and rename the columns to something more readable.

In [4]:
import pandas as pd
import numpy as np

oh_school_pop_df = pd.DataFrame(oh_school_pop)

oh_school_pop_df.rename(columns = {'B01001_001E' : 'POP', 
                                   'NAME' : 'SD', 
                                   'school district (unified)' : 'SD Code',
                                   'state' : 'State'
                                  }, 
                        inplace = True)
oh_school_pop_df

Unnamed: 0,POP,SD,SD Code,State
0,5161,Manchester Local School District (Adams County...,00537,39
1,189180,"Akron City School District, Ohio",04348,39
2,26577,"Ashtabula Area City School District, Ohio",04351,39
3,13840,"Monroe Local School District, Ohio",00094,39
4,26145,"Ashland City School District, Ohio",04350,39
5,11743,"Beachwood City School District, Ohio",04355,39
6,35725,"Athens City School District, Ohio",04352,39
7,25928,"Barberton City School District, Ohio",04353,39
8,15426,"Bay Village City School District, Ohio",04354,39
9,9721,"Bellaire Local School District, Ohio",04357,39


Now that the population data is in a pandas DataFrame, we can manipulate it. Let's verify that everything is in order.

In [5]:
oh_school_pop_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 613 entries, 0 to 612
Data columns (total 4 columns):
POP        613 non-null object
SD         613 non-null object
SD Code    613 non-null object
State      613 non-null object
dtypes: object(4)
memory usage: 19.2+ KB


All 613 rows are present and there are no missing values, but every column is stored as a generic object. So now we will convert the numerical columns to numerical types and the categorical column to the categorical type.

In [6]:
oh_school_pop_df['POP'] = pd.to_numeric(oh_school_pop_df['POP'])
oh_school_pop_df['SD Code'] = pd.to_numeric(oh_school_pop_df['SD Code'])
oh_school_pop_df['State'] = pd.to_numeric(oh_school_pop_df['State'])
oh_school_pop_df['SD'] = pd.Categorical(oh_school_pop_df['SD'])

In [7]:
oh_school_pop_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 613 entries, 0 to 612
Data columns (total 4 columns):
POP        613 non-null int64
SD         613 non-null category
SD Code    613 non-null int64
State      613 non-null int64
dtypes: category(1), int64(3)
memory usage: 40.4 KB


The data is now transformed into respective numerical and categorical types.
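
One side effect of the conversion is worth noting: `pd.to_numeric` turns a code such as `00537` into `537`, dropping the leading zeros. The sketch below, using two rows copied from the output above, shows one possible alternative that keeps the district code as a categorical identifier, plus a way to restore the padding after the fact. It assumes the codes are always five characters wide, as they are in the output shown; the names `demo`, `as_int`, and `restored` are illustrative only.

```python
import pandas as pd

# Two rows copied from the school-district output above
demo = pd.DataFrame({
    "POP": ["5161", "189180"],
    "SD Code": ["00537", "04348"],
})

# Population is a true quantity, so a numeric type fits
demo["POP"] = pd.to_numeric(demo["POP"])
# The district code is an identifier; a categorical keeps the leading zeros
demo["SD Code"] = pd.Categorical(demo["SD Code"])

# If the codes were already converted to integers, pad them back to width 5
as_int = pd.to_numeric(pd.Series(["00537", "04348"]))
restored = as_int.astype(str).str.zfill(5)
print(restored.tolist())
```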

Now let's do some data exploration.

Let's compute the total population in Ohio and verify that it is correct.

In [8]:
oh_school_pop_df['POP'].sum()

11609756

According to Google, Ohio's 2017 population is 11.66 million, so our total of 11.61 million is within an acceptable margin of error.
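
To put a number on that margin, here is a small sketch of the arithmetic. It uses the total computed above and the rounded 11.66 million reference figure; the variable names are illustrative.

```python
district_total = 11_609_756   # sum of the POP column computed above
reference_total = 11_660_000  # rounded 2017 figure quoted above

# Relative gap between the two totals
rel_diff = abs(reference_total - district_total) / reference_total
print(f"{rel_diff:.2%}")  # prints 0.43%
```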

In [9]:
oh_school_pop_df.describe()

Unnamed: 0,POP,SD Code,State
count,613.0,613.0,613.0
mean,18939.243067,5145.696574,39.0
std,35187.234388,4042.284914,0.0
min,0.0,94.0,39.0
25%,6215.0,4514.0,39.0
50%,10760.0,4716.0,39.0
75%,19843.0,4927.0,39.0
max,536947.0,99997.0,39.0


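Upon review, the school district route does not make sense because of major metropolitan areas: the largest district holds 536,947 people while the median district holds 10,760, and at least one district reports a population of 0. When researching this issue and possible solutions, some important things need to be considered.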

To draw congressional districts as fairly as possible, we don't want to consider any demographics other than geographic information. Districts should also be as compact as possible, so we don't want districts that snake around just to capture a certain demographic. Here are two algorithms that we could use:
1. <strong>Shortest-Splitline</strong>
2. <strong>Forest Fire Fill</strong>
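
As a rough illustration of the first option, shortest-splitline divides a region into two parts whose populations are in the ratio A:B (with A + B equal to the number of districts and A, B as close as possible), picks the shortest line that does so, and then recurses on each part. The sketch below is a heavy simplification and not a reference implementation: it works on point units `(x, y, population)` instead of real boundaries, tries only a handful of cut directions, and approximates the line length by the spread of the units along the cut direction. The function name `split_region` and all parameters are hypothetical. Forest Fire Fill is not covered here.

```python
import math


def split_region(units, n_districts, n_angles=8):
    """Recursively split (x, y, pop) units into n_districts groups."""
    if len(units) < n_districts:
        raise ValueError("need at least one unit per district")
    if n_districts == 1:
        return [units]

    a = n_districts // 2          # districts on the first side of the cut
    b = n_districts - a           # districts on the other side
    target = sum(pop for _, _, pop in units) * a / n_districts

    best = None
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        nx, ny = math.cos(theta), math.sin(theta)   # normal of the cut line
        ordered = sorted(units, key=lambda u: u[0] * nx + u[1] * ny)

        # Proxy for the cut length: spread of the units along the line
        along = [-u[0] * ny + u[1] * nx for u in units]
        length = max(along) - min(along)

        running = 0
        for i, (_, _, pop) in enumerate(ordered, start=1):
            running += pop
            # Leave enough units on each side for its share of districts
            if i < a or i > len(ordered) - b:
                continue
            # Prefer the best population balance, then the shortest cut
            score = (abs(running - target), length)
            if best is None or score < best[0]:
                best = (score, ordered[:i], ordered[i:])

    _, first, second = best
    return (split_region(first, a, n_angles)
            + split_region(second, b, n_angles))


# A tall, narrow toy region: the short cut is the horizontal one
toy_units = [(0, 0, 10), (1, 0, 10), (0, 5, 10), (1, 5, 10)]
districts = split_region(toy_units, 2)
print(districts)
```

With real data, each unit could be a census tract or county centroid with its population, which is why the finer-grained population tables are needed.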

To accomplish both, we need to pull county population data and finer-grained population data below the county level.

In [10]:
# Getting County Data
oh_county_pop = c.acs5.get(('NAME', 'B01001_001E'),
                              {'for': 'county:*',
                               'in': 'state:{}'.format(states.OH.fips)})
oh_county_pop_df = pd.DataFrame(oh_county_pop)

oh_county_pop_df

Unnamed: 0,B01001_001E,NAME,county,state
0,30203,"Gallia County, Ohio",053,39
1,58497,"Huron County, Ohio",077,39
2,231857,"Mahoning County, Ohio",099,39
3,65563,"Athens County, Ohio",009,39
4,27926,"Adams County, Ohio",001,39
5,76871,"Scioto County, Ohio",145,39
6,176362,"Medina County, Ohio",103,39
7,23234,"Meigs County, Ohio",105,39
8,54688,"Union County, Ohio",159,39
9,39005,"Champaign County, Ohio",021,39
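
The county response keeps the raw API column names. A minimal sketch, assuming the same renaming convention as the school-district table is wanted and that the API returns every field as a string (as the earlier `info()` output suggests): it renames the columns, converts the population to a number, and joins the state and county codes into the usual five-digit county FIPS code. The three rows are copied from the output above; the column names `County` and `FIPS` are illustrative choices.

```python
import pandas as pd

# Three rows copied from the county output above
county_df = pd.DataFrame({
    "B01001_001E": ["30203", "58497", "231857"],
    "NAME": ["Gallia County, Ohio", "Huron County, Ohio",
             "Mahoning County, Ohio"],
    "county": ["053", "077", "099"],
    "state": ["39", "39", "39"],
})

county_df = county_df.rename(columns={"B01001_001E": "POP",
                                      "NAME": "County"})
county_df["POP"] = pd.to_numeric(county_df["POP"])

# Keep the codes as strings so the leading zeros survive
county_df["FIPS"] = county_df["state"] + county_df["county"]
print(county_df[["County", "FIPS", "POP"]])
```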


In [14]:
# Getting census tract data (the variables keep the "block" name)
oh_block_pop = c.acs5.get(('NAME', 'B01001_001E'),
                              {'for': 'tract:*',
                               'in': 'state:{}'.format(states.OH.fips)})
oh_block_pop_df = pd.DataFrame(oh_block_pop)

oh_block_pop_df

Unnamed: 0,B01001_001E,NAME,county,state,tract
0,4026,"Census Tract 9737, Athens County, Ohio",009,39,973700
1,4932,"Census Tract 9738, Athens County, Ohio",009,39,973800
2,4055,"Census Tract 9727, Athens County, Ohio",009,39,972700
3,3939,"Census Tract 9736, Athens County, Ohio",009,39,973600
4,4357,"Census Tract 9726, Athens County, Ohio",009,39,972600
5,3640,"Census Tract 9733, Athens County, Ohio",009,39,973300
6,5672,"Census Tract 9729, Athens County, Ohio",009,39,972900
7,5168,"Census Tract 9732, Athens County, Ohio",009,39,973200
8,3749,"Census Tract 78.11, Franklin County, Ohio",049,39,007811
9,4072,"Census Tract 26, Franklin County, Ohio",049,39,002600


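Note that a census tract is not the smallest geographic unit for population data: tracts are subdivided into block groups, which are in turn made up of census blocks. The query above returns tract-level data.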
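
One way to sanity-check the tract table, in the same spirit as the statewide total earlier, is to add the tract populations up by county and compare the result with the county table. The sketch below shows only the mechanics on three rows copied from the output above, so its sums are partial and will not match the full county figures; the names `tracts` and `by_county` are illustrative.

```python
import pandas as pd

# Three rows copied from the tract output above (a partial sample)
tracts = pd.DataFrame({
    "B01001_001E": ["4026", "4932", "3749"],
    "county": ["009", "009", "049"],
    "tract": ["973700", "973800", "007811"],
})
tracts["B01001_001E"] = pd.to_numeric(tracts["B01001_001E"])

# Total tract population per county code
by_county = tracts.groupby("county")["B01001_001E"].sum()
print(by_county)
```

Run against the full `oh_block_pop_df`, each county sum could then be compared with the matching row of `oh_county_pop_df`.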