This notebook downloads Census data (ACS 5-year data profiles by school district) and merges it with school assessment data.

# Census Reference

API Key: 2b07e25a69e507e080faa2c31f9da3b42d178b4e
School District Example Call (state 01, Alabama): https://api.census.gov/data/2017/acs/acs5?get=NAME,B01001_001E&for=school%20district%20(unified):02700&in=state:01&key=YOUR_KEY_GOES_HERE

## Description:
School Districts
School Districts are geographic entities within which state, county, or local officials, the Bureau of Indian Affairs, or the Department of Defense provide public educational services for the area's residents. The Census Bureau obtains the boundaries, names, local education agency codes, and school district levels for school districts from state and local school officials for the primary purpose of providing the U.S. Department of Education with estimates of the number of children at risk within each school district, county, and state. This information serves as the basis for the Department of Education to determine the annual allocation of Title I funding to states and school districts.

The Census Bureau tabulates data for three types of school districts: **elementary (950), secondary (960), and unified (970)**. Each school district is assigned a five-digit code that is unique within state. **School district codes are the local education agency number assigned by the Department of Education** and are not necessarily in alphabetical order by school district name.

**The elementary school districts provide education to the lower grade/age levels and the secondary school districts provide education to the upper grade/age levels. Unified school districts provide education to children of all school ages in their service areas. In general, where there is a unified school district, no elementary or secondary school district exists, and where there is an elementary school district the secondary school district may or may not exist. In addition to regular school districts, some Census Bureau products contain so-called pseudo school districts described below.**

The Census Bureau's representation of school districts in various data products is based both on the grade range that a school district operates and also the grade range for which the school district is financially responsible. For example, a school district is defined as an elementary school district if its operational grade range is less than the full kindergarten through 12 or pre-kindergarten through 12 grade range (for example, K-6 or pre-K-8). These elementary school districts do not provide direct educational services for grades 7-12, 9-12, or similar ranges. Some elementary school districts are financially responsible for the education of all school-aged children within their service areas and rely on other school districts to provide service for those grade ranges that are not operated by these elementary school districts. In these situations, in order to allocate all school-aged children to these school districts the secondary school district code field is blank. For elementary school districts where the operational grade range and financially responsible grade range are the same, the secondary school district code field will contain a secondary school district code. There are no situations where an elementary school district does not exist and a secondary school district exists in Census Bureau records.
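The three district types above can be kept in a small lookup table so that later cells do not hard-code geography strings. This is a minimal sketch: the summary-level codes come from the description above and the geography names are the `for=` values used later in this notebook, but the helper `geo_id_prefix` is hypothetical, and the `<summary level>0000US` prefix pattern is only confirmed here for unified districts (`9700000US`); treat it as an assumption for the other two types.

```python
# Summary-level code -> geography name accepted by the API's `for=` parameter.
SCHOOL_DISTRICT_TYPES = {
    "950": "school district (elementary)",
    "960": "school district (secondary)",
    "970": "school district (unified)",
}


def geo_id_prefix(summary_level: str) -> str:
    """Return the GEO_ID prefix for a school district summary level.

    Assumes the '<summary level>0000US' pattern seen for unified districts.
    """
    if summary_level not in SCHOOL_DISTRICT_TYPES:
        raise ValueError(f"Unknown school district summary level: {summary_level}")
    return f"{summary_level}0000US"


for code, name in SCHOOL_DISTRICT_TYPES.items():
    print(code, name, geo_id_prefix(code))
```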

## ACS Overview

Overview: https://www.census.gov/data/developers/data-sets/acs-5year.html
Slide deck: https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/20180614_API.pdf

### Data Profiles Variables
[Data Profiles Format](https://www.census.gov/data/developers/data-sets/acs-1year/notes-on-acs-api-variable-formats.html)


Data Profiles Variable format: \[Table ID\] \[Row Number\] \[Variable Type\]


- Table ID = DP02 (Social Characteristics)
    - [DP02](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP02.html) - Social Characteristics
    - [DP02PR](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP02PR.html) - Social Characteristics, Puerto Rico
    - [DP03](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP03.html) - Economic Characteristics
    - [DP04](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP04.html) - Housing Characteristics
    - [DP05](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP05.html) - Demographic / Housing Estimates
- Row Number = {Digital: \[0150, 0151, 0152\]}
    - 0150 Computers & Internet Use - Total households
    - 0151 Computers & Internet Use - Total households with a computer
    - 0152 Computers & Internet Use - Total households with a broadband internet subscription
- Variable Type = PE (% estimate)
    - E - Estimate
    - PE - Percent Estimate
    - PEA - Percent Estimate Annotation
    - M - Margin of Error
    - MA - Margin of Error Annotation
    - PMA - Percent Margin of Error Annotation


Example: Variable DP02_0002PE, “Family households (families)”, represents the percent estimate for table DP02 row number 2.
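The variable format above can be checked mechanically. The sketch below splits a data profile variable name into its table ID, row number, and variable type. The function `parse_variable` and the regular expression are illustrative, not part of any Census library. The suffix list includes `EA` and `PM` in addition to the types listed above, on the assumption that they follow the same naming scheme.

```python
import re

# Table ID (DP02, DP02PR, ...), four-digit row number, then the variable type.
VARIABLE_PATTERN = re.compile(
    r"^(?P<table>DP\d{2}(?:PR)?)_(?P<row>\d{4})(?P<kind>PEA|PMA|EA|MA|PE|PM|E|M)$"
)


def parse_variable(name: str) -> tuple:
    """Split a data profile variable name into (table ID, row number, type)."""
    match = VARIABLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a data profile variable name: {name!r}")
    return match["table"], int(match["row"]), match["kind"]


print(parse_variable("DP02_0002PE"))
print(parse_variable("DP02_0152PE"))
```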

Example API Call:

https://api.census.gov/data/2018/acs/acs5/profile?get=group(DP02)&for=us:1
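A URL like the one above can be assembled from its parts instead of being typed by hand. The following is a minimal sketch using only the standard library. The helper name `build_profile_url` and its parameters are hypothetical, and the key is left out by default so that no credential is embedded in the notebook.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data"


def build_profile_url(year, group, geo_for, geo_in=None, key=None):
    """Build an ACS 5-year data profile URL for one variable group."""
    params = {"get": f"group({group})", "for": geo_for}
    if geo_in is not None:
        params["in"] = geo_in
    if key is not None:
        params["key"] = key
    # Encode spaces as %20 and leave ( ) : * , readable, as in the examples.
    query = urlencode(params, quote_via=quote, safe="():*,")
    return f"{BASE_URL}/{year}/acs/acs5/profile?{query}"


print(build_profile_url(2018, "DP02", "us:1"))
print(build_profile_url(2018, "DP02", "school district (unified):*", geo_in="state:01"))
```

The resulting string can be passed to `requests.get`. Alternatively, the same dictionary could be handed to `requests.get(url, params=...)`, though that may percent-encode the parentheses and colons differently.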


GEOID = State FIPS + LEAID (except pseudo-districts)
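To make the GEOID rule concrete, the sketch below splits an API `GEO_ID` value into its parts. It assumes the value has the form `<prefix>US<2-digit state FIPS><5-digit district code>`. The function `split_geo_id` is hypothetical, and the second sample ID is constructed for illustration. It uses `str.partition` rather than `str.lstrip` because `lstrip` removes a set of characters, not a literal prefix.

```python
def split_geo_id(geo_id: str) -> dict:
    """Split a school district GEO_ID into prefix, state FIPS, and district code."""
    prefix, separator, geoid = geo_id.partition("US")
    if not separator or len(geoid) != 7 or not geoid.isdigit():
        raise ValueError(f"Unexpected GEO_ID format: {geo_id!r}")
    return {
        "prefix": prefix + separator,
        "state_fips": geoid[:2],
        "district_code": geoid[2:],
        "geoid": geoid,
    }


print(split_geo_id("9700000US0102700"))

# Pitfall: lstrip treats its argument as a character set, so leading
# 0, 7, and 9 digits of the state FIPS code are stripped as well.
illustrative_id = "9700000US0900030"  # constructed example for a state code starting with 09
print(illustrative_id.lstrip("9700000US"))
print(split_geo_id(illustrative_id)["geoid"])
```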

# Methods Overview

1. [census Module](#census-module)
2. [censusdata module](#censusdata-module)
3. [API](#API)

# Imports

## Installs

## Module Imports

In [1]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import json

In [2]:
from census import Census

In [3]:
import censusdata as cd

## Variable Definitions

In [4]:
my_key = "2b07e25a69e507e080faa2c31f9da3b42d178b4e"
my_key

'2b07e25a69e507e080faa2c31f9da3b42d178b4e'

In [5]:
c = Census(my_key, year=2018)

In [11]:
state_list = pd.read_csv("https://www2.census.gov/geo/docs/reference/state.txt", sep="|")
state_list = pd.DataFrame(state_list[["STATE","STUSAB","STATE_NAME"]])
state_list.columns = map(str.lower, state_list.columns)
state_list.state = state_list.state.apply(str)
state_list.state = state_list.state.str.zfill(2)
state_list = state_list[:51]

print(state_list.info())
print(state_list.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   state       51 non-null     object
 1   stusab      51 non-null     object
 2   state_name  51 non-null     object
dtypes: object(3)
memory usage: 1.3+ KB
None
  state stusab  state_name
0    01     AL     Alabama
1    02     AK      Alaska
2    04     AZ     Arizona
3    05     AR    Arkansas
4    06     CA  California


In [7]:
year = '2018' ##2013, 2014, 2015, 2016, 2017, 2018
group = 'DP02' ##DP02, DP02PR, DP03, DP04, DP05
geofor = 'school%20district%20(unified)' ##school district (unified), school district (elementary), school district (secondary)
# stateID = #state_list['state'][0:57] 

In [8]:
digital_columns = 'DP02_0151PE','DP02_0152PE'

# census module

In [10]:
# c.acs5.get('NAME', {'for': 'school district (secondary):*',
#                     'in': 'state:01'})

[]

# censusdata module

# API

In [12]:
print(state_list)

   state stusab            state_name
0     01     AL               Alabama
1     02     AK                Alaska
2     04     AZ               Arizona
3     05     AR              Arkansas
4     06     CA            California
5     08     CO              Colorado
6     09     CT           Connecticut
7     10     DE              Delaware
8     11     DC  District of Columbia
9     12     FL               Florida
10    13     GA               Georgia
11    15     HI                Hawaii
12    16     ID                 Idaho
13    17     IL              Illinois
14    18     IN               Indiana
15    19     IA                  Iowa
16    20     KS                Kansas
17    21     KY              Kentucky
18    22     LA             Louisiana
19    23     ME                 Maine
20    24     MD              Maryland
21    25     MA         Massachusetts
22    26     MI              Michigan
23    27     MN             Minnesota
24    28     MS           Mississippi
25    29    

In [13]:
state_dfs = []

for state in state_list['state']:
    state = str(state)
    get_acs_data = requests.get('https://api.census.gov/data/{year}/acs/acs5/profile?get=group({group})&for={geofor}:*&in=state:{stateID}&key={key}'
                                .format(year=year, group=group, geofor=geofor, stateID=state, key=my_key))
    # Skip states that return no data (for example, HTTP 204 with an empty body)
    if get_acs_data.status_code != 200:
        continue
    acs_content = json.loads(get_acs_data.content)
    state_census_info = pd.DataFrame(data=acs_content)
    # The first row of the response holds the column names
    state_census_info.columns = state_census_info.iloc[0]
    state_census_info = state_census_info[1:]
    # Remove the GEO_ID prefix as a literal string. str.lstrip() strips a set of
    # characters and would also remove leading 0/7/9 digits of the state FIPS code.
    state_census_info['GEO_ID'] = state_census_info['GEO_ID'].str.replace('9700000US', '', regex=False).str.zfill(7)
    state_census_info.rename(columns={'GEO_ID':'leaid'}, inplace=True)
    state_dfs.append(state_census_info)

all_census_info = pd.concat(state_dfs)
print(all_census_info.info())
print(all_census_info.head())
print(all_census_info.tail())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10896 entries, 1 to 48
Columns: 1220 entries, leaid to school district (unified)
dtypes: object(1220)
memory usage: 101.5+ MB
None
0    leaid DP02_0001E DP02_0001M DP02_0001PE DP02_0001PM DP02_0002E  \
1  0102650      11557        430       11557  -888888888       7879   
2  0102670       3079        238        3079  -888888888       1588   
3  0102700      14462        421       14462  -888888888       8813   
4  0102730       7530        262        7530  -888888888       4935   
5  0102760       2671        205        2671  -888888888       1839   

0 DP02_0002M DP02_0002PE DP02_0002PM DP02_0003E  ... DP02_0151EA DP02_0151MA  \
1        368        68.2         3.4       2803  ...        None        None   
2        256        51.6         7.8        451  ...        None        None   
3        424        60.9         3.1       4413  ...        None        None   
4        279        65.5         3.1       1688  ...        None        

In [None]:
# state_dfs = []
# for state in state_list['state']:
#     state = str(state)
#     if len(state) == 1:
#         state = '0' + state
    
#     get_acs_data = requests.get('https://api.census.gov/data/{year}/acs/acs5/profile?get=group({group})&for={geofor}:*&in=state:{stateID}&key={key}'
#                  .format(year=year, group=group, geofor=geofor, stateID=state, key=my_key))
    
#     acs_content = json.loads(get_acs_data.content)
#     census_info = pd.DataFrame(data=acs_content)
#     census_info.columns = census_info.iloc[0]
#     census_info = census_info[1:]
#     broadband_only = census_info[['GEO_ID', 'state', 'school district (unified)', 'DP02_0152PE', 'DP02_0151PE']]
#     state_dfs.append(broadband_only)
    
# all_states = pd.concat(state_dfs)
# print(all_states.head())
# print(all_states.tail())

In [59]:
acs5_2018 = all_census_info

In [60]:
acs5_2018.to_pickle('../data/digital/acs5_2018.pkl')

In [61]:
## Merge census dataset with school assessment dataset
hs_assessment_2018 = pd.read_pickle('../data/education/hs_assessments_2018.pkl')

hs_assessment_2018.head()

Unnamed: 0,leaid,rla_score,math_score,year
0,100005,32,37,2018
1,100006,40,43,2018
2,100007,69,70,2018
3,100008,71,75,2018
4,100011,47,52,2018


In [62]:
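census_hs_assessment_2018 = pd.merge(hs_assessment_2018, acs5_2018, on='leaid', how='inner')
# Note: drop() returns a new DataFrame. The result is not assigned here,
# so 'school district (unified)' is still present in the output.
census_hs_assessment_2018.drop(columns=['school district (unified)'])
census_hs_assessment_2018.head()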

Unnamed: 0,leaid,rla_score,math_score,year,DP02_0001E,DP02_0001M,DP02_0001PE,DP02_0001PM,DP02_0002E,DP02_0002M,...,DP02_0151EA,DP02_0151MA,DP02_0151PEA,DP02_0151PMA,DP02_0152EA,DP02_0152MA,DP02_0152PEA,DP02_0152PMA,state,school district (unified)
0,100005,32,37,2018,7343,337,7343,-888888888,5103,254,...,,,,,,,,,1,5
1,100006,40,43,2018,17966,457,17966,-888888888,12530,421,...,,,,,,,,,1,6
2,100007,69,70,2018,32225,580,32225,-888888888,22678,504,...,,,,,,,,,1,7
3,100008,71,75,2018,18591,505,18591,-888888888,12977,349,...,,,,,,,,,1,8
4,100011,47,52,2018,4783,277,4783,-888888888,3374,205,...,,,,,,,,,1,11
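An inner merge silently drops districts that appear in only one of the two datasets. One way to see how many rows are lost is to run an outer merge with `indicator=True` first. The sketch below uses a few made-up rows: the first two `leaid` values and their scores echo the output above, while `0199999` and its score are invented for illustration.

```python
import pandas as pd

# Illustrative sample data only, not the real datasets.
assessments = pd.DataFrame({
    "leaid": ["0100005", "0100006", "0199999"],
    "math_score": [37, 43, 50],
})
census = pd.DataFrame({
    "leaid": ["0100005", "0100006", "0100007"],
    "DP02_0152PE": ["75.4", "72.9", "91.1"],
})

# The _merge column records whether each key was found on the left, right, or both.
checked = pd.merge(assessments, census, on="leaid", how="outer", indicator=True)
merge_counts = checked["_merge"].value_counts()
print(merge_counts)

# Rows an inner merge would keep.
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(matched)
```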


In [65]:
census_hs_assessment_2018.set_index('leaid', inplace=True)
census_hs_assessment_2018.head()

leaid,rla_score,math_score,year,DP02_0001E,DP02_0001M,DP02_0001PE,DP02_0001PM,DP02_0002E,DP02_0002M,DP02_0002PE,...,DP02_0151EA,DP02_0151MA,DP02_0151PEA,DP02_0151PMA,DP02_0152EA,DP02_0152MA,DP02_0152PEA,DP02_0152PMA,state,school district (unified)
100005,32,37,2018,7343,337,7343,-888888888,5103,254,69.5,...,,,,,,,,,1,5
100006,40,43,2018,17966,457,17966,-888888888,12530,421,69.7,...,,,,,,,,,1,6
100007,69,70,2018,32225,580,32225,-888888888,22678,504,70.4,...,,,,,,,,,1,7
100008,71,75,2018,18591,505,18591,-888888888,12977,349,69.8,...,,,,,,,,,1,8
100011,47,52,2018,4783,277,4783,-888888888,3374,205,70.5,...,,,,,,,,,1,11


In [66]:
census_hs_assessment_2018.to_pickle('../data/census_hs_assessment_2018.pkl')

In [69]:
broadband_hs_assessment_2018 = census_hs_assessment_2018[['rla_score','math_score','DP02_0152PE', 'DP02_0151PE']]
broadband_hs_assessment_2018.head()

leaid,rla_score,math_score,DP02_0152PE,DP02_0151PE
100005,32,37,75.4,83.4
100006,40,43,72.9,80.8
100007,69,70,91.1,95.5
100008,71,75,89.9,96.1
100011,47,52,76.7,88.0


In [70]:
broadband_hs_assessment_2018.to_pickle('../data/broadband_hs_assessment_2018.pkl')

In [73]:
combined_assessment_2018 = pd.read_pickle('../data/education/combined_assessments_2018.pkl')

combined_assessment_2018.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14874 entries, 0 to 14873
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   leaid       14874 non-null  object
 1   rla_score   14874 non-null  int64 
 2   math_score  14874 non-null  int64 
 3   year        14874 non-null  int64 
dtypes: int64(3), object(1)
memory usage: 581.0+ KB


In [74]:
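census_assessment_2018 = pd.merge(combined_assessment_2018, acs5_2018, on='leaid', how='inner')
# Note: drop() returns a new DataFrame. The result is not assigned here,
# so 'school district (unified)' is still present in the output.
census_assessment_2018.drop(columns=['school district (unified)'])
census_assessment_2018.set_index('leaid', inplace=True)
census_assessment_2018.head()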

leaid,rla_score,math_score,year,DP02_0001E,DP02_0001M,DP02_0001PE,DP02_0001PM,DP02_0002E,DP02_0002M,DP02_0002PE,...,DP02_0151EA,DP02_0151MA,DP02_0151PEA,DP02_0151PMA,DP02_0152EA,DP02_0152MA,DP02_0152PEA,DP02_0152PMA,state,school district (unified)
100005,38,45,2018,7343,337,7343,-888888888,5103,254,69.5,...,,,,,,,,,1,5
100006,36,43,2018,17966,457,17966,-888888888,12530,421,69.7,...,,,,,,,,,1,6
100007,65,70,2018,32225,580,32225,-888888888,22678,504,70.4,...,,,,,,,,,1,7
100008,74,76,2018,18591,505,18591,-888888888,12977,349,69.8,...,,,,,,,,,1,8
100011,41,40,2018,4783,277,4783,-888888888,3374,205,70.5,...,,,,,,,,,1,11


In [75]:
census_assessment_2018.to_pickle('../data/census_combined_assessment_2018.pkl')
broadband_assessment_2018 = census_assessment_2018[['rla_score','math_score','DP02_0152PE', 'DP02_0151PE']]
broadband_assessment_2018.head()

leaid,rla_score,math_score,DP02_0152PE,DP02_0151PE
100005,38,45,75.4,83.4
100006,36,43,72.9,80.8
100007,65,70,91.1,95.5
100008,74,76,89.9,96.1
100011,41,40,76.7,88.0


In [76]:
broadband_assessment_2018.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10211 entries, 0100005 to 5606240
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   rla_score    10211 non-null  int64 
 1   math_score   10211 non-null  int64 
 2   DP02_0152PE  10211 non-null  object
 3   DP02_0151PE  10211 non-null  object
dtypes: int64(2), object(2)
memory usage: 398.9+ KB
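The `info()` output above shows the two percent-estimate columns stored as `object` (strings), and earlier outputs contain the placeholder value `-888888888`. Before any numeric analysis, those columns likely need converting. The sketch below is one way to do it, assuming `-888888888` marks a value that is not a real estimate. The function `to_numeric_estimates` is hypothetical, the sample frame is made up, and the placeholder list may need more entries for other variables.

```python
import pandas as pd

# Placeholder value seen in the outputs above; extend this list if other
# special values show up in the columns being converted.
PLACEHOLDER_VALUES = [-888888888]


def to_numeric_estimates(frame, columns):
    """Return a copy with the given columns as floats and placeholders as NaN."""
    cleaned = frame.copy()
    for column in columns:
        values = pd.to_numeric(cleaned[column], errors="coerce")
        cleaned[column] = values.mask(values.isin(PLACEHOLDER_VALUES)).astype(float)
    return cleaned


# Illustrative sample: real values from the output above plus one placeholder.
sample = pd.DataFrame({
    "DP02_0152PE": ["75.4", "72.9", "-888888888"],
    "DP02_0151PE": ["83.4", "80.8", None],
})
numeric_sample = to_numeric_estimates(sample, ["DP02_0152PE", "DP02_0151PE"])
print(numeric_sample.dtypes)
print(numeric_sample)
```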


## Different API method (specify parameters in get)

In [69]:
# get_acs5_data = requests.get('https://api.census.gov/data/{year}/acs/acs5/profile?get=group({group})&for={geofor}:*&in=state:{stateID}&key={key}'
#                  .format(year=year, group=group, geofor=geofor, stateID="03", key=my_key))
# get_acs5_data.status_code

204

In [55]:
test1 = requests.get('https://api.census.gov/data/2018/acs/acs5?get=NAME&for=school%20district%20(unified):*&in=state:24')
test1.status_code

200

In [57]:
api_test_data = json.loads(test1.content)
api_test = pd.DataFrame(data=api_test_data)
api_test.head()

Unnamed: 0,0,1,2
0,NAME,state,school district (unified)
1,"Allegany County Public Schools, Maryland",24,00030
2,"Anne Arundel County Public Schools, Maryland",24,00060
3,"Baltimore City Public Schools, Maryland",24,00090
4,"Baltimore County Public Schools, Maryland",24,00120
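In the output above, the column names sit in row 0 because the API returns a list of rows whose first element is the header. A minimal sketch of promoting that row to column names follows. The function `rows_to_frame` is hypothetical, and the sample rows copy the first lines of the output above.

```python
import pandas as pd


def rows_to_frame(rows):
    """Turn an API response (list of rows, header first) into a DataFrame."""
    header, *records = rows
    return pd.DataFrame(records, columns=header)


sample_rows = [
    ["NAME", "state", "school district (unified)"],
    ["Allegany County Public Schools, Maryland", "24", "00030"],
    ["Anne Arundel County Public Schools, Maryland", "24", "00060"],
]
sample_frame = rows_to_frame(sample_rows)
print(sample_frame)
```

With a live response, the same helper could be called as `rows_to_frame(test1.json())`, provided the request returned status 200.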
