# ACS 5-Year Data Profiles

The ACS 5-Year Data Profile tables contain the information relevant to our problem. They are listed at https://api.census.gov/data.html; search the page for "ACS 5-Year Data Profiles". The tables are available for 2010 through 2020, and every year is queried with the same API call format. The base URL for an API call looks like this:

https://api.census.gov/data/{YEAR}/acs/acs5/profile?get=group({GROUP})&for=zip%20code%20tabulation%20area:{ZIP_CODE}&key={API_KEY}

The all-caps names in curly braces are placeholders to be replaced with the relevant values. For example, {YEAR} may be replaced by 2020 and {GROUP} by DP03. The available group codes are listed below.

DP02 - Social Characteristics  
DP03 - Economic Characteristics  
DP04 - Housing Characteristics  
DP05 - Demographics and Housing
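
As a minimal sketch of filling in the template, the helper below assembles the URL from its four placeholders. The function name `build_profile_url` and the `YOUR_API_KEY` string are illustrative assumptions, not part of the Census API.

```python
from urllib.parse import quote

# Template from the text above, with the year left as a placeholder
BASE = "https://api.census.gov/data/{year}/acs/acs5/profile"


def build_profile_url(year, group, zip_code, api_key):
    """Fill in the URL template for one group and one ZIP Code Tabulation Area."""
    # Percent-encode the spaces in the geography clause but keep the colon
    geography = quote(f"zip code tabulation area:{zip_code}", safe=":")
    return (
        f"{BASE.format(year=year)}"
        f"?get=group({group})&for={geography}&key={api_key}"
    )


# "YOUR_API_KEY" is a stand-in; substitute a real key before making a request
url = build_profile_url(2020, "DP03", "78209", "YOUR_API_KEY")
print(url)
```

Swapping the `year` or `group` argument is enough to target another table, since the call format is the same across years.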

In [26]:
from requests import get
import pandas as pd

# env is a local module that defines census_api_key (keep the key out of version control)
from env import census_api_key

pd.set_option('display.html.use_mathjax', False)

In [27]:
url = f'https://api.census.gov/data/2020/acs/acs5/profile?get=group(DP03)&for=zip%20code%20tabulation%20area:78209&key={census_api_key}'

In [28]:
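response = get(url)
response.raise_for_status()  # stop early on an HTTP error

# The JSON payload is a list of two rows: variable names, then their values
content = response.json()

# Transpose so that each row is a (variable name, value) pair
df = pd.DataFrame(content).T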

df

Unnamed: 0,0,1
0,DP03_0001E,36142
1,DP03_0001EA,
2,DP03_0001M,1864
3,DP03_0001MA,
4,DP03_0001PE,36142
...,...,...
1094,DP03_0137PM,3.9
1095,DP03_0137PMA,
1096,GEO_ID,8600000US78209
1097,NAME,ZCTA5 78209
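
The transposed frame above works because the response is a two-row list: a header row of variable names followed by a row of values. The sketch below reproduces that shape offline with a hand-copied subset of the output shown above (the real payload has 1,098 entries per row) and builds the same long table by naming the columns directly. The names `sample` and `long_df` are illustrative.

```python
import pandas as pd

# Hand-copied subset of the response shown above, not a live API result
sample = [
    ["DP03_0001E", "DP03_0001M", "GEO_ID", "NAME"],
    ["36142", "1864", "8600000US78209", "ZCTA5 78209"],
]

# Unpack the header row and the value row, then pair them up column-wise
header, values = sample
long_df = pd.DataFrame({"variable": header, "value": values})
print(long_df)
```

Naming the columns at construction time gives the same rows as transposing and renaming afterwards.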


In [29]:
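df.columns = ['variable', 'value']

# Keep only the estimate variables: DP03_, digits, then a single trailing E
regex = r'^DP03_\d+E$'

df = df[df['variable'].str.match(regex)]

# Drop rows holding the sentinel -888888888 rather than a real estimate
df = df[df['value'] != '-888888888']

df = df.reset_index(drop=True)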

df

Unnamed: 0,variable,value
0,DP03_0001E,36142
1,DP03_0002E,24260
2,DP03_0003E,23615
3,DP03_0004E,22845
4,DP03_0005E,770
...,...,...
112,DP03_0114E,5237
113,DP03_0115E,4431
114,DP03_0116E,4032
115,DP03_0117E,705
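
To see what the filter keeps, it helps to run the pattern against the variable names that appear in the raw output. In ACS profile tables the suffix `E` marks an estimate, `M` a margin of error, `PE` and `PM` their percent counterparts, and a trailing `A` an annotation. The sketch below is a standalone check using only the standard library; the list `names` is copied from the output above.

```python
import re

# Same pattern as the filter: DP03_, one or more digits, then a single E
ESTIMATE = re.compile(r"^DP03_\d+E$")

# Variable names taken from the raw response shown earlier
names = [
    "DP03_0001E", "DP03_0001EA", "DP03_0001M", "DP03_0001MA",
    "DP03_0001PE", "DP03_0137PM", "GEO_ID", "NAME",
]

# Only the plain estimate survives; annotations, margins and percents do not
kept = [n for n in names if ESTIMATE.match(n)]
print(kept)  # ['DP03_0001E']
```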


In [30]:
variable_list = df['variable'].tolist()

label_list = []

base_url = 'https://api.census.gov/data/2020/acs/acs5/profile/variables/'

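# Look up each variable's metadata and keep the last segment of its label
for var in variable_list:
    url = base_url + var + '.json'
    response = get(url)
    content = response.json()
    # Labels are '!!'-delimited hierarchies; the last part is the most specific
    label_list.append(content['label'].split('!!')[-1])

label_list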

['Population 16 years and over',
 'In labor force',
 'Civilian labor force',
 'Employed',
 'Unemployed',
 'Armed Forces',
 'Not in labor force',
 'Civilian labor force',
 'Females 16 years and over',
 'In labor force',
 'Civilian labor force',
 'Employed',
 'Own children of the householder under 6 years',
 'All parents in family in labor force',
 'Own children of the householder 6 to 17 years',
 'All parents in family in labor force',
 'Workers 16 years and over',
 'Car, truck, or van -- drove alone',
 'Car, truck, or van -- carpooled',
 'Public transportation (excluding taxicab)',
 'Walked',
 'Other means',
 'Worked from home',
 'Mean travel time to work (minutes)',
 'Civilian employed population 16 years and over',
 'Management, business, science, and arts occupations',
 'Service occupations',
 'Sales and office occupations',
 'Natural resources, construction, and maintenance occupations',
 'Production, transportation, and material moving occupations',
 'Civilian employed population 
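
Keeping only the last label segment produces repeats such as 'In labor force' and 'Civilian labor force' in the list above. One possible way to disambiguate them, sketched here, is to keep the last two segments instead. The helper `short_label` is hypothetical, and the two label strings are illustrative examples of the `!!`-delimited format rather than values copied from the API.

```python
def short_label(label, depth=1):
    """Return the last `depth` segments of a '!!'-delimited label."""
    parts = label.split("!!")
    return " > ".join(parts[-depth:])


# Illustrative labels in the '!!'-delimited style; not copied from a live response
examples = [
    "Estimate!!EMPLOYMENT STATUS!!Population 16 years and over!!In labor force",
    "Estimate!!EMPLOYMENT STATUS!!Females 16 years and over!!In labor force",
]

# depth=1 reproduces the ambiguity; depth=2 keeps the parent category
print([short_label(s) for s in examples])
print([short_label(s, depth=2) for s in examples])
```

With `depth=2`, the two entries become 'Population 16 years and over > In labor force' and 'Females 16 years and over > In labor force', which stay distinct in the final table.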

In [31]:
# Attach the labels by position; this relies on df having a fresh 0..n-1 index
df = pd.concat([df, pd.Series(label_list)], axis=1)

df.columns = ['variable', 'value', 'label']

df = df[['label', 'value']]

df

Unnamed: 0,label,value
0,Population 16 years and over,36142
1,In labor force,24260
2,Civilian labor force,23615
3,Employed,22845
4,Unemployed,770
...,...,...
112,Not in labor force:,5237
113,With health insurance coverage,4431
114,With private health insurance,4032
115,With public coverage,705


In [39]:
df.to_csv('DP03_2020_78209.csv')