# Working Project Outline

[1. Importing Libraries](#1.-Importing-Libraries)

[2. Trying to better understand the API Call Structure for U.S. Census](#2.-Trying-to-better-understand-the-API-Call-Structure-for-U.S.-Census)

- [2.2: Setting up variables](#2.2:-Setting-up-variables)

[3. Getting Actual API Calls to Work!](#3.-Getting-Actual-API-Calls-to-Work!)

[4. Cleaning up API csv file!](#4.-Cleaning-up-API-csv-file!)

[5. Merging datasets/info!](#5.-Merging-datasets/info!)
- [5.1: 2014 DP03 and DP05](#5.1:-2014-DP03-and-DP05)
- [5.2: 2015 Data](#5.2:-2015-Data)
- [5.3: 2016 Data](#5.3:-2016-Data)
- [5.4: 2017 Data](#5.4:-2017-Data)

[6. Zipcode/County/Tract Relationships!](#6.-Zipcode/County/Tract-Relationships!)

[7. EDA and Other Crap!](#7.-EDA-and-Other-Crap!)

[8. Export Results to CSV](#8.-Export-Results-to-CSV)

## 1. Importing Libraries
[Return to outline](#Working-Project-Outline)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
import requests
%matplotlib inline

My API key for US census is: 3ada656ee00c7265bca8c8209fa3e7b10f49f38b

In [3]:
# Checking what working directory I am in
os.getcwd()

'C:\\Users\\gothv\\Jupyter\\final_capstone'

API Call Format:
http://api.census.gov/data/2018/acs/acs1/profile?get=group(DP03)&for=us:1&key=YOUR_KEY_GOES_HERE

## 2. Trying to better understand the API Call Structure for U.S. Census
[Return to outline](#Working-Project-Outline)

In [4]:
# Sample API call: should return a single DP02 variable (DP02_0001E) for Texas
fake_request = requests.get('https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP02_0001E&for=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b')

In [5]:
# Calling .json() parses the response body once and stores the resulting list
fake_data = fake_request.json()

In [6]:
len(fake_data)

2

In [7]:
fake_data

[['NAME', 'DP02_0001E', 'state'], ['Texas', '9776083', '48']]

OK, we're getting a little closer! What we have above is one row of data from the Data Profiles section of the 2018 ACS 1-year estimates. `DP02_0001E` is only one of the variables in the DP02 profile. This query returned the name of the state, the value of just that variable, and the state's FIPS code (48). The first row of the response is the header; every row after it is one geography. Let's see if we can go down more levels with the same variable.
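The response shape described above (a header row followed by one row per geography) maps directly onto a DataFrame. A minimal sketch, reusing the exact response from the cell above instead of calling the API again; `state_df` is a name made up for this example:

```python
import pandas as pd

# Response copied from the cell output above: header row first, then one data row
rows = [['NAME', 'DP02_0001E', 'state'], ['Texas', '9776083', '48']]

# Split the header from the data rows and build the frame
header, *records = rows
state_df = pd.DataFrame(records, columns=header)

# The API returns every value as a string, so cast the estimate explicitly
state_df['DP02_0001E'] = pd.to_numeric(state_df['DP02_0001E'])
print(state_df)
```

The `state` column is left as a string on purpose, since FIPS codes are identifiers rather than quantities.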

In [8]:
# Note: this URL still uses for=state:48, so it repeats the state-level call above.
# Counties in Texas would need for=county:*&in=state:48 instead.
sample = requests.get('https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP02_0001E&for=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b')


In [9]:
sample.text

'[["NAME","DP02_0001E","state"],\n["Texas","9776083","48"]]'
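For reference, a county-level version of the same request would swap the geography clause. The helper below is a hypothetical sketch (`build_census_url` is not part of any library) that only assembles the URL string in the format used above. It assumes the `for=county:*&in=state:48` wildcard form and does not send a request:

```python
BASE_URL = 'https://api.census.gov/data/{year}/acs/acs1/profile'

def build_census_url(variables, geography, key, year=2018, within=None):
    """Assemble a query URL; hypothetical helper, string formatting only."""
    url = BASE_URL.format(year=year)
    url += '?get=NAME,' + ','.join(variables)
    url += '&for=' + geography
    if within:
        # Parent geography, e.g. the state that contains the counties
        url += '&in=' + within
    return url + '&key=' + key

# Every county in Texas (state FIPS 48) for the same DP02 variable
county_url = build_census_url(['DP02_0001E'], 'county:*',
                              'YOUR_KEY_GOES_HERE', within='state:48')
print(county_url)
```

Keeping the key as a function argument also makes it easier to keep the real key out of the notebook text.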

## 2.2: Setting up variables

Making a comprehensive list of what variables I actually want to call from the census API:

**DP03:**
- 0001E - number of individuals 16 & over (population able to work)
- 0001PE - % of population 16 & over
- 0004PE - % of population employed
- 0005PE - % of population unemployed
- 0062E - median household income (matches the `median_household_income` rename in section 4)
- 0118PE - % of families in poverty, last 12 months
- 0127PE - % of individuals in poverty, last 12 months

**DP05:**
- 0001E - total population
- 0002E & 0002PE - males
- 0003E & 0003PE - females
- 0018E - median age in years
- 0019PE - under 18
- 0020PE - 16 & over
- 0021PE - 18 & over
- 0022PE - 21 & over
- 0023PE - 62 & over
- 0024PE - 65 & over
- 0037PE - one race, White
- 0038PE - Black
- 0039PE - Native
- 0044PE - Asian
- 0071PE - Hispanic
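The short codes in these lists become full API variable names once the table prefix is added. A small sketch of that step for the DP03 list; the names `dp03_suffixes` and `dp03_codes` are made up for illustration:

```python
# Suffixes copied from the DP03 list above
dp03_suffixes = ['0001E', '0001PE', '0004PE', '0005PE', '0062E', '0118PE', '0127PE']

# Prefix each suffix with its table name to get the variable name the API expects
dp03_codes = ['DP03_' + suffix for suffix in dp03_suffixes]
print(dp03_codes)
```

The same pattern would apply to the DP05 list with a `'DP05_'` prefix.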

## 3. Getting Actual API Calls to Work!
[Return to outline](#Working-Project-Outline)

In [192]:
# Defining my key 
key = '3ada656ee00c7265bca8c8209fa3e7b10f49f38b'

In [191]:
# Making a list of the DP03 features I want
dp03_features = ['DP03_0001E','DP03_0001PE', 'DP03_0004PE', 'DP03_0005PE', 'DP03_0062E', 'DP03_0118PE', 'DP03_0127PE']

In [188]:
# Making a list of features I want from DP05
dp05_features = ['DP05_0001E', 'DP05_0002E', 'DP05_0002PE', 'DP05_0003E', 'DP05_0003PE', 'DP05_0018E', 'DP05_0021PE',
                 'DP05_0022PE', 'DP05_0023PE','DP05_0024PE', 'DP05_0037PE', 'DP05_0038PE', 'DP05_0039PE', 'DP05_0044PE',
                 'DP05_0071PE']

In [13]:
# First test: seeing if I wrote the formatting for my API call correctly so it changes with my features list
for x in dp03_features:
   api_url = 'https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,{}&for=county:113&in=state:48&key={}'.format(x, key)
   print(api_url)

https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0001E&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0001PE&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0004PE&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0005PE&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0062E&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0118PE&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b
https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,DP03_0127PE&for=county:113&in=state:48&key=3ada656ee00c7265bca8c8209fa3e7b10f49f38b


In [194]:
# Adding sleep option so I don't blow up the API calls!
from time import sleep

In [195]:
# Opening my csv file to write results of API calls into!
f = open('2018_census_results.csv', 'w')

In [196]:
# Getting DP03 data from 2018
for x in dp03_features:
    sample = requests.get('https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,{}&for=county:113&in=state:48&key={}'.format(x, key))
    f.write(sample.text)
    sleep(1)
f.close()    

In [197]:
# Re-opening my csv file to write results of API calls into!
f = open('2018_census_results.csv', 'a')

# Getting DP05 data from 2018
for x in dp05_features:
    sample = requests.get('https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,{}&for=county:113&in=state:48&key={}'.format(x, key))
    f.write(sample.text)
    sleep(1)
f.close()    

AWESOME! The resulting csv file has all of the results of my API calls in it! The formatting is a bit wonky, but we will take care of that next!
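The wonky formatting comes from writing each raw JSON response back to back into one file. One possible alternative, sketched here under assumptions rather than as what this notebook does, is to ask for several variables in a single call (comma-separated in `get=`) and parse the JSON directly. The helper name `census_rows_to_frame` is hypothetical, and a hand-built response using two values from this notebook's own 2018 results stands in for the network call:

```python
import pandas as pd

def census_rows_to_frame(rows):
    """Turn the API's list of lists (header row first) into a DataFrame."""
    header, *records = rows
    frame = pd.DataFrame(records, columns=header)
    # Only the DP* columns hold measurements; NAME/state/county stay as strings
    value_cols = [c for c in header if c.startswith('DP')]
    frame[value_cols] = frame[value_cols].apply(pd.to_numeric)
    return frame

# With network access, the rows could come from something like:
#   url = ('https://api.census.gov/data/2018/acs/acs1/profile?get=NAME,{}'
#          '&for=county:113&in=state:48&key={}').format(','.join(dp03_features), key)
#   rows = requests.get(url).json()
# Stand-in response (assumed shape) so the sketch runs offline:
rows = [['NAME', 'DP03_0001E', 'DP03_0004PE', 'state', 'county'],
        ['Dallas County, Texas', '2022625', '65.9', '48', '113']]

wide = census_rows_to_frame(rows)
print(wide)
```

This yields one row per geography with one column per variable, which is the layout the cleanup steps in the next section work toward.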

## 4. Cleaning up API csv file!
[Return to outline](#Working-Project-Outline)

Let's set up a naming convention now for all of our sub-census data:
- 2014 = df1
- 2015 = df2
- 2016 = df3
- 2017 = df4
- 2018 = df5

### IMPORTED NEW DF HERE!- NOTE: Move this section down below other dfs when making pretty!

In [362]:
# Let's load in the results of our API calls and make a dataframe! (Note: the csv file was lightly edited by hand first)
df5= pd.read_csv('2018_census_results.csv', header = None)

In [162]:
# Just how messed up is the formatting of our results?
df5.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,,,,,"[[""NAME""",DP03_0001E,state,county]
1,"[""Dallas County","Texas""",2022625.0,48.0,"113]][[""NAME""",DP03_0001PE,state,county]
2,"[""Dallas County","Texas""",2022625.0,48.0,"113]][[""NAME""",DP03_0004PE,state,county]
3,"[""Dallas County","Texas""",65.9,48.0,"113]][[""NAME""",DP03_0005PE,state,county]
4,"[""Dallas County","Texas""",2.9,48.0,"113]][[""NAME""",DP03_0062E,state,county]


In [25]:
#What does this file think the columns are?
df5.columns

Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

In [363]:
# Getting rid of the columns we don't really need
df5.drop(columns=[0, 1, 3, 4, 6, 7], inplace=True)

In [27]:
# Sanity check: did it actually drop the columns (and did I drop the right ones)?
df5.head()

Unnamed: 0,2,5
0,,DP03_0001E
1,2022625.0,DP03_0001PE
2,2022625.0,DP03_0004PE
3,65.9,DP03_0005PE
4,2.9,DP03_0062E


In [364]:
# Renaming our columns to reflect values
df5.rename(columns = {2:'number', 5:'name'}, inplace = True)

In [29]:
df5.head()

Unnamed: 0,number,name
0,,DP03_0001E
1,2022625.0,DP03_0001PE
2,2022625.0,DP03_0004PE
3,65.9,DP03_0005PE
4,2.9,DP03_0062E


In [365]:
# Reordering our columns
cols = ['name', 'number'] 
df5= df5[cols]

In [31]:
# Did it reorder our df?
df5

Unnamed: 0,name,number
0,DP03_0001E,
1,DP03_0001PE,2022625.0
2,DP03_0004PE,2022625.0
3,DP03_0005PE,65.9
4,DP03_0062E,2.9
5,DP03_0118PE,59839.0
6,DP03_0127PE,35.3
7,DP05_0001E,26.8
8,DP05_0002E,2637772.0
9,DP05_0002PE,1301856.0


Awesome! The info now looks a bit better. Our biggest issue is that the `number` column is shifted down by one row (i.e., the value for each feature sits on the row below its name).

In [366]:
# Shifting the 'number' column values up by one
df5.number = df5.number.shift(-1)
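To see what `shift(-1)` does in isolation, here is a toy frame with the same misalignment. The first three names and values come from the output above; the trailing row with a missing name is an assumption made so the example is self-contained:

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['DP03_0001E', 'DP03_0001PE', 'DP03_0004PE', None],
    'number': [float('nan'), 2022625.0, 2022625.0, 65.9],
})

# Move every value up one row so it lines up with its own feature name
toy['number'] = toy['number'].shift(-1)

# The last row is now empty in both columns, so dropna() removes it
toy = toy.dropna()
print(toy)
```

After the shift, each feature name sits next to its own value and the leftover empty row is gone.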

In [33]:
df5.head()

Unnamed: 0,name,number
0,DP03_0001E,2022625.0
1,DP03_0001PE,2022625.0
2,DP03_0004PE,65.9
3,DP03_0005PE,2.9
4,DP03_0062E,59839.0


In [367]:
# Before dealing with our decimal points we need to drop the new, empty row
df5.dropna(inplace = True)

In [368]:
# Getting rid of the decimal points in our numbers
# (astype(int) truncates, so 65.9 becomes 65; call .round() first if rounding is wanted)
df5['number']= df5['number'].astype(int)
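Because casting to `int` truncates, the percentages lose their fractional part rather than being rounded. A short comparison on a few of the values above; `truncated` and `rounded` are illustrative names:

```python
import pandas as pd

values = pd.Series([2022625.0, 65.9, 2.9, 59839.0])

# What the cell above does: drop everything after the decimal point
truncated = values.astype(int)

# Alternative: round to the nearest whole number before casting
rounded = values.round().astype(int)

print(pd.DataFrame({'truncated': truncated, 'rounded': rounded}))
```

Which one is appropriate depends on how the percentages are used later; keeping them as floats is also an option.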

In [36]:
#Sanity check: are our numbers whole numbers now?
df5.head()

Unnamed: 0,name,number
0,DP03_0001E,2022625
1,DP03_0001PE,2022625
2,DP03_0004PE,65
3,DP03_0005PE,2
4,DP03_0062E,59839


In [369]:
# Resetting index to name column
df5.set_index(df5['name'], inplace = True)

In [38]:
df5.head()

Unnamed: 0_level_0,name,number
name,Unnamed: 1_level_1,Unnamed: 2_level_1
DP03_0001E,DP03_0001E,2022625
DP03_0001PE,DP03_0001PE,2022625
DP03_0004PE,DP03_0004PE,65
DP03_0005PE,DP03_0005PE,2
DP03_0062E,DP03_0062E,59839


In [370]:
# Dropping the duplicate 'name' column and transposing so each feature becomes a column
df5 = df5.drop(columns='name').T

Before we can merge all of the sub-dataframes together, we have to make this one match the others. That means adding the same `id`, `id2`, and `geography` columns and filling every row with the county-wide Dallas County values for this year!

In [171]:
df5.head()

name,DP03_0001E,DP03_0001PE,DP03_0004PE,DP03_0005PE,DP03_0062E,DP03_0118PE,DP03_0127PE,DP05_0001E,DP05_0002E,DP05_0002PE,...,DP05_0019PE,DP05_0020PE,DP05_0021PE,DP05_0022PE,DP05_0023PE,DP05_0024PE,DP05_0037PE,DP05_0039PE,DP05_0044PE,DP05_0071PE
number,2022625,2022625,65,2,59839,35,26,2637772,1301856,49,...,26,76,73,70,13,10,60,0,6,40


In [371]:
# Adding the columns I'm missing (empty for now, so they show up as NaN)
df5['id'] = np.nan
df5['id2'] = np.nan
df5['geography'] = np.nan

In [173]:
df5.head()

name,DP03_0001E,DP03_0001PE,DP03_0004PE,DP03_0005PE,DP03_0062E,DP03_0118PE,DP03_0127PE,DP05_0001E,DP05_0002E,DP05_0002PE,...,DP05_0022PE,DP05_0023PE,DP05_0024PE,DP05_0037PE,DP05_0039PE,DP05_0044PE,DP05_0071PE,id,id2,geography
number,2022625,2022625,65,2,59839,35,26,2637772,1301856,49,...,70,13,10,60,0,6,40,,,


In [174]:
# Getting a list of the column names so we can rename
df5.columns.tolist()

['DP03_0001E',
 'DP03_0001PE',
 'DP03_0004PE',
 'DP03_0005PE',
 'DP03_0062E',
 'DP03_0118PE',
 'DP03_0127PE',
 'DP05_0001E',
 'DP05_0002E',
 'DP05_0002PE',
 'DP05_0003E',
 'DP05_0003PE',
 'DP05_0018E',
 'DP05_0019PE',
 'DP05_0020PE',
 'DP05_0021PE',
 'DP05_0022PE',
 'DP05_0023PE',
 'DP05_0024PE',
 'DP05_0037PE',
 'DP05_0039PE',
 'DP05_0044PE',
 'DP05_0071PE',
 'id',
 'id2',
 'geography']

In [372]:
# Reordering df to match others
df5= df5[['id', 'id2', 'geography', 'DP05_0001E', 'DP05_0002E', 'DP05_0002PE', 'DP05_0003E', 'DP05_0003PE', 'DP05_0018E', 'DP05_0021PE',
                 'DP05_0022PE', 'DP05_0023PE','DP05_0024PE', 'DP05_0037PE', 'DP05_0038PE', 'DP05_0039PE', 'DP05_0044PE',
                 'DP05_0071PE','DP03_0001E','DP03_0001PE', 'DP03_0004PE', 'DP03_0005PE', 'DP03_0062E', 'DP03_0118PE', 'DP03_0127PE' ]]

In [210]:
df5

name,id,id2,geography,DP05_0001E,DP05_0002E,DP05_0002PE,DP05_0003E,DP05_0003PE,DP05_0018E,DP05_0021PE,...,DP05_0039PE,DP05_0044PE,DP05_0071PE,DP03_0001E,DP03_0001PE,DP03_0004PE,DP03_0005PE,DP03_0062E,DP03_0118PE,DP03_0127PE
number,,,,2637772,1301856,49,1335916,50,33,73,...,0,6,40,2022625,2022625,65,2,59839,35,26


In [212]:
# Renaming the columns to match others
df5_cols = {'DP05_0001E':'total_pop', 'DP05_0002E': 'male', 'DP05_0002PE': '%_male',
 'DP05_0003E':'female', 'DP05_0003PE': '%_female', 'DP05_0018E': 'median_age', 'DP05_0021PE' :'18_&_over', 
 'DP05_0022PE': '21_&_over', 'DP05_0023PE':'62_&_over', 'DP05_0024PE':'65_&_over',
 'DP05_0037PE': '%_white', 'DP05_0038PE':'%_black', 'DP05_0039PE': '%_native', 'DP05_0044PE': '%_asian', 
 'DP05_0071PE': '%_hispanic','DP03_0001E' : 'pop_over_16', 'DP03_0001PE': '%_pop_over_16',
 'DP03_0004PE': '%_employed', 'DP03_0005PE': '%_unemployed', 'DP03_0062E': 'median_household_income' ,
 'DP03_0118PE': '%_families_poverty', 'DP03_0127PE': '%_all_people_poverty'}

In [373]:
df5.rename(columns = df5_cols, inplace= True)

In [280]:
df5.head()

name,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_native,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,median_household_income,%_families_poverty,%_all_people_poverty
number,,,,2637772,1301856,49,1335916,50,33,73,...,0,6,40,2022625,2022625,65,2,59839,35,26


In [350]:
df5_dict= df5.to_dict()

In [351]:
df5_dict

{'id': {'number': nan},
 'id2': {'number': nan},
 'geography': {'number': nan},
 'total_pop': {'number': 2637772},
 'male': {'number': 1301856},
 '%_male': {'number': 49},
 'female': {'number': 1335916},
 '%_female': {'number': 50},
 'median_age': {'number': 33},
 '18_&_over': {'number': 73},
 '21_&_over': {'number': 70},
 '62_&_over': {'number': 13},
 '65_&_over': {'number': 10},
 '%_white': {'number': 60},
 '%_black': {'number': 22},
 '%_native': {'number': 0},
 '%_asian': {'number': 6},
 '%_hispanic': {'number': 40},
 'pop_over_16': {'number': 2022625},
 '%_pop_over_16': {'number': 2022625},
 '%_employed': {'number': 65},
 '%_unemployed': {'number': 2},
 'median_household_income': {'number': 59839},
 '%_families_poverty': {'number': 35},
 '%_all_people_poverty': {'number': 26}}

In [377]:
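# Let's make a new dataframe that we can copy our values into
# (df4 is the 2017 dataframe per the naming convention above, so it must already exist when this cell runs)
df5_new = df4.copy()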

In [378]:
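# Filling in the 2018 county-wide values for every tract row
# (columns not assigned here, such as id, geography, %_pop_over_16 and year, keep their 2017 values from df4)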
df5_new['total_pop']= 2637772 
df5_new['male']= 1301856
df5_new['%_male'] = 49
df5_new['female'] =1335916
df5_new['%_female'] = 50
df5_new['median_age'] = 33
df5_new['18_&_over'] = 73
df5_new['21_&_over'] = 70
df5_new['62_&_over'] = 13
df5_new['65_&_over'] = 10
df5_new['%_white'] = 60
df5_new['%_black'] = 22
df5_new['%_native'] =  0
df5_new['%_asian'] =  6
df5_new['%_hispanic'] = 40
df5_new['pop_over_16'] = 2022625
df5_new['%_employed'] = 65
df5_new['%_unemployed'] = 2
df5_new['median_household_income'] =59839
df5_new['%_families_poverty'] = 35
df5_new['%_all_people_poverty'] = 26
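The hard-coded assignments above could also be driven from a dictionary shaped like `df5_dict`. A sketch on hypothetical stand-in data (a two-row `tracts` frame and a three-entry `county_values` dict, neither taken from the real files). The last line, which updates `year`, is an assumption about the intended result, not something the notebook does:

```python
import pandas as pd

# Stand-in for the 2017 tract frame (values here are placeholders)
tracts = pd.DataFrame({
    'id2': [48113000100, 48113000201],
    'total_pop': [3240, 2358],
    'year': [2017, 2017],
})

# Stand-in for df5_dict: {column: {'number': value}}, with NaN for empty columns
county_values = {
    'id2': {'number': float('nan')},
    'total_pop': {'number': 2637772},
    '%_male': {'number': 49},
}

filled = tracts.copy()
for column, cell in county_values.items():
    value = cell['number']
    if pd.isna(value):
        continue  # skip empty entries so the tract ids are not overwritten
    filled[column] = value  # a scalar is broadcast to every row

filled['year'] = 2018  # assumption: the rows should be labeled with the 2018 data year
print(filled)
```

Skipping the NaN entries matters here, because `id`, `id2`, and `geography` are empty in the county-level dictionary.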

In [379]:
# Dropping median_household_income: the other years carry mean_household_income instead
df5_new.drop(columns='median_household_income', inplace=True)

In [380]:
df5_new

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3240,65,2,179297,35,26,2017
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,2358,65,2,176739,35,26,2017
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3145,65,2,116022,35,26,2017
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3614,65,2,155670,35,26,2017
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,4792,65,2,60299,35,26,2017
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3360,65,2,75984,35,26,2017
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,1917,65,2,45824,35,26,2017
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,6938,65,2,54007,35,26,2017
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,5232,65,2,114163,35,26,2017
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,5345,65,2,80825,35,26,2017


In [381]:
df5= df5_new.copy()

In [382]:
df5

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3240,65,2,179297,35,26,2017
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,2358,65,2,176739,35,26,2017
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3145,65,2,116022,35,26,2017
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3614,65,2,155670,35,26,2017
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,4792,65,2,60299,35,26,2017
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,3360,65,2,75984,35,26,2017
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,1917,65,2,45824,35,26,2017
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,6938,65,2,54007,35,26,2017
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,5232,65,2,114163,35,26,2017
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",2637772,1301856,49,1335916,50,33,73,...,6,40,2022625,5345,65,2,80825,35,26,2017


Beautiful! We can now add this dataset to the other datasets to get our overall census data!

## 5. Merging datasets/info!
[Return to outline](#Working-Project-Outline)

### 5.1: 2014 DP03 and DP05
[Return to outline](#Working-Project-Outline)

In [67]:
# Importing the 2014 data
dp0314= pd.read_csv('ACS_14_5YR_DP03_with_ann.csv', header = 1)

In [45]:
dp0314.head()

Unnamed: 0,Id,Id2,Geography,Estimate; EMPLOYMENT STATUS - Population 16 years and over,Margin of Error; EMPLOYMENT STATUS - Population 16 years and over,Percent; EMPLOYMENT STATUS - Population 16 years and over,Percent Margin of Error; EMPLOYMENT STATUS - Population 16 years and over,Estimate; EMPLOYMENT STATUS - Population 16 years and over - In labor force,Margin of Error; EMPLOYMENT STATUS - Population 16 years and over - In labor force,Percent; EMPLOYMENT STATUS - Population 16 years and over - In labor force,...,Percent; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over,Percent Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over,Estimate; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - People in families,Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - People in families,Percent; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - People in families,Percent Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - People in families,Estimate; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - Unrelated individuals 15 years and over,Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - Unrelated individuals 15 years and over,Percent; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - Unrelated individuals 15 years and over,Percent Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 
MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over - Unrelated individuals 15 years and over
0,0500000US48113,48113,"Dallas County, Texas",1851637,1126,1851637,(X),1270697,4461,68.6,...,11.2,0.4,(X),(X),18.5,0.4,(X),(X),23.0,0.5
1,1400000US48001950100,48001950100,"Census Tract 9501, Anderson County, Texas",4252,391,4252,(X),2422,325,57.0,...,7.4,5.5,(X),(X),16.0,6.3,(X),(X),26.2,8.5
2,1400000US48001950401,48001950401,"Census Tract 9504.01, Anderson County, Texas",5789,496,5789,(X),187,93,3.2,...,0.0,100.0,(X),(X),28.6,33.3,(X),(X),41.4,29.0
3,1400000US48001950402,48001950402,"Census Tract 9504.02, Anderson County, Texas",5558,520,5558,(X),121,65,2.2,...,-,**,(X),(X),0.0,19.9,(X),(X),62.3,43.5
4,1400000US48001950500,48001950500,"Census Tract 9505, Anderson County, Texas",3366,387,3366,(X),1886,305,56.0,...,9.8,7.4,(X),(X),32.5,9.9,(X),(X),42.3,17.0


In [46]:
# What do our column names look like?
dp0314.columns

Index(['Id', 'Id2', 'Geography',
       'Estimate; EMPLOYMENT STATUS - Population 16 years and over',
       'Margin of Error; EMPLOYMENT STATUS - Population 16 years and over',
       'Percent; EMPLOYMENT STATUS - Population 16 years and over',
       'Percent Margin of Error; EMPLOYMENT STATUS - Population 16 years and over',
       'Estimate; EMPLOYMENT STATUS - Population 16 years and over - In labor force',
       'Margin of Error; EMPLOYMENT STATUS - Population 16 years and over - In labor force',
       'Percent; EMPLOYMENT STATUS - Population 16 years and over - In labor force',
       ...
       'Percent; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over',
       'Percent Margin of Error; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL - 65 years and over',
       'Estimate; PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY L

In [3]:
# Creating a reusable function to help us clean up all column names
def clean_columns(df):
    """Lowercase the column names and normalize separators. Modifies df in place and returns None."""
    new_column_names = [str(x).lower().replace(' ', '_').replace(';', '_').replace('-', '').replace('__', '_') for x in df.columns]
    df.rename(columns=dict(zip(df.columns, new_column_names)), inplace=True)
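To make the chain of replacements easier to follow, here is the same string logic applied to a single column name from the 2014 file. `clean_name` is a hypothetical single-string version written for this example:

```python
def clean_name(raw):
    """Apply the same replacements as clean_columns to one column name."""
    return (str(raw).lower()
            .replace(' ', '_')    # spaces become underscores
            .replace(';', '_')    # "estimate;" becomes "estimate_"
            .replace('-', '')     # drop hyphens
            .replace('__', '_'))  # collapse the doubled underscores left behind

raw = 'Estimate; EMPLOYMENT STATUS - Population 16 years and over'
cleaned = clean_name(raw)
print(cleaned)
```

Note that the final `replace('__', '_')` runs once, so a run of three or more underscores would not collapse all the way down to one.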

In [68]:
clean_columns(dp0314)

In [5]:
# Sanity check: did our column names turn out like we wanted?
dp0314.head()

Unnamed: 0,id,id2,geography,estimate_employment_status_population_16_years_and_over,margin_of_error_employment_status_population_16_years_and_over,percent_employment_status_population_16_years_and_over,percent_margin_of_error_employment_status_population_16_years_and_over,estimate_employment_status_population_16_years_and_over_in_labor_force,margin_of_error_employment_status_population_16_years_and_over_in_labor_force,percent_employment_status_population_16_years_and_over_in_labor_force,...,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over,estimate_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,estimate_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_
15_years_and_over
0,0500000US48113,48113,"Dallas County, Texas",1851637,1126,1851637,(X),1270697,4461,68.6,...,11.2,0.4,(X),(X),18.5,0.4,(X),(X),23.0,0.5
1,1400000US48001950100,48001950100,"Census Tract 9501, Anderson County, Texas",4252,391,4252,(X),2422,325,57.0,...,7.4,5.5,(X),(X),16.0,6.3,(X),(X),26.2,8.5
2,1400000US48001950401,48001950401,"Census Tract 9504.01, Anderson County, Texas",5789,496,5789,(X),187,93,3.2,...,0.0,100.0,(X),(X),28.6,33.3,(X),(X),41.4,29.0
3,1400000US48001950402,48001950402,"Census Tract 9504.02, Anderson County, Texas",5558,520,5558,(X),121,65,2.2,...,-,**,(X),(X),0.0,19.9,(X),(X),62.3,43.5
4,1400000US48001950500,48001950500,"Census Tract 9505, Anderson County, Texas",3366,387,3366,(X),1886,305,56.0,...,9.8,7.4,(X),(X),32.5,9.9,(X),(X),42.3,17.0


We have waaaaaayyyy too much information in this data set (tracts from every county in Texas!). Let's start whittling this down to just Dallas County.

#### Keeping Dallas County

In [69]:
# Getting rid of any counties outside of Dallas County
dp0314 = dp0314[dp0314['geography'].str.contains('Dallas County')]
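The text filter keeps both the county summary row and every tract row, since both contain "Dallas County". A toy sketch (three rows whose ids and geography strings are copied from the output above) comparing it with a filter on the `48113` prefix of `id2`, plus a tracts-only variant; the variable names are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    'id2': [48113, 48001950100, 48113000100],
    'geography': ['Dallas County, Texas',
                  'Census Tract 9501, Anderson County, Texas',
                  'Census Tract 1, Dallas County, Texas'],
})

# Filter used above: match on the county name in the geography text
by_name = toy[toy['geography'].str.contains('Dallas County')]

# Equivalent on this toy data: match on the state+county prefix of id2
by_fips = toy[toy['id2'].astype(str).str.startswith('48113')]

# If the county summary row is not wanted, keep only the tract rows
tracts_only = by_name[by_name['geography'].str.startswith('Census Tract')]
print(tracts_only)
```

Whether the county summary row should stay depends on the later merge; the notebook keeps it at this point.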

In [7]:
# Sanity Check: Did it only keep Dallas County?
dp0314

Unnamed: 0,id,id2,geography,estimate_employment_status_population_16_years_and_over,margin_of_error_employment_status_population_16_years_and_over,percent_employment_status_population_16_years_and_over,percent_margin_of_error_employment_status_population_16_years_and_over,estimate_employment_status_population_16_years_and_over_in_labor_force,margin_of_error_employment_status_population_16_years_and_over_in_labor_force,percent_employment_status_population_16_years_and_over_in_labor_force,...,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over,estimate_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_people_in_families,estimate_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_15_years_and_over,percent_margin_of_error_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_65_years_and_over_unrelated_individuals_
15_years_and_over
0,0500000US48113,48113,"Dallas County, Texas",1851637,1126,1851637,(X),1270697,4461,68.6,...,11.2,0.4,(X),(X),18.5,0.4,(X),(X),23.0,0.5
1036,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3234,269,3234,(X),2439,213,75.4,...,4.0,6.2,(X),(X),5.1,5.6,(X),(X),11.5,7.0
1037,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2288,141,2288,(X),1630,178,71.2,...,5.9,6.9,(X),(X),2.8,4.2,(X),(X),11.5,8.9
1038,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2926,222,2926,(X),2587,217,88.4,...,14.1,22.9,(X),(X),1.3,2.2,(X),(X),5.2,3.3
1039,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",3721,313,3721,(X),3110,302,83.6,...,2.0,3.9,(X),(X),2.0,2.4,(X),(X),15.4,9.6
1040,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",4724,488,4724,(X),2706,288,57.3,...,40.0,21.1,(X),(X),23.9,14.6,(X),(X),49.5,10.0
1041,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3206,304,3206,(X),2265,215,70.6,...,5.4,6.0,(X),(X),18.9,15.5,(X),(X),10.7,4.7
1042,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",1817,306,1817,(X),1433,248,78.9,...,30.6,38.3,(X),(X),33.3,17.6,(X),(X),34.0,10.3
1043,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",5922,506,5922,(X),3934,439,66.4,...,7.5,10.2,(X),(X),20.2,10.8,(X),(X),32.9,16.0
1044,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4160,399,4160,(X),3434,397,82.5,...,7.4,8.5,(X),(X),29.3,17.9,(X),(X),12.3,4.2


Beautiful! Let's now get rid of all columns except the ones that match what we pulled from the 2018 API for DP03 earlier!


In [12]:
dp0314.columns.tolist()

['id',
 'id2',
 'geography',
 'estimate_employment_status_population_16_years_and_over',
 'margin_of_error_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over',
 'percent_margin_of_error_employment_status_population_16_years_and_over',
 'estimate_employment_status_population_16_years_and_over_in_labor_force',
 'margin_of_error_employment_status_population_16_years_and_over_in_labor_force',
 'percent_employment_status_population_16_years_and_over_in_labor_force',
 'percent_margin_of_error_employment_status_population_16_years_and_over_in_labor_force',
 'estimate_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force',
 'margin_of_error_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force',
 'percent_margin_of_error_employment_status_population_16_years_and_over_in_labor_force_civilia

#### Keeping specific columns

In [70]:
dp0314 = dp0314[['id',
 'id2',
 'geography',
 'estimate_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed',
'estimate_income_and_benefits_(in_2014_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people'
]].copy()  # copy so the in-place rename below works on its own frame, not a view of the original



In [71]:
dp0314.head()

Unnamed: 0,id,id2,geography,estimate_employment_status_population_16_years_and_over,percent_employment_status_population_16_years_and_over,percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed,percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed,estimate_income_and_benefits_(in_2014_inflationadjusted_dollars)_total_households_mean_household_income_(dollars),percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families,percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people
0,0500000US48113,48113,"Dallas County, Texas",1851637,1851637,62.7,5.8,73982,15.9,19.3
1036,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3234,3234,71.8,3.6,162983,4.4,6.8
1037,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2288,2288,70.2,1.0,130061,4.1,5.1
1038,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2926,2926,84.3,4.1,120322,1.6,2.9
1039,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",3721,3721,80.2,3.4,130318,2.7,8.3


## Renaming columns

In [72]:
# Making columns easier to read
dp03_rn_cols = {'estimate_employment_status_population_16_years_and_over':'pop_over_16',
'percent_employment_status_population_16_years_and_over':'%_pop_over_16',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed' :'%_employed',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed': '%_unemployed',
'estimate_income_and_benefits_(in_2014_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)'
: 'mean_household_income',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families': '%_families_poverty',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people':'%_all_people_poverty'}


In [73]:
dp0314.rename(columns = dp03_rn_cols, inplace = True)

In [20]:
dp0314.head()

Unnamed: 0,id,id2,geography,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty
0,0500000US48113,48113,"Dallas County, Texas",1851637,1851637,62.7,5.8,73982,15.9,19.3
1036,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3234,3234,71.8,3.6,162983,4.4,6.8
1037,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2288,2288,70.2,1.0,130061,4.1,5.1
1038,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2926,2926,84.3,4.1,120322,1.6,2.9
1039,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",3721,3721,80.2,3.4,130318,2.7,8.3
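
The keep-list and rename map above contain the survey year only inside the income column name. As a rough sketch, both could be generated from the year instead of being retyped for each release. `dp03_rename_map` and `dp03_keep_columns` are hypothetical helpers that do not exist in this notebook, and they assume the cleaned column names keep the exact pattern shown above.

```python
# Hypothetical helpers (not defined elsewhere in this notebook).
# Assumption: only the year inside the income column name changes between releases.
EMP = 'employment_status_population_16_years_and_over'
POV = ('percent_percentage_of_families_and_people_whose_income_in_the_past_12_months'
       '_is_below_the_poverty_level')


def dp03_rename_map(year):
    """Map the long cleaned DP03 column names to short names for one year."""
    income = (f'estimate_income_and_benefits_(in_{year}_inflationadjusted_dollars)'
              '_total_households_mean_household_income_(dollars)')
    return {
        f'estimate_{EMP}': 'pop_over_16',
        f'percent_{EMP}': '%_pop_over_16',
        f'percent_{EMP}_in_labor_force_civilian_labor_force_employed': '%_employed',
        f'percent_{EMP}_in_labor_force_civilian_labor_force_unemployed': '%_unemployed',
        income: 'mean_household_income',
        f'{POV}_all_families': '%_families_poverty',
        f'{POV}_all_people': '%_all_people_poverty',
    }


def dp03_keep_columns(year):
    """Identifier columns plus every column named in the rename map."""
    return ['id', 'id2', 'geography'] + list(dp03_rename_map(year))


rename_2014 = dp03_rename_map(2014)
keep_2014 = dp03_keep_columns(2014)
```

With these, a year's frame could be trimmed with `df[dp03_keep_columns(year)]` and renamed with `df.rename(columns=dp03_rename_map(year))`.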


In [74]:
# The first row is the county-wide total for Dallas County, so drop it
dp0314.drop(dp0314.index[0], inplace = True)

Note: each year's data should have 529 rows!
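
One way to turn that note into an automatic check is sketched below. `check_tract_count` is a hypothetical helper, not part of this notebook, and the toy frame only stands in for a real year's data.

```python
import pandas as pd

EXPECTED_TRACTS = 529  # row count stated in the note above


def check_tract_count(df, expected=EXPECTED_TRACTS):
    """Raise ValueError unless df has `expected` rows with unique id2 values."""
    if len(df) != expected:
        raise ValueError(f'expected {expected} rows, found {len(df)}')
    if not df['id2'].is_unique:
        raise ValueError('duplicate id2 values found')
    return True


# Toy frame standing in for one year's data (three tracts instead of 529).
toy = pd.DataFrame({'id2': [48113000100, 48113000201, 48113000202]})
ok = check_tract_count(toy, expected=3)
```

Calling `check_tract_count(dp0314)` after the county row is dropped would then fail loudly if a year came out short or had repeated tracts.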

In [22]:
dp0314

Unnamed: 0,id,id2,geography,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty
1036,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3234,3234,71.8,3.6,162983,4.4,6.8
1037,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2288,2288,70.2,1.0,130061,4.1,5.1
1038,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",2926,2926,84.3,4.1,120322,1.6,2.9
1039,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",3721,3721,80.2,3.4,130318,2.7,8.3
1040,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",4724,4724,53.9,3.2,56284,23.6,35.8
1041,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3206,3206,67.3,3.3,64360,13.6,15.0
1042,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",1817,1817,74.5,4.3,40875,24.9,33.6
1043,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",5922,5922,62.0,4.5,53948,16.6,23.1
1044,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4160,4160,78.8,3.7,90279,22.4,17.9
1045,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",5123,5123,68.6,4.8,69227,20.8,27.9


## Importing DP05 for the same year

In [75]:
# Time to work with the 2014 DP05
dp0514= pd.read_csv('ACS_14_5YR_DP05_with_ann.csv', header = 1, low_memory= False)

In [25]:
dp0514

Unnamed: 0,Id,Id2,Geography,Estimate; SEX AND AGE - Total population,Margin of Error; SEX AND AGE - Total population,Percent; SEX AND AGE - Total population,Percent Margin of Error; SEX AND AGE - Total population,Estimate; SEX AND AGE - Total population - Male,Margin of Error; SEX AND AGE - Total population - Male,Percent; SEX AND AGE - Total population - Male,...,Percent; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races including Some other race,Percent Margin of Error; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races including Some other race,"Estimate; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races excluding Some other race, and Three or more races","Margin of Error; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races excluding Some other race, and Three or more races","Percent; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races excluding Some other race, and Three or more races","Percent Margin of Error; HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Two or more races - Two races excluding Some other race, and Three or more races",Estimate; HISPANIC OR LATINO AND RACE - Total housing units,Margin of Error; HISPANIC OR LATINO AND RACE - Total housing units,Percent; HISPANIC OR LATINO AND RACE - Total housing units,Percent Margin of Error; HISPANIC OR LATINO AND RACE - Total housing units
0,1400000US48001950100,48001950100,"Census Tract 9501, Anderson County, Texas",5142,463,5142,(X),2565,294,49.9,...,0.0,0.7,16,25,0.3,0.5,2371,127,(X),(X)
1,1400000US48001950401,48001950401,"Census Tract 9504.01, Anderson County, Texas",5842,494,5842,(X),5717,496,97.9,...,0.1,0.2,0,18,0.0,0.6,106,54,(X),(X)
2,1400000US48001950402,48001950402,"Census Tract 9504.02, Anderson County, Texas",5586,530,5586,(X),5474,508,98.0,...,0.0,0.7,11,21,0.2,0.4,98,39,(X),(X)
3,1400000US48001950500,48001950500,"Census Tract 9505, Anderson County, Texas",4666,500,4666,(X),2128,290,45.6,...,0.0,0.8,49,48,1.1,1.0,1610,135,(X),(X)
4,1400000US48001950600,48001950600,"Census Tract 9506, Anderson County, Texas",6480,579,6480,(X),3475,468,53.6,...,0.0,0.6,95,129,1.5,2.0,2432,168,(X),(X)
5,1400000US48001950700,48001950700,"Census Tract 9507, Anderson County, Texas",2606,330,2606,(X),1361,230,52.2,...,0.0,0.1,73,49,2.8,1.8,1090,79,(X),(X)
6,1400000US48001950800,48001950800,"Census Tract 9508, Anderson County, Texas",4937,467,4937,(X),2503,363,50.7,...,0.0,0.8,114,80,2.3,1.7,2225,149,(X),(X)
7,1400000US48001950901,48001950901,"Census Tract 9509.01, Anderson County, Texas",5659,545,5659,(X),3455,372,61.1,...,0.0,0.7,39,27,0.7,0.5,2181,133,(X),(X)
8,1400000US48001950902,48001950902,"Census Tract 9509.02, Anderson County, Texas",5029,676,5029,(X),2670,460,53.1,...,0.0,0.7,46,39,0.9,0.8,2310,126,(X),(X)
9,1400000US48001951000,48001951000,"Census Tract 9510, Anderson County, Texas",7385,612,7385,(X),3795,345,51.4,...,0.2,0.3,78,62,1.1,0.8,3346,183,(X),(X)


In [76]:
# Cleaning up column names
clean_columns(dp0514)

In [77]:
# Getting rid of any counties outside of Dallas County
dp0514 = dp0514[dp0514['geography'].str.contains('Dallas County')]

In [28]:
dp0514

Unnamed: 0,id,id2,geography,estimate_sex_and_age_total_population,margin_of_error_sex_and_age_total_population,percent_sex_and_age_total_population,percent_margin_of_error_sex_and_age_total_population,estimate_sex_and_age_total_population_male,margin_of_error_sex_and_age_total_population_male,percent_sex_and_age_total_population_male,...,percent_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_including_some_other_race,percent_margin_of_error_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_including_some_other_race,"estimate_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_excluding_some_other_race,_and_three_or_more_races","margin_of_error_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_excluding_some_other_race,_and_three_or_more_races","percent_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_excluding_some_other_race,_and_three_or_more_races","percent_margin_of_error_hispanic_or_latino_and_race_total_population_not_hispanic_or_latino_two_or_more_races_two_races_excluding_some_other_race,_and_three_or_more_races",estimate_hispanic_or_latino_and_race_total_housing_units,margin_of_error_hispanic_or_latino_and_race_total_housing_units,percent_hispanic_or_latino_and_race_total_housing_units,percent_margin_of_error_hispanic_or_latino_and_race_total_housing_units
1035,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,318,3923,(X),1974,293,50.3,...,0.0,0.9,50,62,1.3,1.6,2033,29,(X),(X)
1036,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,218,2927,(X),1396,165,47.7,...,0.3,0.5,6,10,0.2,0.3,1450,58,(X),(X)
1037,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,253,3465,(X),1762,217,50.9,...,0.0,1.1,92,49,2.7,1.4,2029,36,(X),(X)
1038,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,403,4243,(X),2415,399,56.9,...,0.0,0.9,69,53,1.6,1.3,2407,91,(X),(X)
1039,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,620,5816,(X),3281,516,56.4,...,0.2,0.3,150,74,2.6,1.2,1905,74,(X),(X)
1040,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,453,3630,(X),2138,252,58.9,...,0.0,1.0,35,36,1.0,1.0,2020,59,(X),(X)
1041,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,431,2222,(X),1306,279,58.8,...,0.0,1.7,156,211,7.0,8.7,1388,67,(X),(X)
1042,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,956,7824,(X),4232,521,54.1,...,0.0,0.5,86,109,1.1,1.4,2694,96,(X),(X)
1043,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4441,455,4441,(X),2718,455,61.2,...,0.0,0.8,175,89,3.9,2.1,3418,190,(X),(X)
1044,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,927,6450,(X),3553,630,55.1,...,0.0,0.6,225,152,3.5,2.4,3333,107,(X),(X)
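
The filter above matches on the text of `geography`. A possible alternative, sketched here, keys on `id2` instead. It assumes `id2` holds the full 11-digit tract identifier (state `48`, county `113`, then a six-digit tract code), which is consistent with the values shown above, and that the column contains integers or strings rather than floats. `dallas_tracts` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd

DALLAS_FIPS = '48113'  # state 48 + county 113, as seen in the id2 values above


def dallas_tracts(df):
    """Keep only tract-level rows whose id2 starts with the Dallas County prefix."""
    geoid = df['id2'].astype(str)
    # Tract identifiers are 11 characters long; the county summary row is only 5.
    mask = geoid.str.startswith(DALLAS_FIPS) & (geoid.str.len() == 11)
    # .copy() gives an independent frame, so later in-place edits do not warn.
    return df[mask].copy()


# Toy frame: one county summary row, two Dallas tracts, one Anderson County tract.
toy = pd.DataFrame({
    'id2': [48113, 48113000100, 48001950100, 48113000201],
    'geography': ['Dallas County, Texas',
                  'Census Tract 1, Dallas County, Texas',
                  'Census Tract 9501, Anderson County, Texas',
                  'Census Tract 2.01, Dallas County, Texas'],
})
tracts = dallas_tracts(toy)
```

Because the length test excludes the five-digit county identifier, this version would also remove the county-wide summary row without relying on its position in the frame.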


In [29]:
dp0514.columns.to_list()

['id',
 'id2',
 'geography',
 'estimate_sex_and_age_total_population',
 'margin_of_error_sex_and_age_total_population',
 'percent_sex_and_age_total_population',
 'percent_margin_of_error_sex_and_age_total_population',
 'estimate_sex_and_age_total_population_male',
 'margin_of_error_sex_and_age_total_population_male',
 'percent_sex_and_age_total_population_male',
 'percent_margin_of_error_sex_and_age_total_population_male',
 'estimate_sex_and_age_total_population_female',
 'margin_of_error_sex_and_age_total_population_female',
 'percent_sex_and_age_total_population_female',
 'percent_margin_of_error_sex_and_age_total_population_female',
 'estimate_sex_and_age_under_5_years',
 'margin_of_error_sex_and_age_under_5_years',
 'percent_sex_and_age_under_5_years',
 'percent_margin_of_error_sex_and_age_under_5_years',
 'estimate_sex_and_age_5_to_9_years',
 'margin_of_error_sex_and_age_5_to_9_years',
 'percent_sex_and_age_5_to_9_years',
 'percent_margin_of_error_sex_and_age_5_to_9_years',
 'esti

In [78]:
dp05_keep = ['id', 'id2', 'geography', 'estimate_sex_and_age_total_population',  'estimate_sex_and_age_total_population_male',
 'percent_sex_and_age_total_population_male', 'estimate_sex_and_age_total_population_female',
 'percent_sex_and_age_total_population_female','estimate_sex_and_age_median_age_(years)',
'percent_sex_and_age_18_years_and_over','percent_sex_and_age_21_years_and_over',
'percent_sex_and_age_62_years_and_over','percent_sex_and_age_65_years_and_over',
'percent_race_one_race_white','percent_race_one_race_black_or_african_american',
'estimate_race_one_race_american_indian_and_alaska_native','percent_race_one_race_asian',
'percent_hispanic_or_latino_and_race_total_population_hispanic_or_latino_(of_any_race)',
]

In [79]:
dp0514= dp0514[dp05_keep]

In [80]:
dp0514.head()

Unnamed: 0,id,id2,geography,estimate_sex_and_age_total_population,estimate_sex_and_age_total_population_male,percent_sex_and_age_total_population_male,estimate_sex_and_age_total_population_female,percent_sex_and_age_total_population_female,estimate_sex_and_age_median_age_(years),percent_sex_and_age_18_years_and_over,percent_sex_and_age_21_years_and_over,percent_sex_and_age_62_years_and_over,percent_sex_and_age_65_years_and_over,percent_race_one_race_white,percent_race_one_race_black_or_african_american,estimate_race_one_race_american_indian_and_alaska_native,percent_race_one_race_asian,percent_hispanic_or_latino_and_race_total_population_hispanic_or_latino_(of_any_race)
1035,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,78.7,11.9,8.4,89.7,7.2,23,0.3,11.2
1036,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,76.7,17.3,10.4,96.7,0.3,0,0.3,14.4
1037,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,83.7,7.5,5.5,89.4,2.5,5,2.0,15.0
1038,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,87.1,9.8,7.1,96.3,1.2,0,0.9,10.6
1039,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,77.7,6.3,4.5,36.6,15.1,0,13.4,47.0


In [81]:
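# Making column names prettier
# NOTE: '%_native' is mapped from an estimate (count) column, not a percent column,
# so its values are head counts rather than percentages.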
dp05_cols= {'estimate_sex_and_age_total_population':'total_pop','estimate_sex_and_age_total_population_male':'male',
            'percent_sex_and_age_total_population_male' :'%_male', 'estimate_sex_and_age_total_population_female' : 'female',
            'percent_sex_and_age_total_population_female': '%_female', 'estimate_sex_and_age_median_age_(years)' : 'median_age', 
            'percent_sex_and_age_18_years_and_over': '18_&_over', 'percent_sex_and_age_21_years_and_over':'21_&_over',
            'percent_sex_and_age_62_years_and_over' : '62_&_over', 'percent_sex_and_age_65_years_and_over': '65_&_over',
            'percent_race_one_race_white' :'%_white', 'percent_race_one_race_black_or_african_american' : '%_black',
            'estimate_race_one_race_american_indian_and_alaska_native' :  '%_native', 'percent_race_one_race_asian' : '%_asian',
            'percent_hispanic_or_latino_and_race_total_population_hispanic_or_latino_(of_any_race)' : '%_hispanic'}

In [82]:
dp0514 = dp0514.rename(columns = dp05_cols)

In [38]:
dp0514

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,21_&_over,62_&_over,65_&_over,%_white,%_black,%_native,%_asian,%_hispanic
1035,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,78.7,11.9,8.4,89.7,7.2,23,0.3,11.2
1036,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,76.7,17.3,10.4,96.7,0.3,0,0.3,14.4
1037,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,83.7,7.5,5.5,89.4,2.5,5,2.0,15.0
1038,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,87.1,9.8,7.1,96.3,1.2,0,0.9,10.6
1039,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,77.7,6.3,4.5,36.6,15.1,0,13.4,47.0
1040,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,83.3,14.8,9.2,59.6,2.8,0,11.3,49.1
1041,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,1306,58.8,916,41.2,32.0,81.1,75.9,5.3,3.2,29.2,22.4,13,13.3,41.0
1042,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,4232,54.1,3592,45.9,29.9,73.3,69.3,10.7,8.2,56.4,3.2,0,7.4,72.6
1043,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4441,2718,61.2,1723,38.8,41.6,93.7,91.4,12.5,8.7,56.9,7.5,49,3.4,41.9
1044,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,3553,55.1,2897,44.9,30.8,76.7,74.4,9.0,6.8,54.2,7.1,0,3.1,53.7


In [83]:
# Merging these two tables for this year's info
df1= pd.merge(dp0514,dp0314, how = 'inner', on = 'id2')

In [84]:
# Sanity check- did these merge right?
df1

Unnamed: 0,id_x,id2,geography_x,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_hispanic,id_y,geography_y,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,11.2,1400000US48113000100,"Census Tract 1, Dallas County, Texas",3234,3234,71.8,3.6,162983,4.4,6.8
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,14.4,1400000US48113000201,"Census Tract 2.01, Dallas County, Texas",2288,2288,70.2,1.0,130061,4.1,5.1
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,15.0,1400000US48113000202,"Census Tract 2.02, Dallas County, Texas",2926,2926,84.3,4.1,120322,1.6,2.9
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,10.6,1400000US48113000300,"Census Tract 3, Dallas County, Texas",3721,3721,80.2,3.4,130318,2.7,8.3
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,47.0,1400000US48113000401,"Census Tract 4.01, Dallas County, Texas",4724,4724,53.9,3.2,56284,23.6,35.8
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,49.1,1400000US48113000404,"Census Tract 4.04, Dallas County, Texas",3206,3206,67.3,3.3,64360,13.6,15.0
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,1306,58.8,916,41.2,32.0,81.1,...,41.0,1400000US48113000405,"Census Tract 4.05, Dallas County, Texas",1817,1817,74.5,4.3,40875,24.9,33.6
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,4232,54.1,3592,45.9,29.9,73.3,...,72.6,1400000US48113000406,"Census Tract 4.06, Dallas County, Texas",5922,5922,62.0,4.5,53948,16.6,23.1
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4441,2718,61.2,1723,38.8,41.6,93.7,...,41.9,1400000US48113000500,"Census Tract 5, Dallas County, Texas",4160,4160,78.8,3.7,90279,22.4,17.9
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,3553,55.1,2897,44.9,30.8,76.7,...,53.7,1400000US48113000601,"Census Tract 6.01, Dallas County, Texas",5123,5123,68.6,4.8,69227,20.8,27.9
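
The merged frame above carries `id_x`/`id_y` and `geography_x`/`geography_y` because only `id2` was used as the key. The sketch below shows one possible variation on toy data: merging on all three shared columns so no suffixed duplicates appear, with `validate` and `indicator` used as a sanity check. The toy frames and their values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the DP05 and DP03 frames (values are made up).
left = pd.DataFrame({
    'id': ['A100', 'A201', 'A202'],
    'id2': [100, 201, 202],
    'geography': ['Tract 1', 'Tract 2.01', 'Tract 2.02'],
    'total_pop': [10, 20, 30],
})
right = pd.DataFrame({
    'id': ['A100', 'A201', 'A300'],
    'id2': [100, 201, 300],
    'geography': ['Tract 1', 'Tract 2.01', 'Tract 3'],
    'pop_over_16': [8, 15, 25],
})

# validate raises MergeError if a key repeats on either side;
# indicator adds a '_merge' column saying where each row came from.
merged = pd.merge(left, right, how='outer',
                  on=['id', 'id2', 'geography'],
                  validate='one_to_one', indicator=True)

# Rows present in only one of the two tables.
unmatched = merged[merged['_merge'] != 'both']
```

An empty `unmatched` frame would indicate that every tract appears in both tables, which is what the inner merge silently assumes.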


In [85]:
# Getting rid of extra id and geography columns
df1.drop(axis= 1, columns= ['id_y', 'geography_y'], inplace = True )

In [86]:
# Rename id and geography
df1.rename(columns = {'id_x' : 'id', 'geography_x' : 'geography'}, inplace = True)

In [87]:
# Create a year column and assign 2014 to this set for later merging
df1['year'] = 2014

In [88]:
# Sanity check: does everything look good so far?
df1.head()

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,0.3,11.2,3234,3234,71.8,3.6,162983,4.4,6.8,2014
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,0.3,14.4,2288,2288,70.2,1.0,130061,4.1,5.1,2014
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,2.0,15.0,2926,2926,84.3,4.1,120322,1.6,2.9,2014
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,0.9,10.6,3721,3721,80.2,3.4,130318,2.7,8.3,2014
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,13.4,47.0,4724,4724,53.9,3.2,56284,23.6,35.8,2014
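
The `year` column is what lets the yearly frames be told apart once they are combined. As a minimal sketch of one way to stack them (an assumption about the later step, shown on invented toy data rather than the real `df1`), `pd.concat` can append frames that share the same columns:

```python
import pandas as pd

# Toy stand-ins for two yearly frames with identical columns (values are made up).
toy_2014 = pd.DataFrame({'id2': [100, 201], 'total_pop': [10, 20], 'year': 2014})
toy_2015 = pd.DataFrame({'id2': [100, 201], 'total_pop': [11, 21], 'year': 2015})

# Stack the rows; ignore_index renumbers them 0..n-1.
combined = pd.concat([toy_2014, toy_2015], ignore_index=True)

# Number of distinct tracts per year, useful as a quick consistency check.
tracts_per_year = combined.groupby('year')['id2'].nunique()
```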


### 5.2: 2015 Data
[Return to outline](#Working-Project-Outline)

In [89]:
# Importing DP03 for this year (header = 1)
dp0315= pd.read_csv('ACS_15_5YR_DP03_with_ann.csv', header = 1)

In [90]:
# Cleaning up column names
clean_columns(dp0315)


In [91]:
# Keeping only the columns we want
dp0315= dp0315[['id',
 'id2',
 'geography',
 'estimate_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed',
'estimate_income_and_benefits_(in_2015_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people',
]]

In [None]:
dp0315

In [94]:
# Renaming columns
dp0315_cols = {'estimate_employment_status_population_16_years_and_over':'pop_over_16',
'percent_employment_status_population_16_years_and_over':'%_pop_over_16',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed' :'%_employed',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed': '%_unemployed',
'estimate_income_and_benefits_(in_2015_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)'
: 'mean_household_income',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families': '%_families_poverty',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people':'%_all_people_poverty'}

dp0315.rename(columns = dp0315_cols, inplace= True)

In [95]:
# Keeping only Dallas County
dp0315 = dp0315[dp0315['geography'].str.contains('Dallas County')]

In [96]:
# Importing DP05 for this year (header = 1)
dp0515= pd.read_csv('ACS_15_5YR_DP05_with_ann.csv', header = 1, low_memory = False)

In [97]:
# Cleaning up column names
clean_columns(dp0515)

In [98]:
# Keeping only columns we need
dp0515= dp0515[dp05_keep]

In [99]:
# Renaming columns for easier understanding
dp0515.rename(columns = dp05_cols, inplace = True)

In [100]:
# Keeping only Dallas County
dp0515 = dp0515[dp0515['geography'].str.contains('Dallas County')]

In [101]:
# Merging the two dataframes into one (DP05 first) on id2
df2= pd.merge(dp0515, dp0315, how = 'inner', on= 'id2')

In [102]:
# Getting rid of repeat columns from merge
df2.drop(axis= 1, columns=['id_y', 'geography_y'], inplace = True)

In [103]:
# Rename id and geography
df2.rename(columns = {'id_x' : 'id', 'geography_x' : 'geography'}, inplace = True)

In [104]:
# Adding year column with this year
df2['year']= 2015

In [105]:
df2.head()

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3951,2099,53.1,1852,46.9,40.0,79.9,...,2.4,10.5,3236,3236,72.0,3.3,183048,2.8,6.5,2015
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2907,1378,47.4,1529,52.6,37.1,77.5,...,0.3,16.6,2308,2308,71.7,1.3,133584,0.0,3.6,2015
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3759,1921,51.1,1838,48.9,34.4,82.5,...,2.3,17.6,3118,3118,82.7,2.3,112295,2.0,3.7,2015
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4316,2504,58.0,1812,42.0,33.7,85.4,...,3.3,6.0,3698,3698,79.4,0.0,150866,0.0,8.1,2015
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",6430,3469,54.0,2961,46.0,32.1,77.2,...,8.5,50.0,5052,5052,54.7,2.2,56120,20.6,34.1,2015


Awesome! We are done with our 2015 data and can move on to 2016!
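
Each year repeats the same import step, with only the two-digit year in the file name changing, and `header = 1` tells pandas to take the column names from the second line of the file and skip the first. The sketch below illustrates both points. `acs_filename` is a hypothetical helper, and the in-memory CSV with placeholder codes in its first line is invented to stand in for a real download.

```python
import io
import pandas as pd


def acs_filename(year, table):
    """Build a name like 'ACS_15_5YR_DP03_with_ann.csv' (pattern from the cells above)."""
    return f'ACS_{str(year)[-2:]}_5YR_{table}_with_ann.csv'


names = [acs_filename(y, t) for y in (2014, 2015, 2016, 2017) for t in ('DP03', 'DP05')]

# Toy file: line 0 holds placeholder codes, line 1 holds the descriptive labels.
toy_csv = io.StringIO(
    'code_a,code_b,code_c\n'
    'Id,Id2,Geography\n'
    '1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas"\n'
)

# header=1 uses the second line as column names; the first line is discarded.
toy = pd.read_csv(toy_csv, header=1)
```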

### 5.3: 2016 Data
[Return to outline](#Working-Project-Outline)

In [106]:
# Importing DP03 for this year (header = 1)
dp0316= pd.read_csv('ACS_16_5YR_DP03_with_ann.csv', header= 1)

In [107]:
# Cleaning up column names
clean_columns(dp0316)


# Keeping only these columns
dp0316 = dp0316[['id',
 'id2',
 'geography',
 'estimate_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed',
'estimate_income_and_benefits_(in_2016_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people']]

In [109]:
# Renaming columns
dp0316_cols = {'estimate_employment_status_population_16_years_and_over':'pop_over_16',
'percent_employment_status_population_16_years_and_over':'%_pop_over_16',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed' :'%_employed',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed': '%_unemployed',
'estimate_income_and_benefits_(in_2016_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)'
: 'mean_household_income',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families': '%_families_poverty',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people':'%_all_people_poverty'
}

dp0316.rename(columns = dp0316_cols, inplace= True)

In [110]:
# Keeping only Dallas County
dp0316 = dp0316[dp0316['geography'].str.contains('Dallas County')]

In [111]:
# Drop general Dallas county row
dp0316.drop(dp0316.index[0], inplace = True)

In [112]:
# Importing DP05 for this year (header = 1)
dp0516 = pd.read_csv('ACS_16_5YR_DP05_with_ann.csv', header= 1, low_memory = False)

In [113]:
# Cleaning up column names
clean_columns(dp0516)

In [114]:
# Keeping only columns we need
dp0516= dp0516[dp05_keep]

# Renaming columns for easier understanding
dp0516.rename(columns = dp05_cols, inplace = True)

In [115]:
# Keeping only Dallas County
dp0516= dp0516[dp0516['geography'].str.contains('Dallas County')]

In [116]:
# Merging the two dataframes into one (DP05 first) on id2
df3= pd.merge(dp0516, dp0316, how= 'inner', on = 'id2')

In [117]:
# Getting rid of repeat columns
df3.drop(axis= 1, columns= ['id_y', 'geography_y'], inplace = True)

In [118]:
# Rename id and geography
df3.rename(columns = {'id_x' : 'id', 'geography_x' : 'geography'}, inplace = True)

In [119]:
# Adding year column with this year
df3['year']= 2016

In [120]:
df3

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",4081,2148,52.6,1933,47.4,39.5,79.8,...,2.8,10.8,3332,3332,72.9,2.9,182203,2.4,6.9,2016
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2850,1331,46.7,1519,53.3,38.2,78.1,...,0.0,13.3,2277,2277,79.1,0.9,154738,0.0,3.1,2016
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3789,1922,50.7,1867,49.3,33.7,82.2,...,2.2,17.9,3129,3129,79.7,2.0,109847,3.5,6.1,2016
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4288,2294,53.5,1994,46.5,33.8,84.8,...,3.7,6.8,3647,3647,81.5,0.0,141231,0.0,4.9,2016
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",6131,3356,54.7,2775,45.3,32.2,77.8,...,6.8,47.8,4860,4860,54.2,2.8,54245,18.8,36.2,2016
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3682,2195,59.6,1487,40.4,36.5,89.6,...,9.1,44.8,3337,3337,69.5,5.0,72949,6.8,12.4,2016
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2206,1319,59.8,887,40.2,37.2,85.8,...,8.8,49.5,1921,1921,75.4,7.7,42432,23.4,29.2,2016
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",7640,4036,52.8,3604,47.2,31.0,80.4,...,8.2,64.9,6217,6217,64.4,3.2,54379,11.6,15.6,2016
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4969,2882,58.0,2087,42.0,40.2,98.7,...,5.7,23.1,4904,4904,84.0,1.8,121711,1.4,8.7,2016
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6167,3396,55.1,2771,44.9,32.0,79.6,...,4.1,43.4,5089,5089,74.4,1.9,79620,12.1,19.9,2016


Moving on to our last year: 2017!

### 5.4: 2017 Data
[Return to outline](#Working-Project-Outline)

In [232]:
# Importing DP03 for this year (header = 1)
dp0317= pd.read_csv('ACS_17_5YR_DP03_with_ann.csv', header= 1)

In [233]:
# Cleaning up column names
clean_columns(dp0317)

# Keeping only these columns
dp0317 = dp0317[['id',
 'id2',
 'geography',
 'estimate_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed',
 'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed',
'estimate_income_and_benefits_(in_2017_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people'
]]

In [234]:
# Renaming columns
dp0317_cols = {'estimate_employment_status_population_16_years_and_over':'pop_over_16',
'percent_employment_status_population_16_years_and_over':'%_pop_over_16',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_employed' :'%_employed',
'percent_employment_status_population_16_years_and_over_in_labor_force_civilian_labor_force_unemployed': '%_unemployed',
'estimate_income_and_benefits_(in_2017_inflationadjusted_dollars)_total_households_mean_household_income_(dollars)'
: 'mean_household_income',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_families': '%_families_poverty',
'percent_percentage_of_families_and_people_whose_income_in_the_past_12_months_is_below_the_poverty_level_all_people':'%_all_people_poverty'
}

dp0317.rename(columns = dp0317_cols, inplace= True)

In [235]:
# Keeping only Dallas County
dp0317= dp0317[dp0317['geography'].str.contains('Dallas County')]

In [236]:
# Getting rid of Dallas county general row
dp0317.drop(dp0317.index[0], inplace = True)

In [237]:
# Importing DP05 for this year (header = 1)
dp0517= pd.read_csv('ACS_17_5YR_DP05_with_ann.csv', header= 1, low_memory= False)

In [238]:
# Cleaning up column names
clean_columns(dp0517)

# Keeping specific columns
dp0517 = dp0517[dp05_keep]

# Renaming columns for easier understanding
dp0517.rename(columns= dp05_cols, inplace = True)

In [239]:
# Keeping only Dallas County
dp0517= dp0517[dp0517['geography'].str.contains('Dallas County')]

In [240]:
dp0517.head()

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,21_&_over,62_&_over,65_&_over,%_white,%_black,%_native,%_asian,%_hispanic
1035,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",4105,2009,48.9,2096,51.1,38.4,77.5,76.6,12.1,8.0,84.8,7.4,41,3.1,9.3
1036,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2949,1381,46.8,1568,53.2,37.5,77.8,77.0,11.6,8.9,95.6,0.5,0,0.0,12.1
1037,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3857,1866,48.4,1991,51.6,33.8,80.9,80.6,8.6,7.3,90.5,1.3,13,2.6,20.5
1038,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4235,2363,55.8,1872,44.2,35.2,84.9,84.6,11.4,8.1,86.7,2.9,28,5.2,7.8
1039,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5755,3129,54.4,2626,45.6,34.9,81.8,78.9,6.5,4.2,59.5,14.0,11,13.2,37.9


In [241]:
# Merging the two dataframes into one (DP05 first) on id2
df4 = pd.merge(dp0517, dp0317, how='inner', on='id2')

# Getting rid of repeat columns
df4 = df4.drop(columns=['id_y', 'geography_y'])

# Rename id and geography
df4.rename(columns = {'id_x' : 'id', 'geography_x' : 'geography'}, inplace = True)
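
The merge above produces `_x`/`_y` duplicates that then have to be dropped and renamed. One possible sketch that sidesteps this drops the overlapping columns from the right-hand frame before merging, and uses `validate` to confirm that each tract appears once on each side. The frames `dp05_demo` and `dp03_demo` are toy stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-ins for dp0517 and dp0317 (two tracts each)
dp05_demo = pd.DataFrame({
    'id': ['1400000US48113000100', '1400000US48113000201'],
    'id2': [48113000100, 48113000201],
    'geography': ['Census Tract 1, Dallas County, Texas',
                  'Census Tract 2.01, Dallas County, Texas'],
    'total_pop': [4105, 2949],
})
dp03_demo = pd.DataFrame({
    'id': ['1400000US48113000100', '1400000US48113000201'],
    'id2': [48113000100, 48113000201],
    'geography': ['Census Tract 1, Dallas County, Texas',
                  'Census Tract 2.01, Dallas County, Texas'],
    '%_employed': [74.7, 80.4],
})

# Columns both frames share, other than the merge key
shared = [c for c in dp03_demo.columns if c in dp05_demo.columns and c != 'id2']

# validate='one_to_one' raises if id2 repeats in either frame
merged = pd.merge(dp05_demo, dp03_demo.drop(columns=shared),
                  how='inner', on='id2', validate='one_to_one')
merged['year'] = 2017
print(merged.columns.tolist())
```

Since the same merge is repeated for every year, a check like `validate` can catch a duplicated tract row early instead of letting it silently multiply rows.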

In [242]:
# Adding year column with this year
df4['year'] = 2017

In [243]:
#Sanity check: Did it work?
df4

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",4105,2009,48.9,2096,51.1,38.4,77.5,...,3.1,9.3,3240,3240,74.7,2.8,179297,1.2,5.7,2017
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2949,1381,46.8,1568,53.2,37.5,77.8,...,0.0,12.1,2358,2358,80.4,0.8,176739,0.0,4.2,2017
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3857,1866,48.4,1991,51.6,33.8,80.9,...,2.6,20.5,3145,3145,80.7,1.9,116022,1.1,5.0,2017
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4235,2363,55.8,1872,44.2,35.2,84.9,...,5.2,7.8,3614,3614,78.9,0.0,155670,4.5,8.4,2017
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5755,3129,54.4,2626,45.6,34.9,81.8,...,13.2,37.9,4792,4792,58.6,2.2,60299,13.5,32.0,2017
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3673,2145,58.4,1528,41.6,36.8,91.1,...,10.1,36.0,3360,3360,71.5,3.2,75984,8.9,13.3,2017
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2282,1304,57.1,978,42.9,32.9,81.9,...,12.2,46.4,1917,1917,73.1,5.3,45824,26.4,31.2,2017
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",8494,4317,50.8,4177,49.2,31.1,80.2,...,8.1,63.8,6938,6938,63.0,4.1,54007,10.8,18.6,2017
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",5357,2945,55.0,2412,45.0,36.5,97.7,...,7.0,18.8,5232,5232,86.9,1.3,114163,1.6,8.2,2017
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6253,3614,57.8,2639,42.2,33.5,82.8,...,4.3,42.4,5345,5345,75.1,2.6,80825,7.5,13.0,2017


Excellent! Let's move on to our next step: figuring out the relationship between zip codes within the Dallas city limits and U.S. Census tracts!

## 6. Zipcode/County/Tract Relationships!
[Return to outline](#Working-Project-Outline)

In [255]:
# Loading the Census 2010 ZCTA-to-tract relationship file
zip_df = pd.read_csv('zcta_tract_rel_10.txt', sep = ',')

In [256]:
zip_df.head()

Unnamed: 0,ZCTA5,STATE,COUNTY,TRACT,GEOID,POPPT,HUPT,AREAPT,AREALANDPT,ZPOP,...,TRAREA,TRAREALAND,ZPOPPCT,ZHUPCT,ZAREAPCT,ZAREALANDPCT,TRPOPPCT,TRHUPCT,TRAREAPCT,TRAREALANDPCT
0,601,72,1,956300,72001956300,4271,1706,44663250,44572589,18570,...,44924558,44833897,23.0,22.03,26.67,26.74,98.5,98.33,99.42,99.42
1,601,72,1,956400,72001956400,2384,1037,32830481,32492074,18570,...,37782601,37191697,12.84,13.39,19.61,19.5,79.6,80.14,86.89,87.36
2,601,72,1,956500,72001956500,3126,1240,44969548,44809680,18570,...,44969548,44809680,16.83,16.01,26.85,26.89,100.0,100.0,100.0,100.0
3,601,72,1,956600,72001956600,2329,972,1981101,1981101,18570,...,1981101,1981101,12.54,12.55,1.18,1.19,100.0,100.0,100.0,100.0
4,601,72,1,956700,72001956700,2053,948,1380041,1380041,18570,...,1380041,1380041,11.06,12.24,0.82,0.83,100.0,100.0,100.0,100.0


In [257]:
# Cleaning up column names
zip_df.columns = zip_df.columns.str.lower()

In [258]:
# Renaming zcta5 to zip_code so it matches our list of Dallas zip codes
zip_df= zip_df.rename(columns = {'zcta5': 'zip_code'})

In [251]:
# Creating a dictionary for our Dallas zipcodes to use later
zip_codes= {'zip_code': [75201, 75202,75203,75204,75205,75206,75207,75208,75209,75210,75211,75212,75214,75215,
            75216,75217,75218,75219,75220,75221,75222,75223,75224,75225,75226,75227,75228,75229, 
            75230, 75231, 75232, 75233, 75234, 75235, 75236, 75237, 75238, 75240, 75241, 75242, 75243,
            75244, 75245, 75246, 75247, 75248, 75249, 75250, 75251, 75252, 75253, 75254, 75258, 75260, 
            75261, 75262, 75263, 75264, 75265, 75266, 75267, 75270, 75275, 75277, 75283, 75284, 75285, 
            75286, 75287,75301, 75303, 75310, 75312, 75313,75315,75320,75323,75326,75334, 75366, 75339, 
            75340, 75342, 75343, 75344, 75354, 75355, 75356, 75357, 75358, 75359, 75360, 75367, 75368, 
            75370, 75371, 75372, 75373, 75374, 75376, 75378, 75379, 75380, 75381, 75382, 75387, 75389, 
            75390, 75391, 75392, 75393, 75394]}

In [259]:
inscope_zips= pd.DataFrame(data = zip_codes)

In [260]:
inscope_zips.head()

Unnamed: 0,zip_code
0,75201
1,75202
2,75203
3,75204
4,75205


In [261]:
# Keeping only the relationship rows whose zip code is in scope
dallas_zips= pd.merge(inscope_zips, zip_df, how= 'inner', on= 'zip_code')

In [262]:
dallas_zips.head()

Unnamed: 0,zip_code,state,county,tract,geoid,poppt,hupt,areapt,arealandpt,zpop,...,trarea,trarealand,zpoppct,zhupct,zareapct,zarealandpct,trpoppct,trhupct,trareapct,trarealandpct
0,75201,48,113,500,48113000500,142,97,46417,46417,9409,...,1495995,1495995,1.51,1.4,1.24,1.24,2.56,2.62,3.1,3.1
1,75201,48,113,1701,48113001701,75,54,612785,612785,9409,...,612785,612785,0.8,0.78,16.38,16.38,100.0,100.0,100.0,100.0
2,75201,48,113,1703,48113001703,41,55,25265,25265,9409,...,676078,676078,0.44,0.79,0.68,0.68,1.3,2.27,3.74,3.74
3,75201,48,113,1704,48113001704,350,323,205444,205444,9409,...,329627,329627,3.72,4.66,5.49,5.49,25.11,28.28,62.33,62.33
4,75201,48,113,1800,48113001800,792,527,356409,356409,9409,...,754462,754462,8.42,7.61,9.53,9.53,22.15,18.48,47.24,47.24


In [263]:
# Dropping all data except zipcode and tract number
dallas_zips= dallas_zips[['zip_code', 'tract']]
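
One caveat: tract numbers are only unique within a county. If any in-scope zip code reaches into a neighbouring county, a tract number from that county could collide with a Dallas County tract once everything is joined on `tract` alone. A minimal sketch of a guard, assuming the `state` and `county` columns shown in the output above (48 and 113 for Dallas County); the rows in `rel_demo` are made up.

```python
import pandas as pd

# Toy stand-in for the merged relationship table (rows are invented)
rel_demo = pd.DataFrame({
    'zip_code': [75201, 75201, 75287],
    'state':    [48, 48, 48],
    'county':   [113, 113, 85],
    'tract':    [500, 1701, 500],
})

# Keep only Texas (48) / Dallas County (113) rows, then the two columns we need
in_county = (rel_demo['state'] == 48) & (rel_demo['county'] == 113)
dallas_zips_demo = rel_demo.loc[in_county, ['zip_code', 'tract']].reset_index(drop=True)
print(dallas_zips_demo)
```

In the toy data the third row shares tract number 500 with a Dallas County tract but belongs to another county, so it is filtered out before it can produce a false match.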

In [264]:
# Sanity check: Did it really drop everything else?
dallas_zips

Unnamed: 0,zip_code,tract
0,75201,500
1,75201,1701
2,75201,1703
3,75201,1704
4,75201,1800
5,75201,1900
6,75201,2100
7,75201,3101
8,75201,20400
9,75202,1900


## 7. EDA and Other Crap!
[Return to outline](#Working-Project-Outline)

Really need to do the following things here:
- merge all dfs into one dataframe, organized by year
- assign each tract a zip code
- drop any tracts outside of our scope
- do some pretty visualizations to see what's up with these stats over time

In [383]:
# Starting to merge our subdataframes together
census_df = pd.concat([df1, df2, df3, df4, df5])
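
By default `pd.concat` keeps each frame's own index, so the row labels 0, 1, 2, ... repeat once per year in the combined frame. A small sketch of the `ignore_index=True` option, which renumbers the rows instead; `df_a` and `df_b` are toy frames standing in for two of the yearly dataframes.

```python
import pandas as pd

# Two toy yearly frames with the same tracts
df_a = pd.DataFrame({'id2': [48113000100, 48113000201], 'year': 2014})
df_b = pd.DataFrame({'id2': [48113000100, 48113000201], 'year': 2015})

# ignore_index=True discards the per-frame labels and renumbers from 0
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked.index.tolist())
```

A unique index makes label-based selection such as `.loc[0]` return a single row rather than one row per year.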

In [275]:
# Sanity check: Does our new df look like we want it to (26 columns and 2,116 rows)?
census_df

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_asian,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year
0,1400000US48113000100,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,0.3,11.2,3234,3234,71.8,3.6,162983,4.4,6.8,2014
1,1400000US48113000201,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,0.3,14.4,2288,2288,70.2,1.0,130061,4.1,5.1,2014
2,1400000US48113000202,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,2.0,15.0,2926,2926,84.3,4.1,120322,1.6,2.9,2014
3,1400000US48113000300,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,0.9,10.6,3721,3721,80.2,3.4,130318,2.7,8.3,2014
4,1400000US48113000401,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,13.4,47.0,4724,4724,53.9,3.2,56284,23.6,35.8,2014
5,1400000US48113000404,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,11.3,49.1,3206,3206,67.3,3.3,64360,13.6,15.0,2014
6,1400000US48113000405,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,1306,58.8,916,41.2,32.0,81.1,...,13.3,41.0,1817,1817,74.5,4.3,40875,24.9,33.6,2014
7,1400000US48113000406,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,4232,54.1,3592,45.9,29.9,73.3,...,7.4,72.6,5922,5922,62.0,4.5,53948,16.6,23.1,2014
8,1400000US48113000500,48113000500,"Census Tract 5, Dallas County, Texas",4441,2718,61.2,1723,38.8,41.6,93.7,...,3.4,41.9,4160,4160,78.8,3.7,90279,22.4,17.9,2014
9,1400000US48113000601,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,3553,55.1,2897,44.9,30.8,76.7,...,3.1,53.7,5123,5123,68.6,4.8,69227,20.8,27.9,2014


In [384]:
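# Splitting id on "481130" (state 48 + county 113 + the tract code's leading zero):
# the part before the match stays in id, the remainder becomes the tract code
new = census_df['id'].str.split("481130", n = 1, expand = True) 
census_df['id']= new[0]
census_df['tract']= new[1]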
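
Splitting on the literal `"481130"` works as long as the six-digit tract code begins with `0`. An id whose tract code starts with any other digit does not contain that substring, so it would end up with a missing tract. A hedged alternative sketch takes the last six characters of the GEOID instead, assuming every id has the layout seen in the outputs (`1400000US` + 2-digit state + 3-digit county + 6-digit tract). The third id below is hypothetical, included only to show the edge case.

```python
import pandas as pd

ids = pd.Series(['1400000US48113000100',
                 '1400000US48113000201',
                 '1400000US48113980000'])  # hypothetical id, tract code starts with 9

# The last 6 characters are the tract code; int() also removes the leading zeros
tract = ids.str[-6:].astype(int)
print(tract.tolist())
```

This gives the same numbers as the split-then-strip approach for codes that start with `0`, and also covers the ones that do not.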

In [281]:
census_df

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year,tract
0,1400000US,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,11.2,3234,3234,71.8,3.6,162983,4.4,6.8,2014,00100
1,1400000US,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,14.4,2288,2288,70.2,1.0,130061,4.1,5.1,2014,00201
2,1400000US,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,15.0,2926,2926,84.3,4.1,120322,1.6,2.9,2014,00202
3,1400000US,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,10.6,3721,3721,80.2,3.4,130318,2.7,8.3,2014,00300
4,1400000US,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,47.0,4724,4724,53.9,3.2,56284,23.6,35.8,2014,00401
5,1400000US,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,49.1,3206,3206,67.3,3.3,64360,13.6,15.0,2014,00404
6,1400000US,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,1306,58.8,916,41.2,32.0,81.1,...,41.0,1817,1817,74.5,4.3,40875,24.9,33.6,2014,00405
7,1400000US,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,4232,54.1,3592,45.9,29.9,73.3,...,72.6,5922,5922,62.0,4.5,53948,16.6,23.1,2014,00406
8,1400000US,48113000500,"Census Tract 5, Dallas County, Texas",4441,2718,61.2,1723,38.8,41.6,93.7,...,41.9,4160,4160,78.8,3.7,90279,22.4,17.9,2014,00500
9,1400000US,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,3553,55.1,2897,44.9,30.8,76.7,...,53.7,5123,5123,68.6,4.8,69227,20.8,27.9,2014,00601


In [385]:
# Need to get rid of leading zeros in tract so we can merge tract and zipcode together
census_df['tract'] = census_df['tract'].str.lstrip('0')

In [300]:
census_df

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,%_hispanic,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year,tract
0,1400000US,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,11.2,3234,3234,71.8,3.6,162983,4.4,6.8,2014,100
1,1400000US,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,14.4,2288,2288,70.2,1.0,130061,4.1,5.1,2014,201
2,1400000US,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,15.0,2926,2926,84.3,4.1,120322,1.6,2.9,2014,202
3,1400000US,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,10.6,3721,3721,80.2,3.4,130318,2.7,8.3,2014,300
4,1400000US,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,47.0,4724,4724,53.9,3.2,56284,23.6,35.8,2014,401
5,1400000US,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,49.1,3206,3206,67.3,3.3,64360,13.6,15.0,2014,404
6,1400000US,48113000405,"Census Tract 4.05, Dallas County, Texas",2222,1306,58.8,916,41.2,32.0,81.1,...,41.0,1817,1817,74.5,4.3,40875,24.9,33.6,2014,405
7,1400000US,48113000406,"Census Tract 4.06, Dallas County, Texas",7824,4232,54.1,3592,45.9,29.9,73.3,...,72.6,5922,5922,62.0,4.5,53948,16.6,23.1,2014,406
8,1400000US,48113000500,"Census Tract 5, Dallas County, Texas",4441,2718,61.2,1723,38.8,41.6,93.7,...,41.9,4160,4160,78.8,3.7,90279,22.4,17.9,2014,500
9,1400000US,48113000601,"Census Tract 6.01, Dallas County, Texas",6450,3553,55.1,2897,44.9,30.8,76.7,...,53.7,5123,5123,68.6,4.8,69227,20.8,27.9,2014,601


In [386]:
# Converting tract to a number so it matches the numeric tract column in dallas_zips
census_df['tract']= pd.to_numeric(census_df['tract'])

In [387]:
# Dropping rows with any missing value (including rows where no tract could be extracted)
census_df= census_df.dropna()

In [388]:
# Left-joining zip codes onto the census data by tract number
# (a tract that overlaps several zip codes gets one row per zip code)
census_df= census_df.join(dallas_zips.set_index('tract'),on= 'tract', how = 'left')
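
Because one tract can overlap several zip codes, the left join repeats a tract's row once per matching zip code, and tracts with no match get `NaN` (which is why `zip_code` displays as a float). A minimal sketch, on made-up toy frames, of using `merge` with `indicator=True` to count the unmatched tracts before they are dropped:

```python
import pandas as pd

# Toy frames: tract 201 overlaps two zip codes, tract 999 overlaps none
census_demo = pd.DataFrame({'tract': [100, 201, 999],
                            'total_pop': [3923, 2927, 1500]})
zips_demo = pd.DataFrame({'zip_code': [75214, 75206, 75214],
                          'tract': [100, 201, 201]})

joined = census_demo.merge(zips_demo, on='tract', how='left', indicator=True)

# 'left_only' marks tracts that found no in-scope zip code
unmatched = (joined['_merge'] == 'left_only').sum()
print(len(joined), unmatched)
```

Comparing the row count before and after the join, along with the unmatched count, gives a quick check that the later `dropna()` removes only the tracts it is meant to.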

In [389]:
census_df

Unnamed: 0,id,id2,geography,total_pop,male,%_male,female,%_female,median_age,18_&_over,...,pop_over_16,%_pop_over_16,%_employed,%_unemployed,mean_household_income,%_families_poverty,%_all_people_poverty,year,tract,zip_code
0,1400000US,48113000100,"Census Tract 1, Dallas County, Texas",3923,1974,50.3,1949,49.7,39.7,80.8,...,3234,3234,71.8,3.6,162983,4.4,6.8,2014,100.0,75214.0
1,1400000US,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,2288,2288,70.2,1.0,130061,4.1,5.1,2014,201.0,75206.0
1,1400000US,48113000201,"Census Tract 2.01, Dallas County, Texas",2927,1396,47.7,1531,52.3,36.6,77.1,...,2288,2288,70.2,1.0,130061,4.1,5.1,2014,201.0,75214.0
2,1400000US,48113000202,"Census Tract 2.02, Dallas County, Texas",3465,1762,50.9,1703,49.1,35.3,83.7,...,2926,2926,84.3,4.1,120322,1.6,2.9,2014,202.0,75206.0
3,1400000US,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,3721,3721,80.2,3.4,130318,2.7,8.3,2014,300.0,75205.0
3,1400000US,48113000300,"Census Tract 3, Dallas County, Texas",4243,2415,56.9,1828,43.1,32.1,87.5,...,3721,3721,80.2,3.4,130318,2.7,8.3,2014,300.0,75206.0
4,1400000US,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,4724,4724,53.9,3.2,56284,23.6,35.8,2014,401.0,75219.0
4,1400000US,48113000401,"Census Tract 4.01, Dallas County, Texas",5816,3281,56.4,2535,43.6,32.3,80.0,...,4724,4724,53.9,3.2,56284,23.6,35.8,2014,401.0,75235.0
5,1400000US,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,3206,3206,67.3,3.3,64360,13.6,15.0,2014,404.0,75219.0
5,1400000US,48113000404,"Census Tract 4.04, Dallas County, Texas",3630,2138,58.9,1492,41.1,37.8,85.2,...,3206,3206,67.3,3.3,64360,13.6,15.0,2014,404.0,75235.0


In [390]:
# Getting rid of census tracts with no matching zip code (outside the Dallas city limits)
census_df= census_df.dropna()

## 8. Export Results to CSV
[Return to outline](#Working-Project-Outline)

In [391]:
# Exporting the final product!
census_df.to_csv('census_final.csv')
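
Writing with the default settings also saves the row index as an unlabeled first column, which comes back as `Unnamed: 0` when the file is re-read. A small sketch of exporting without it, using an in-memory buffer in place of a file; `final_demo` is a toy frame, and casting `zip_code` back to an integer is an optional assumption about the desired output.

```python
import io
import pandas as pd

# Toy stand-in for the final dataframe
final_demo = pd.DataFrame({'id2': [48113000100],
                           'zip_code': [75214.0],
                           'year': [2014]})

# After the unmatched rows are dropped, zip_code has no NaN and can be an int again
final_demo['zip_code'] = final_demo['zip_code'].astype(int)

buffer = io.StringIO()
final_demo.to_csv(buffer, index=False)  # index=False leaves out the row labels
buffer.seek(0)

round_trip = pd.read_csv(buffer)
print(round_trip.columns.tolist())
```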

[Return to outline](#Working-Project-Outline)