# US Census Data

This is the Python code for processing 2018 US demographic data by county from the US Census. To run the code successfully:

1. Download the **two** CSV files from https://data.census.gov/cedsci/table?q=demographic%20data%20by%20county&hidePreview=false&tid=ACSCP5Y2018.CP05&vintage=2018&g=0100000US.050000 (make sure you download the data files, not the metadata files).
2. Name one file us-census.csv and the other us-census-2.csv, and place both in the directory the code runs from.
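
Before running the cells, it can help to confirm that both files are in place. The following is a minimal sketch, not part of the original code: the helper name `missing_files` is hypothetical, and it assumes the two CSV files sit in the current working directory.

```python
from pathlib import Path

# File names taken from step 2 above.
EXPECTED_FILES = ["us-census.csv", "us-census-2.csv"]


def missing_files(directory="."):
    """Return the expected CSV names that are not present in `directory`.

    Hypothetical helper, added only for illustration.
    """
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


missing = missing_files()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("Both input files found.")
```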

In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from mlxtend.plotting import plot_linear_regression

In [21]:
# Reading the CSV files
data = pd.read_csv("us-census.csv")
data_two = pd.read_csv("us-census-2.csv")
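
The downloaded files carry two header rows: a row of column codes followed by a row of descriptive labels, which is why the following cells promote the first data row to the header. As an alternative sketch, `pd.read_csv` can do this at read time with `header=1`. The miniature CSV below is made up for illustration (the code `CP05_2018_001E`, the id value, and the label text are assumptions about the layout, not copied from the real download).

```python
import io

import pandas as pd

# Made-up miniature of the download layout: a row of column codes,
# then a row of descriptive labels, then the data.
sample = io.StringIO(
    "GEO_ID,NAME,CP05_2018_001E\n"
    "id,Geographic Area Name,2018 Estimate!!SEX AND AGE!!Total population\n"
    '0500000US04013,"Maricopa County, Arizona",4410824\n'
)

# header=1 uses the second line as the column names and skips the first.
demo = pd.read_csv(sample, header=1)
print(demo.columns.tolist())
```

A side effect of this approach is that pandas can infer numeric dtypes, because the label row no longer ends up among the data rows.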

In [22]:
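# Organize dataset 1: promote the first data row (descriptive labels) to the header
data.columns = data.iloc[0]
data.drop(0, inplace=True)
# Drop the geographic id column and index the rows by area name
data.drop("id", axis=1, inplace=True)
data = data.set_index('Geographic Area Name')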

In [23]:
# Filter dataset 1: keep only the columns whose label mentions 2018
cols = [c for c in data.columns if '2018' in c]
data = data[cols]
# Shorten each label to its last "!!"-separated segment
data.columns = [x.split("!!")[-1] for x in data.columns]

In [24]:
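# Organize dataset 2: promote the first data row (descriptive labels) to the header
data_two.columns = data_two.iloc[0]
data_two.drop(0, inplace=True)
# Drop the geographic id column and index the rows by area name
data_two.drop("id", axis=1, inplace=True)
data_two = data_two.set_index('Geographic Area Name')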

In [25]:
# Filter dataset 2: keep only the columns whose label mentions 2018
cols = [c for c in data_two.columns if '2018' in c]
data_two = data_two[cols]
# Shorten each label to its last "!!"-separated segment
data_two.columns = [x.split("!!")[-1] for x in data_two.columns]
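
Because the same organize-and-filter steps are applied to both datasets, they could be collected into one function. This is a sketch rather than the original code: the name `tidy_census_table` and its `year` parameter are hypothetical, and the small `raw` frame uses made-up codes, labels, and a placeholder 2014 value purely to exercise the function.

```python
import pandas as pd


def tidy_census_table(raw, year="2018"):
    """Apply the organize-and-filter steps shown above to one raw table.

    Hypothetical helper; returns a new frame and leaves `raw` unchanged.
    """
    out = raw.copy()
    out.columns = out.iloc[0]              # promote the label row to the header
    out = out.drop(index=0)                # remove the label row from the data
    out = out.drop(columns="id")           # drop the geographic id column
    out = out.set_index("Geographic Area Name")
    out = out[[c for c in out.columns if year in c]]         # keep one year
    out.columns = [c.split("!!")[-1] for c in out.columns]   # shorten labels
    return out


# Made-up miniature of a raw table as read by pd.read_csv (labels in row 0).
raw = pd.DataFrame(
    {
        "GEO_ID": ["id", "0500000US04013"],
        "NAME": ["Geographic Area Name", "Maricopa County, Arizona"],
        "A": ["2018 Estimate!!SEX AND AGE!!Total population", "4410824"],
        "B": ["2014 Estimate!!SEX AND AGE!!Total population", "0"],
    }
)

tidy = tidy_census_table(raw)
print(tidy)
```

With such a helper, the two datasets would each need a single call, for example `data = tidy_census_table(data)`.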

In [26]:
# Concatenate the two datasets row-wise
df = pd.concat([data, data_two])
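
After a row-wise concatenation, an area that appears in both files shows up twice in the index. The sketch below shows one way to list such labels; the county names and values are placeholders, not census data.

```python
import pandas as pd

# Placeholder frames standing in for the two cleaned datasets.
a = pd.DataFrame({"total population": ["100"]}, index=["County A"])
b = pd.DataFrame(
    {"total population": ["100", "250"]}, index=["County A", "County B"]
)

combined = pd.concat([a, b])

# Index.duplicated flags every repeat of a label after its first occurrence.
repeated = combined.index[combined.index.duplicated()].unique().tolist()
print(repeated)  # ['County A']
```

Note that `drop_duplicates` compares column values only, so it removes a repeated area only when every value in the two rows matches.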

In [27]:
# Remove duplicate rows and replace the "N" entries (null values) with 0, the most appropriate fill value in this case
df.drop_duplicates(inplace=True)
df = df.replace("N", 0.0)
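
Because the label row was read as data, the remaining values are strings, and the `0.0` replacement leaves the affected columns with mixed types. If numeric columns are needed later, one possible follow-up (an assumption, not a step in the original code) is to convert every column with `pd.to_numeric`. The frame below uses placeholder area names; the numbers echo values from the table further down, with one replaced by "N" to show the effect.

```python
import pandas as pd

# Placeholder frame: string values, with "N" marking a missing estimate.
demo = pd.DataFrame(
    {"total population": ["68114", "N"], "male": ["49.1", "48.4"]},
    index=["County A", "County B"],
)

demo = demo.replace("N", 0.0)

# errors="coerce" turns anything that still cannot be parsed into NaN.
numeric = demo.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```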

In [28]:
# Lowercase the column names
df.columns = [x.lower() for x in df.columns]

In [29]:
df.head()

| Geographic Area Name | total population | male | female | sex ratio (males per 100 females) | under 5 years | 5 to 9 years | 10 to 14 years | 15 to 19 years | 20 to 24 years | 25 to 34 years | ... | asian alone | native hawaiian and other pacific islander alone | some other race alone | two or more races | two races including some other race | two races excluding some other race, and three or more races | total housing units | citizen, 18 and over population | male | female |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Maricopa County, Arizona | 4410824 | 49.5 | 50.5 | 97.9 | 6.3 | 6.4 | 7.0 | 6.7 | 6.6 | 14.6 | ... | 4.1 | 0.2 | 0.2 | 2.4 | 0.1 | 2.3 | 1762981 | 3005167 | 48.8 | 51.2 |
| Jefferson County, Arkansas | 68114 | 49.1 | 50.9 | 96.4 | 6.0 | 6.6 | 5.5 | 6.6 | 7.5 | 12.1 | ... | 1.0 | 0.0 | 0.0 | 0.3 | 0.0 | 0.3 | 33382 | 52825 | 49.3 | 50.7 |
| White County, Arkansas | 78727 | 48.4 | 51.6 | 93.8 | 5.9 | 5.8 | 6.5 | 9.7 | 8.0 | 11.2 | ... | 1.0 | 0.0 | 0.1 | 1.6 | 0.0 | 1.6 | 33954 | 58772 | 46.7 | 53.3 |
| Butte County, California | 231256 | 49.2 | 50.8 | 96.8 | 5.6 | 5.4 | 5.7 | 7.3 | 9.8 | 13.1 | ... | 4.3 | 0.2 | 0.1 | 5.2 | 0.1 | 5.0 | 100038 | 177145 | 48.7 | 51.3 |
| El Dorado County, California | 190678 | 49.6 | 50.4 | 98.3 | 4.5 | 4.8 | 6.7 | 6.0 | 5.2 | 9.5 | ... | 4.6 | 0.3 | 0.1 | 3.7 | 0.0 | 3.6 | 91094 | 145622 | 50.3 | 49.7 |


In [30]:
df.to_csv("us-census-data.csv", header=True)
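
When the exported file is read back, passing `index_col=0` restores the area names as the index. The table above also contains repeated column names (`male` and `female` appear twice), and `pd.read_csv` renames repeats with a numeric suffix. The sketch below demonstrates this with an in-memory buffer instead of the real file, using the four `male`/`female` values from the Maricopa County row.

```python
import io

import pandas as pd

# One row with the repeated "male"/"female" columns seen in the output above.
demo = pd.DataFrame(
    [[49.5, 50.5, 48.8, 51.2]],
    columns=["male", "female", "male", "female"],
    index=pd.Index(["Maricopa County, Arizona"], name="Geographic Area Name"),
)

# Round-trip through CSV text, standing in for us-census-data.csv.
buffer = io.StringIO()
demo.to_csv(buffer, header=True)
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=0)
print(restored.columns.tolist())  # ['male', 'female', 'male.1', 'female.1']
```

Renaming the second pair before export (for example to names that mention the citizen population) would keep the columns distinguishable; that choice is left to the reader.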