# Fun with Census Data 

I wanted to practice using the NumPy and pandas libraries for some basic data analysis and visualization, so I downloaded some CSV files from the 2010 US Census (available [here](https://www.census.gov/2010census/data/)). Here are some things I found interesting.

```python
import pandas as pd
import numpy as np

# Population at each census (every ten years), plus the change from the previous census.
# Columns are X_POPULATION and X_CHANGE, where X is a census year (1910, 1920, ..., 2010).
# Rows cover the United States, the Northeast, Midwest, South, and West regions,
# Puerto Rico, and each individual state.
pop_change_df = pd.read_csv('data/pop_change.csv', index_col=0, header=0, thousands=',')
# DataFrame.apply returns a new frame, so the result must be assigned back.
# errors='coerce' turns any cell that can't be parsed as a number into NaN.
pop_change_df = pop_change_df.apply(pd.to_numeric, errors='coerce')

# Population density of each state. Columns are X_POPULATION, X_DENSITY, and X_RANK.
pop_density_df = pd.read_csv('data/pop_density.csv', index_col=0, header=0, skiprows=3, thousands=',')
pop_density_df = pop_density_df.apply(pd.to_numeric, errors='coerce')

# Apportionment of House representatives by state. Columns include X_REPS,
# X_PEOPLE_PER_REP, X_
apportionment_df = pd.read_csv('data/apportionment.csv', index_col=0, header=0, skiprows=1)
apportionment_df = apportionment_df.apply(pd.to_numeric, errors='coerce')
# Replace missing values with 0.
apportionment_df = apportionment_df.fillna(0)

# For some reason, the 1920 people-per-representative column is all zeroes.
# Recompute it as 1920 population / 1920 representatives (rows align on the index labels).
apportionment_df['1920_PEOPLE_PER_REP'] = pop_change_df['1920_POPULATION'] / apportionment_df['1920_REPS']
# A nonzero population divided by 0 representatives is infinite; mark those cells as missing.
apportionment_df['1920_PEOPLE_PER_REP'] = apportionment_df['1920_PEOPLE_PER_REP'].replace([np.inf, -np.inf], np.nan)

# The same tables restricted to states: drop the leading aggregate rows
# (the first five rows of pop_change_df and the first row of pop_density_df).
states_pop_change = pop_change_df.iloc[5:]
states_pop_density = pop_density_df.iloc[1:]


def values_for_all_years(func):
    """Apply func to every census year from 1910 to 2010.

    func takes an integer year (1910, 1920, ..., 2010). Returns a dict
    whose keys are those years and whose values are func(year).
    """
    return {year: func(year) for year in range(1910, 2011, 10)}
```
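The `read_csv` calls above pass `thousands=','` so that figures written like `"1,234,567"` parse as numbers, and each read is followed by `apply(pd.to_numeric)`. One detail worth checking is that `DataFrame.apply` returns a new frame rather than converting the original in place. The sketch below uses a tiny made-up CSV to show both behaviors; the `NAME` column, the `State A`/`State B` rows, the `(X)` placeholder, and all figures are hypothetical, and using `errors='coerce'` is an assumption about how unparseable cells should be handled.

```python
import io

import pandas as pd

# A made-up file in the same shape as the census CSVs: an index column, then
# X_POPULATION / X_CHANGE columns. "(X)" stands in for a non-numeric cell.
CSV_TEXT = """NAME,1910_POPULATION,1910_CHANGE
State A,"1,234,567",12.5
State B,"987,654",(X)
"""

# thousands=',' lets quoted values like "1,234,567" parse as integers.
parsed = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, header=0, thousands=',')

# DataFrame.apply returns a new frame; calling it without assigning the
# result leaves `parsed` unchanged.
parsed.apply(pd.to_numeric, errors='coerce')

# Assigning the result is what actually converts the columns.
# errors='coerce' turns "(X)" into NaN instead of raising a ValueError.
numeric = parsed.apply(pd.to_numeric, errors='coerce')
print(numeric)
```

Without `errors='coerce'`, `pd.to_numeric` would raise on the `(X)` cell, which is sometimes preferable if a non-numeric value should be treated as a data error.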
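The NaN and infinity cleanup above can be expressed with built-in pandas methods: `fillna(0)` replaces missing values, and `replace([np.inf, -np.inf], np.nan)` turns infinities back into missing values. A minimal sketch on hypothetical data (the state names and numbers are invented); it also shows that dividing two Series matches rows by index label rather than by position, which is what the 1920 recomputation relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical data indexed by state name; every number here is invented.
# The Series is deliberately in a different row order from the DataFrame.
population = pd.Series({'State C': 120_000, 'State A': 900_000, 'State B': 450_000})
apportionment = pd.DataFrame(
    {'1920_REPS': [3.0, np.nan, 1.0]},
    index=['State A', 'State B', 'State C'],
)

# Replace missing representative counts with 0.
apportionment = apportionment.fillna(0)

# Series division matches rows by index label, not by position.
per_rep = population / apportionment['1920_REPS']

# 450_000 / 0 gives inf; turn infinities into NaN so they read as missing.
per_rep = per_rep.replace([np.inf, -np.inf], np.nan)
print(per_rep)
```

These methods operate on whole columns at once, so they avoid calling a Python function on every cell.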
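A usage sketch for `values_for_all_years`, assuming a frame with one `X_POPULATION` column per census year. The two-row frame below is hypothetical and its values are placeholders, not census counts. The function is written as a dict comprehension over `range(1910, 2011, 10)`, which covers the same eleven census years, 1910 through 2010.

```python
import pandas as pd


def values_for_all_years(func):
    """Return {year: func(year)} for each census year 1910, 1920, ..., 2010."""
    return {year: func(year) for year in range(1910, 2011, 10)}


# Hypothetical frame with one X_POPULATION column per census year.
# The values are placeholders (1000 + year and 2000 + year), not census counts.
states = pd.DataFrame(
    {f'{year}_POPULATION': [1000 + year, 2000 + year] for year in range(1910, 2011, 10)},
    index=['State A', 'State B'],
)

# Sum each year's column across the rows.
totals = values_for_all_years(lambda year: states[f'{year}_POPULATION'].sum())

# A dict keyed by year converts directly into a year-indexed Series.
totals_series = pd.Series(totals)
print(totals_series)
```

Because the result is a plain dict keyed by year, `pd.Series(totals)` gives a Series indexed by year, which is a convenient shape for plotting a value over time.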