Some questions we might ask are: 
1. What were the fastest and slowest growing population centers in the US? 
3. How does population density in one census affect the rate of population growth over the next decade, if at all? 
3. To what extent do regions mirror one another in population growth, as opposed to diverging from one another? 

Pop_change notes: 
CHANGE EXPRESSED AS A PERCENTAGE (E.G. 9.8 MEANS 9.8%). VALUES CAN BE NEGATIVE.

Pop_density notes: 
DENSITY EXPRESSED AS PEOPLE PER SQUARE MILE.
DENSITY RANKING EXPRESSED IN ORDER OF MOST DENSE (1) TO LEAST DENSE (52).

Interesting observations: 
1. DC was the most dense from every census through 1910

In [73]:
import pandas as pd
import numpy as np

# Population of each state at each census, plus the change from the previous decade.
# Columns are named X_POPULATION and X_CHANGE, where X is the census year.
# Rows cover the United States, Northeast, Midwest, South, West, Puerto Rico,
# and each individual state.
pop_change_df = pd.read_csv('data/pop_change.csv', index_col=0, header=0, thousands=',')
# apply() returns a new DataFrame, so the result has to be assigned back.
pop_change_df = pop_change_df.apply(pd.to_numeric)

# Population density of each state. Columns are X_POPULATION, X_DENSITY, and X_RANK.
pop_density_df = pd.read_csv('data/pop_density.csv', index_col=0, header=0, skiprows=3, thousands=',')
pop_density_df = pop_density_df.apply(pd.to_numeric)

# The same dataframes restricted to states (the leading aggregate rows are skipped).
states_pop_change = pop_change_df.iloc[5:]
states_pop_density = pop_density_df.iloc[1:]


STATE_OR_REGION
Alabama                  9.8
Alaska                 -14.5
Arizona                 63.5
Arkansas                11.3
California              44.1
Colorado                17.6
Connecticut             23.9
Delaware                10.2
District of Columbia    32.2
Florida                 28.7
Georgia                 11.0
Hawaii                  33.4
Idaho                   32.6
Illinois                15.0
Indiana                  8.5
Iowa                     8.1
Kansas                   4.6
Kentucky                 5.5
Louisiana                8.6
Maine                    3.5
Maryland                11.9
Massachusetts           14.4
Michigan                30.5
Minnesota               15.0
Mississippi             -0.4
Missouri                 3.4
Montana                 46.0
Nebraska                 8.7
Nevada                  -5.5
New Hampshire            2.9
New Jersey              24.4
New Mexico              10.1
New York                14.0
North Carolina          16.

In [51]:
#Answering 1: What were the fastest and slowest growing population centers in the US? 

def n_max(year, number, top):
    '''Return the NUMBER fastest or slowest growing states by population in YEAR.

    TOP is a Boolean. If TOP, return the NUMBER fastest growing states.
    If not TOP, return the NUMBER slowest growing states.'''
    key = str(year) + '_CHANGE'
    if top:
        return states_pop_change[key].nlargest(n=number)
    else:
        return states_pop_change[key].nsmallest(n=number)

five_fastest_growing_2010 = n_max(2010, 5, True)
five_slowest_growing_2010 = n_max(2010, 5, False)
print(five_fastest_growing_2010)
print(five_slowest_growing_2010)

STATE_OR_REGION
Nevada     35.1
Arizona    24.6
Utah       23.8
Idaho      21.1
Texas      20.6
Name: 2010_CHANGE, dtype: float64
STATE_OR_REGION
Puerto Rico    -2.2
Michigan       -0.6
Rhode Island    0.4
Louisiana       1.4
Ohio            1.6
Name: 2010_CHANGE, dtype: float64


2. Does population density affect the rate of population growth in a state? That is, if a state is densely populated in one census, does it grow faster or slower over the next decade?

Either direction is plausible. People might flock to a dense state because it is growing quickly and is the place to be. Or they might prefer less populated states, which may offer cheaper, more abundant land and more opportunities. 

In [77]:
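#Answering 2: How does population density in one census affect the rate of population growth over the next decade, if at all?

def corr_density_growth(initial_year):
    '''Return the Pearson correlation between density in INITIAL_YEAR and
    percent population change over the decade ending in INITIAL_YEAR + 10.'''
    density_key = str(initial_year) + '_DENSITY'
    growth_key = str(initial_year + 10) + '_CHANGE'
    # Series.corr aligns the two Series on their shared index labels.
    # Note: this uses the full frames, so any non-state row present in both is included.
    return pop_density_df[density_key].corr(pop_change_df[growth_key], method='pearson')

# Initial census years 1910, 1920, ..., 2000.
correlations = [corr_density_growth(year) for year in range(1910, 2010, 10)]
print(correlations)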

[0.16759764250300993, -0.033015074352785424, 0.42880048186971909, 0.028952055112537411, -0.18073037791760463, -0.15831595591159223, -0.34969210174663073, -0.19039483919421596, -0.28296925289714092, -0.16320851336858472]


Population density in one census is negatively correlated with population growth over the following decade in seven of the ten periods, including every period from 1950-1960 onward. However, these correlations are fairly weak: none is stronger than about -0.35. The earlier periods are mixed, and 1930-1940 is a strong exception, with a positive correlation of about 0.43. 
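
To make the correlation step concrete, here is a minimal sketch on a toy example. The numbers and the labels `A` through `D` and `Region X` are made up for illustration and are not census data. It shows that `Series.corr` only uses labels present in both Series, and it adds a rank-based (Spearman-style) check, which is not part of the analysis above but may be worth trying because density is heavily skewed by a few very dense places.

```python
import numpy as np
import pandas as pd

# Made-up illustrative numbers, NOT census data.
# "Region X" appears only in the density Series, so it is dropped by alignment.
density = pd.Series({"A": 10.0, "B": 50.0, "C": 200.0, "D": 1000.0, "Region X": 80.0})
growth = pd.Series({"D": 2.0, "C": 5.0, "B": 12.0, "A": 20.0})

# Pearson correlation; pandas aligns the two Series on their shared labels first.
r = density.corr(growth, method="pearson")

# The same value computed by hand on the shared labels.
common = density.index.intersection(growth.index)
r_manual = np.corrcoef(density[common].to_numpy(), growth[common].to_numpy())[0, 1]

# Rank-based variant: Pearson correlation of the ranks, which is less
# sensitive to a single extreme density value.
rho = density[common].rank().corr(growth[common].rank(), method="pearson")

print(r, r_manual, rho)
```

If the rank-based value and the Pearson value disagree sharply on the real data, that would suggest a handful of outliers is driving the result.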

3. To what extent do regions mirror one another in population growth, as opposed to diverging from one another?
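
One possible way to approach this question, sketched under assumptions: take the regional rows, keep only the `X_CHANGE` columns, and correlate the regions' growth series with one another. The table below is a made-up stand-in that mimics the layout of `pop_change_df`. Its numbers are illustrative only, and the row labels are assumed to match the region names listed earlier.

```python
import pandas as pd

# Made-up illustrative numbers, NOT census data. The layout mimics
# pop_change_df: one row per region, X_CHANGE and X_POPULATION columns.
toy = pd.DataFrame(
    {
        "1980_CHANGE": [1.0, 2.0, 5.0, 9.0],
        "1990_CHANGE": [2.0, 4.0, 4.0, 7.0],
        "2000_CHANGE": [3.0, 6.0, 3.0, 8.0],
        "2010_CHANGE": [2.0, 4.0, 4.0, 6.0],
        "2010_POPULATION": [100, 200, 300, 400],
    },
    index=["Northeast", "Midwest", "South", "West"],
)

# Keep only the growth columns, then transpose so that each region becomes
# a column and each row is one decade.
growth_by_decade = toy.filter(regex=r"_CHANGE$").T

# Pairwise Pearson correlation between regions. Values near 1 mean two
# regions rise and fall together; values near -1 mean they diverge.
region_corr = growth_by_decade.corr()
print(region_corr)
```

On the real data, the same two lines would apply to something like `pop_change_df.loc[["Northeast", "Midwest", "South", "West"]]`, assuming those are the exact index labels in the file.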