# Analyzing Population Change

After seeing a few stories about the mass exodus of people from the Bay Area, I thought I would find some source data and try to understand the numbers a little better.  The data comes directly from the US Census Bureau, and the columns are explained [here](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2020/cbsa-est2020-alldata.pdf).

Because the data covers both counties and metropolitan areas, you have to do a little digging to figure out what you want to look at, so I suggest opening the file in Excel to get an overview.

In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [16,12]

We will read the file directly from census.gov so we pick up any updates they make.  For now we'll just look at the 2010 to 2020 range, although the website has a significant amount of historical data.

In [19]:
df = pd.read_csv(
    'https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/metro/totals/cbsa-est2020-alldata.csv',
    encoding='latin1',
)
df.iloc[0]

CBSA                                    10180
MDIV                                      NaN
STCOU                                     NaN
NAME                              Abilene, TX
LSAD            Metropolitan Statistical Area
                            ...              
RESIDUAL2016                               -5
RESIDUAL2017                               -5
RESIDUAL2018                               -4
RESIDUAL2019                                0
RESIDUAL2020                                9
Name: 0, Length: 106, dtype: object
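
As an alternative to a spreadsheet, the same overview can be had in pandas. The sketch below assumes a frame with the `CBSA`, `NAME` and `LSAD` columns shown above. `find_areas` is a hypothetical helper (not part of pandas or the Census file), and the toy frame only stands in for the real `df`.

```python
import pandas as pd

def find_areas(frame, text):
    """Return CBSA, NAME and LSAD for rows whose NAME contains `text` (case-insensitive)."""
    mask = frame['NAME'].str.contains(text, case=False, regex=False)
    return frame.loc[mask, ['CBSA', 'NAME', 'LSAD']]

# Toy stand-in for the Census frame; in the notebook, pass the real `df` instead.
toy = pd.DataFrame({
    'CBSA': [10180, 41940, 41940],
    'NAME': ['Abilene, TX',
             'San Jose-Sunnyvale-Santa Clara, CA',
             'Santa Clara County, CA'],
    'LSAD': ['Metropolitan Statistical Area',
             'Metropolitan Statistical Area',
             'County or equivalent'],
})

matches = find_areas(toy, 'santa clara')   # rows whose name mentions Santa Clara
lsad_counts = toy['LSAD'].value_counts()   # how many rows of each area type
```

On the real data, something like `find_areas(df, 'Santa Clara')` would list both the metropolitan area and the county row, which makes it easier to pick the exact `NAME` to filter on.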

The area we will look at first is labeled as "Santa Clara County, CA".

In [42]:
d = df[df.NAME == 'Santa Clara County, CA'].iloc[0]
d

CBSA                             41940
MDIV                               NaN
STCOU                             6085
NAME            Santa Clara County, CA
LSAD              County or equivalent
                         ...          
RESIDUAL2016                        28
RESIDUAL2017                        66
RESIDUAL2018                        23
RESIDUAL2019                       -31
RESIDUAL2020                       -76
Name: 1318, Length: 106, dtype: object

Let's look at the top-level change: the 2010 and 2020 population estimates and the percentage change between them.

In [43]:
d.POPESTIMATE2010, d.POPESTIMATE2020, (d.POPESTIMATE2020/d.POPESTIMATE2010 -1)*100

(1786001, 1907105, 6.780735285142625)

To understand these numbers better, we need to look at the components of change.  The file stores them per year, so to get the change over the decade we sum each component over the ten years from 2011 through 2020.

In [44]:
# Sum each yearly component over 2011-2020 (columns BIRTHS2011 ... BIRTHS2020, etc.).
s = {}
for base in ['BIRTHS', 'DEATHS', 'NATURALINC', 'INTERNATIONALMIG', 'DOMESTICMIG', 'NETMIG', 'RESIDUAL']:
    s[base] = sum(d[f"{base}{year}"] for year in range(2011, 2021))

s

{'BIRTHS': 228625,
 'DEATHS': 100616,
 'NATURALINC': 128009,
 'INTERNATIONALMIG': 152102,
 'DOMESTICMIG': -157840,
 'NETMIG': -5738,
 'RESIDUAL': -1167}
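
These totals can be sanity-checked against the population estimates: natural increase, net migration and the residual together should account for the whole change between 2010 and 2020. A minimal check, using only the numbers printed above (the variable names `pop_2010`, `pop_2020`, `explained` and `change` are ours):

```python
# Values copied from the notebook output above for Santa Clara County, CA.
pop_2010, pop_2020 = 1786001, 1907105
s = {'BIRTHS': 228625, 'DEATHS': 100616, 'NATURALINC': 128009,
     'INTERNATIONALMIG': 152102, 'DOMESTICMIG': -157840,
     'NETMIG': -5738, 'RESIDUAL': -1167}

# Natural increase is births minus deaths.
assert s['BIRTHS'] - s['DEATHS'] == s['NATURALINC']
# Net migration is international plus domestic migration.
assert s['INTERNATIONALMIG'] + s['DOMESTICMIG'] == s['NETMIG']

# The three top-level components account for the whole change.
explained = s['NATURALINC'] + s['NETMIG'] + s['RESIDUAL']
change = pop_2020 - pop_2010
print(explained, change)  # 121104 121104
```

So the 6.78% growth is almost entirely natural increase: international arrivals and domestic departures nearly cancel out.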

We want to compare a few counties, so we will gather the same information for each of them.  Because the counties differ in size, we normalize each number to the county's 2010 population and express it as a percentage.  Yes, this could be done more efficiently, but this was easy.

In [65]:
summary = pd.DataFrame()
for county in ['Santa Clara County, CA', 'Fairfax County, VA', 'Multnomah County, OR', 'San Francisco County, CA',
               'Miami-Dade County, FL', 'Fresno County, CA']:
    d = df[df.NAME == county].iloc[0]
    for base in ['BIRTHS', 'DEATHS', 'NATURALINC', 'INTERNATIONALMIG', 'DOMESTICMIG', 'NETMIG', 'RESIDUAL']:
        x = 0
        for i in range(11, 21):
            field = f"{base}20{i}"
            x += d[field]
        summary.loc[base, county] = (x / d.POPESTIMATE2010) * 100.0
    summary.loc['POPCHG', county] = (d.POPESTIMATE2020 / d.POPESTIMATE2010 - 1) * 100.0
summary

| Component (% of 2010 population) | Santa Clara County, CA | Fairfax County, VA | Multnomah County, OR | San Francisco County, CA | Miami-Dade County, FL | Fresno County, CA |
|---|---:|---:|---:|---:|---:|---:|
| BIRTHS | 12.800945 | 13.621227 | 12.146693 | 11.019231 | 12.523484 | 16.373948 |
| DEATHS | 5.633591 | 4.673047 | 7.78373 | 7.267861 | 7.88014 | 7.278777 |
| NATURALINC | 7.167353 | 8.94818 | 4.362963 | 3.75137 | 4.643345 | 9.095172 |
| INTERNATIONALMIG | 8.516345 | 8.278761 | 3.430261 | 6.71207 | 16.098056 | 0.920912 |
| DOMESTICMIG | -8.837621 | -11.332256 | 2.756334 | -2.921719 | -12.775261 | -2.594068 |
| NETMIG | -0.321276 | -3.053496 | 6.186595 | 3.790351 | 3.322795 | -1.673156 |
| RESIDUAL | -0.065342 | 0.060396 | 0.072289 | 0.041836 | 0.024212 | -0.028648 |
| POPCHG | 6.780735 | 5.95508 | 10.621847 | 7.583558 | 7.990352 | 7.393368 |
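
For reference, here is one way the nested loops might be written with column-wise pandas operations. This is a sketch, not the code that produced the table above: `summarize`, `COMPONENTS` and the keyword arguments are names introduced here, and it assumes the same column naming (`BIRTHS2011`, `POPESTIMATE2010`, and so on) as the Census file.

```python
import pandas as pd

COMPONENTS = ['BIRTHS', 'DEATHS', 'NATURALINC', 'INTERNATIONALMIG',
              'DOMESTICMIG', 'NETMIG', 'RESIDUAL']

def summarize(frame, counties, first_year=2011, last_year=2020, base_year=2010):
    """Component totals as a percent of base-year population, one column per county."""
    # Keep the first matching row per county, mirroring `.iloc[0]` in the loop version.
    rows = (frame[frame['NAME'].isin(counties)]
            .drop_duplicates('NAME')
            .set_index('NAME')
            .loc[counties])
    base_pop = rows[f'POPESTIMATE{base_year}']
    years = range(first_year, last_year + 1)
    # Sum each component's yearly columns, then scale by the base-year population.
    out = pd.DataFrame({
        base: rows[[f'{base}{y}' for y in years]].sum(axis=1) / base_pop * 100.0
        for base in COMPONENTS
    })
    out['POPCHG'] = (rows[f'POPESTIMATE{last_year}'] / base_pop - 1) * 100.0
    return out.T  # components as rows, counties as columns
```

In the notebook this would be called as `summarize(df, ['Santa Clara County, CA', 'Fairfax County, VA'])`. A misspelled county name raises a `KeyError` from `.loc` rather than silently producing an empty column.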


## Analysis

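It's easy to single out California as the place people are leaving, but the comparison shows the pattern is not unique to the state.  Fairfax County, VA and Miami-Dade County, FL show the same combination as Santa Clara and San Francisco: positive international migration and negative domestic migration, which is what we would expect if people are leaving expensive metropolitan areas for cheaper places to live.  Multnomah County, OR is the one county here that gained residents from domestic migration.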
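
The pattern can be checked directly against the summary table. A small sketch using the table's values, copied by hand (the `migration` dictionary and the variable names are ours):

```python
# Percent-of-2010-population values copied from the summary table above.
migration = {
    'Santa Clara County, CA':   {'INTERNATIONALMIG': 8.516345,  'DOMESTICMIG': -8.837621},
    'Fairfax County, VA':       {'INTERNATIONALMIG': 8.278761,  'DOMESTICMIG': -11.332256},
    'Multnomah County, OR':     {'INTERNATIONALMIG': 3.430261,  'DOMESTICMIG': 2.756334},
    'San Francisco County, CA': {'INTERNATIONALMIG': 6.71207,   'DOMESTICMIG': -2.921719},
    'Miami-Dade County, FL':    {'INTERNATIONALMIG': 16.098056, 'DOMESTICMIG': -12.775261},
    'Fresno County, CA':        {'INTERNATIONALMIG': 0.920912,  'DOMESTICMIG': -2.594068},
}

# Counties gaining people from abroad while losing them to the rest of the US.
same_pattern = [county for county, m in migration.items()
                if m['INTERNATIONALMIG'] > 0 and m['DOMESTICMIG'] < 0]
exceptions = [county for county in migration if county not in same_pattern]

print(len(same_pattern), exceptions)  # 5 ['Multnomah County, OR']
```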