# U.S. Net Domestic Migration by County: 2020-2022
### 1. Introduction

The dataset explored in this project is from [census.gov](https://www.census.gov/data/datasets/time-series/demo/popest/2020s-counties-total.html#v2022). Labeled `CO-EST2022-COMP`, this dataset contains resident population change by county from 2020 to 2022. This project aims to recreate the U.S. Census Bureau's dot-density map, found here:

[Two Years Into Pandemic, Domestic Migration Trends Shifted](https://www.census.gov/library/stories/2023/03/domestic-migration-trends-shifted.html#titlecore-ebc32996d9)

In [1]:
import pandas as pd
import warnings

pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

### 2. Download the dataset

In [None]:
file = 'https://www2.census.gov/programs-surveys/popest/tables/2020-2022/counties/totals/co-est2022-comp.xlsx'
# file = 'data/co-est2022-comp.xlsx'

df = pd.read_excel(file, sheet_name='CO-EST2022-COMP',
                   skiprows=4, usecols=[0, 14], header=0,
                   names=['geographic_area', 'domestic'])
display(df.head())
df.info()

### 3. Clean and wrangle

In [None]:
# inspect the rows where `domestic` is null
df[df['domestic'].isna()]

In [None]:
# remove those rows
df = df[df['domestic'].notna()]
df.info()

In [None]:
# convert column from float to int
df['domestic'] = df['domestic'].astype('int')
df.head()

In [None]:
# check which rows lack a comma, i.e. are not in `County, State` format
df[~df['geographic_area'].str.contains(',')]

In [None]:
# remove top-level USA row
df = df[df['geographic_area'].str.contains(',')]
df.head()

In [None]:
# split into separate `county_name` and `state_name` columns
df[['county_name', 'state_name']] = df['geographic_area'].str.split(', ', n=1, expand=True)
df.drop(columns='geographic_area', inplace=True)
df.head()

In [None]:
# strip off the leading '.' (lstrip leaves inner dots such as 'St.' intact)
df['county_name'] = df['county_name'].str.lstrip('.')
df.head()
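
For reuse, the cleaning steps so far can be collected into one function. The sketch below is a hypothetical helper (`clean_migration` is not part of this notebook), run on made-up rows that mimic the sheet's `.County, State` layout; the values are illustrative, not Census figures.

```python
import pandas as pd


def clean_migration(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper bundling this section's cleaning steps.

    Expects the two columns produced by the `read_excel` call above:
    `geographic_area` and `domestic`.
    """
    # drop rows without a migration value (blank and note rows)
    df = raw[raw['domestic'].notna()].copy()
    df['domestic'] = df['domestic'].astype('int')
    # keep only `County, State` rows, which drops the national total
    df = df[df['geographic_area'].str.contains(',')].copy()
    # split on the first ', ' only
    df[['county_name', 'state_name']] = df['geographic_area'].str.split(', ', n=1, expand=True)
    # remove the leading '.' but keep inner dots such as 'St.'
    df['county_name'] = df['county_name'].str.lstrip('.')
    return df.drop(columns='geographic_area').reset_index(drop=True)


# made-up example rows (not Census figures)
raw = pd.DataFrame({
    'geographic_area': ['United States', '.Sample County, Ohio',
                        '.St. Example city, Missouri', 'Note: example footnote'],
    'domestic': [0.0, 1234.0, -5678.0, None],
})
clean = clean_migration(raw)
print(clean)
```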

In [None]:
# check for any remaining null values
df.isnull().values.any()

In [None]:
# count the remaining rows (counties and county equivalents)
df.count()

### 4. Download the cartographic boundary shapefile for all the counties

From https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html#ti1804832544:

- 1 : 5,000,000 (national) [shapefile](https://www2.census.gov/geo/tiger/GENZ2021/shp/cb_2021_us_county_5m.zip) \[2.6 MB\]

In [None]:
import geopandas as gpd
import random

zip_file = 'zip://data/cb_2021_us_county_5m.zip'

df_counties = gpd.read_file(zip_file)
df_counties = df_counties[['GEOID', 'NAME', 'NAMELSAD', 'STUSPS', 'STATE_NAME', 'geometry']]
df_counties.columns = ['geoid', 'county', 'county_name', 'state', 'state_name', 'geometry']
df_counties.head()

In [None]:
df_counties.info()

In [None]:
# merge the county geometries into the migration dataframe
df_merged = pd.merge(df, df_counties.drop(columns=['county', 'state'])
                     , on=['county_name', 'state_name']
                     , how='left')
df_merged.head()

In [None]:
df_merged.info()

In [None]:
# check which rows have null values in the `geoid` and `geometry` columns
df_merged[df_merged['geoid'].isna()]
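
A merge indicator is another way to list unmatched rows without relying on `geoid` being null. A minimal pandas sketch on two tiny made-up frames (the `domestic` values are illustrative and `99001` is not a real FIPS code):

```python
import pandas as pd

# made-up stand-ins for `df` and the county attribute table
migration = pd.DataFrame({
    'county_name': ['Sample County', 'Capitol Planning Region'],
    'state_name': ['Ohio', 'Connecticut'],
    'domestic': [120, -340],
})
counties = pd.DataFrame({
    'county_name': ['Sample County', 'Hartford County'],
    'state_name': ['Ohio', 'Connecticut'],
    'geoid': ['99001', '09003'],
})

merged = migration.merge(counties, on=['county_name', 'state_name'],
                         how='left', indicator=True)
# `_merge` is 'left_only' for rows that found no match in `counties`
unmatched = merged.loc[merged['_merge'] == 'left_only', ['county_name', 'state_name']]
print(unmatched)
```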

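Per the [Office of the Federal Register](https://www.federalregister.gov/documents/2020/12/14/2020-27459/change-to-county-equivalents-in-the-state-of-connecticut), Connecticut is transitioning to planning regions as its county equivalents, which the Census Bureau proposed to implement in 2023. This explains the unmatched rows above: the 2022 estimates already report Connecticut by planning region, while the 2021 shapefile still contains the state's eight former counties.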

The planning regions and their new FIPS codes:

| Name                                           | FIPS state-county code  |
|------------------------------------------------|-------------------------|
| Capitol Planning Region                        | 09017                   |
| Greater Bridgeport Planning Region             | 09019                   |
| Lower Connecticut River Valley Planning Region | 09021                   |
| Naugatuck Valley Planning Region               | 09023                   |
| Northeastern Connecticut Planning Region       | 09025                   |
| Northwest Hills Planning Region                | 09027                   |
| South Central Connecticut Planning Region      | 09029                   |
| Southeastern Connecticut Planning Region       | 09031                   |
| Western Connecticut Planning Region            | 09033                   |


<img src='https://portal.ct.gov/lib/opm/igp/org/cogs/rcogs.png' alt='County Equivalents - COGs' title='County Equivalents - COGs' width='600' />

Source: https://libguides.ctstatelibrary.org/regionalplanning/maps

The `.geojson` file for the new planning regions can be found here:
https://geodata.ct.gov/maps/743ea4808b85469d8d9f7c5e6b661ee8

In [None]:
# load the geojson file for the new CT planning regions
df_ct = gpd.read_file('data/Connecticut_Planning_Region_Index.geojson')
df_ct

In [None]:
# set the updated `geoid` and `geometry` values accordingly
# (assumes rows 309-317 of `df_merged` and rows 0-8 of `df_ct` list the
# nine planning regions in the same order as the FIPS table above)
ct_geoids = ['09017', '09019', '09021', '09023', '09025',
             '09027', '09029', '09031', '09033']
for i, geoid in enumerate(ct_geoids):
    df_merged.at[309 + i, 'geoid'] = geoid
    df_merged.at[309 + i, 'geometry'] = df_ct.at[i, 'geometry']

df_merged.query('state_name=="Connecticut"')
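
The cell above assigns values by row position, which would silently mislabel regions if either frame were reordered. A name-based alternative for `geoid` is sketched below, using the FIPS table above and assuming the Census file spells each region exactly as in that table. `fill_ct_geoids` is a hypothetical helper; geometries could be matched by name the same way, but the name column in `df_ct` is not shown here, so check `df_ct.columns` first.

```python
import pandas as pd

# FIPS codes from the planning-region table above
CT_PLANNING_REGION_FIPS = {
    'Capitol Planning Region': '09017',
    'Greater Bridgeport Planning Region': '09019',
    'Lower Connecticut River Valley Planning Region': '09021',
    'Naugatuck Valley Planning Region': '09023',
    'Northeastern Connecticut Planning Region': '09025',
    'Northwest Hills Planning Region': '09027',
    'South Central Connecticut Planning Region': '09029',
    'Southeastern Connecticut Planning Region': '09031',
    'Western Connecticut Planning Region': '09033',
}


def fill_ct_geoids(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing Connecticut `geoid` values by looking up the region name."""
    out = df.copy()
    missing = out['geoid'].isna() & (out['state_name'] == 'Connecticut')
    out.loc[missing, 'geoid'] = out.loc[missing, 'county_name'].map(CT_PLANNING_REGION_FIPS)
    return out


# made-up rows; '99001' is not a real FIPS code
toy = pd.DataFrame({
    'county_name': ['Western Connecticut Planning Region', 'Capitol Planning Region', 'Sample County'],
    'state_name': ['Connecticut', 'Connecticut', 'Ohio'],
    'geoid': [None, None, '99001'],
})
print(fill_ct_geoids(toy))
```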

In [None]:
# double-check there are no null values
df_merged[df_merged['geoid'].isna()]

In [None]:
df_merged.info()

In [None]:
# convert the merged dataframe to a geopandas dataframe
df_merged = gpd.GeoDataFrame(df_merged, geometry='geometry')

# append each geometry's bounding box as `minx`, `miny`, `maxx`, `maxy` columns
df_merged = pd.concat([df_merged, df_merged.bounds], axis=1)

df_merged.head()

### 5. Visualize the top 10 counties of both positive and negative net domestic migration

In [None]:
# initialize plotly
import plotly.express as px
import plotly.graph_objects as go

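def show_fig():
    # render the current global `fig` as a static PNG
    # (static renderers need an image export engine such as kaleido)
    fig.show(renderer='png', width=800)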

In [None]:
df_top_10 = df_merged.sort_values('domestic', ascending=False)[['domestic', 'county_name', 'state_name']].head(10)
df_top_10['location'] = df_top_10['county_name'] + ', ' + df_top_10['state_name']
title = 'Positive Net Domestic Migration - Top 10 counties'

fig = px.bar(
    df_top_10
    , x='domestic', y='location'
    , orientation='h'
)

fig.update_layout(
    title=title,
    xaxis_title='Net Domestic Migration',
    yaxis_title='County',
    margin=dict(l=50, r=50, t=50, b=50)
)

fig['layout']['yaxis']['autorange'] = 'reversed'

show_fig()

In [None]:
df_bottom_10 = df_merged.sort_values('domestic', ascending=False)[['domestic', 'county_name', 'state_name']].tail(10)
df_bottom_10['location'] = df_bottom_10['county_name'] + ', ' + df_bottom_10['state_name']
title = 'Negative Net Domestic Migration - Top 10 counties'

fig = px.bar(
    df_bottom_10
    , x='domestic', y='location'
    , orientation='h'
)

fig.update_layout(
    title=title,
    xaxis_title='Net Domestic Migration',
    yaxis_title='County',
    margin=dict(l=50, r=50, t=50, b=50)
)

show_fig()
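
`nlargest` and `nsmallest` pick the extreme rows directly instead of sorting the whole frame and slicing. A sketch on made-up rows; note that `nsmallest` returns the most negative county first, the reverse of the `tail(10)` order used above.

```python
import pandas as pd


def extremes(df: pd.DataFrame, column: str = 'domestic', n: int = 10):
    """Return the n largest and n smallest rows by `column`, with a `location` label."""
    labeled = df.assign(location=df['county_name'] + ', ' + df['state_name'])
    cols = [column, 'location']
    return labeled.nlargest(n, column)[cols], labeled.nsmallest(n, column)[cols]


# made-up values for illustration
toy = pd.DataFrame({
    'domestic': [50, -20, 10, -70],
    'county_name': ['A County', 'B County', 'C County', 'D County'],
    'state_name': ['Ohio'] * 4,
})
top, bottom = extremes(toy, n=2)
print(top)
print(bottom)
```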

### 6. Generate the coordinates for the dot-density map

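Represent each county's net domestic migration as dots, one per 100 people, scattered at random locations inside the county boundary. Dot density then shows where gains and losses were concentrated, not just which counties gained or lost.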

In [None]:
# rescale `domestic` to one dot per 100 persons, rounded to the nearest whole dot
factor = 100

df_merged['domestic'] = (df_merged['domestic'] / factor).round().astype('int')
df_merged.sort_values('domestic', ascending=False)
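
A quick worked example of the scaling: dividing by `factor` and rounding gives the nearest whole dot, but pandas (via NumPy) rounds exact halves to the nearest even number, so 250 people become 2 dots while 350 become 4. The values below are made up.

```python
import pandas as pd

factor = 100  # people per dot, as in the cell above
domestic = pd.Series([250, 350, -149, 1234])  # made-up net migration values
dots = (domestic / factor).round().astype('int')
print(dots.tolist())  # [2, 4, -1, 12]
```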

In [None]:
# label each county's net migration as positive or negative
# (counties rounded to 0 dots are labeled 'negative' but produce no dots)
df_merged['net_migration'] = df_merged['domestic'].apply(lambda x: 'positive' if x > 0 else 'negative')
df_merged
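
The row-wise `apply` can be replaced by a vectorized `numpy.where`, which yields the same labels, including `'negative'` for counties that round to zero dots. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'domestic': [12, -3, 0]})  # made-up dot counts
# vectorized equivalent of: 'positive' if x > 0 else 'negative'
toy['net_migration'] = np.where(toy['domestic'] > 0, 'positive', 'negative')
print(toy)
```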

In [None]:
from shapely.geometry import Point

# place abs(domestic) random points inside the county geometry by rejection
# sampling: draw from the bounding box, keep only points within the polygon
def random_coordinates(row):
    results = []
    count = 0
    val = row['domestic']
    net_migration = row['net_migration']
    while count < abs(val):
        x = random.uniform(row['minx'], row['maxx'])
        y = random.uniform(row['miny'], row['maxy'])
        pt = Point(x, y)
        if pt.within(row['geometry']):
            count += 1
            results.append([net_migration, x, y])
    return pd.DataFrame(results, columns=('net_migration', 'x', 'y'))
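
`random_coordinates` is a form of rejection sampling: points drawn uniformly from the bounding box and kept only when they land inside the polygon are uniformly distributed over the polygon, and on average the share of draws kept equals the polygon's area divided by the box's area. The sketch below shows the same idea without shapely, swapping `Point.within` for a hand-written ray-casting test on a single ring (no holes). `point_in_polygon` and `random_points_in_polygon` are hypothetical helpers; a seeded `random.Random` makes the output reproducible.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray casting: toggle on each edge crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # the edge straddles the ray's height (so y1 != y2 and the division is safe)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring, n, rng):
    """Draw from the ring's bounding box until n points inside the ring are kept."""
    xs = [px for px, _ in ring]
    ys = [py for _, py in ring]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n:
        x = rng.uniform(minx, maxx)
        y = rng.uniform(miny, maxy)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # covers half of its bounding box
pts = random_points_in_polygon(triangle, 500, random.Random(42))
print(len(pts), pts[0])
```

Like the original loop, this never terminates for a zero-area polygon; capping the number of draws is a cheap safeguard.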

In [None]:
# apply the function to every row of the merged dataframe
results = df_merged.apply(random_coordinates, axis=1)

# unpack the series and concatenate the dataframes
results = pd.concat(results.tolist(), ignore_index=True)

# write to csv
out_csv = 'data/net-domestic-migration-dots-100.csv'
results.to_csv(out_csv, index=False)

In [None]:
results
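
A cheap sanity check after sampling: the number of dots in each class should equal the sum of absolute dot counts for that class in `df_merged`. `check_dot_totals` is a hypothetical helper, shown on made-up frames:

```python
import pandas as pd


def check_dot_totals(counties: pd.DataFrame, dots: pd.DataFrame) -> bool:
    """Compare expected dots per class (sum of |domestic|) with the dots generated."""
    expected = (counties.assign(n=counties['domestic'].abs())
                .groupby('net_migration')['n'].sum())
    expected = expected[expected > 0]  # classes with zero dots never appear in `dots`
    actual = dots['net_migration'].value_counts()
    return expected.sort_index().to_dict() == actual.sort_index().to_dict()


# made-up inputs: 3 positive dots and 2 negative dots expected
counties = pd.DataFrame({'domestic': [3, -2, 0],
                         'net_migration': ['positive', 'negative', 'negative']})
dots = pd.DataFrame({'net_migration': ['positive'] * 3 + ['negative'] * 2,
                     'x': [0.0] * 5, 'y': [0.0] * 5})
print(check_dot_totals(counties, dots))
```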

### 7. Import into QGIS and export the map

![QGIS](images/qgis.png)

![Net Domestic Migration by County: 2020-2022](images/net-domestic-migration-dots-100.png)