# Hard-to-count Census tract maps

By [Ben Welsh](https://palewi.re/who-is-ben-welsh/)

In [3]:
import pandas as pd
import geopandas as gpd
from tracts import USTractDownloader2010

Download a shapefile of all Census tracts from the 2010 Census.

In [4]:
USTractDownloader2010(data_dir="data/tiger/").run()

Read it in.

In [22]:
tracts = gpd.read_file("data/tiger/us.shp")

Read in the hard-to-count estimates from CUNY.

In [25]:
htc = pd.read_excel(
    "./data/cuny/pdb2015tract_2010MRR_2017ACS_US.xlsx",
    skiprows=5,
    dtype={"GEOID": str, "GEOIDtxt": str},
)
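The `dtype` argument matters because tract GEOIDs are 11-character codes that can start with a zero. A minimal sketch of what goes wrong without it, using a made-up two-row CSV as a stand-in for the spreadsheet (the rows and values are hypothetical):

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the CUNY spreadsheet.
raw = "GEOIDtxt,MRR2010\n01001020100,79.9\n06037101110,73.1\n"

# Let pandas infer the type: the IDs are parsed as integers.
as_number = pd.read_csv(io.StringIO(raw))

# Force the ID column to text, as the notebook does with read_excel.
as_text = pd.read_csv(io.StringIO(raw), dtype={"GEOIDtxt": str})

# The inferred integers have lost their leading zeros.
print(as_number["GEOIDtxt"].astype(str).tolist())

# The text version keeps all 11 characters, so it can match GEOID10.
print(as_text["GEOIDtxt"].tolist())
```

IDs that lose a leading zero would silently fail to match in the merge that follows.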

Merge the two into a consolidated file.

In [7]:
merged = tracts.merge(
    htc,
    left_on="GEOID10",
    right_on="GEOIDtxt",
    how="inner"
)
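An inner merge drops any tract that lacks a match on the other side without saying so. One way to see what was dropped is a throwaway outer merge with `indicator=True`. The sketch below uses small made-up frames (`tracts_demo`, `htc_demo`, and their IDs are hypothetical) rather than the real files:

```python
import pandas as pd

# Hypothetical stand-ins for the tract shapes and the CUNY table.
tracts_demo = pd.DataFrame(
    {"GEOID10": ["01001020100", "06037101110", "72001956300"]}
)
htc_demo = pd.DataFrame(
    {"GEOIDtxt": ["01001020100", "06037101110"], "MRR2010": [79.9, 73.1]}
)

# An outer merge keeps every row; the _merge column records
# whether each row matched on both sides or only one.
check = tracts_demo.merge(
    htc_demo,
    left_on="GEOID10",
    right_on="GEOIDtxt",
    how="outer",
    indicator=True,
)

counts = check["_merge"].value_counts()
print(counts)

# Tracts with no match in the CUNY table.
unmatched = check.loc[check["_merge"] != "both", "GEOID10"].tolist()
print(unmatched)
```

Running the same check on the real `tracts` and `htc` frames would show which tracts the inner merge leaves out.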

Trim down to only the columns we want.

In [27]:
trimmed = merged[[
    'GEOID10',
    'State_name',
    'County_name10',
    'TotPopACS17',
    'MRR2010',
    'MRR20pctthreshold',
    'UE_flag',
    'HTCcomboflag',
    'geometry'
]]

Clean up the headers.

In [28]:
cleaned = trimmed.rename(columns={
    "GEOID10": "geoid",
    "State_name": "state",
    "County_name10": "county",
    "TotPopACS17": "pop",
    "MRR2010": "mrr",
    "MRR20pctthreshold": "mrr_htc",
    "UE_flag": "ue",
    "HTCcomboflag": "htc"
})
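One caveat with the new names: `pop` is also the name of a DataFrame method, so attribute access on the frame returns the method rather than the column. A small sketch with a made-up frame (`demo` and its values are hypothetical) shows why bracket access is the safer habit:

```python
import pandas as pd

# Hypothetical frame that reuses the renamed column names.
demo = pd.DataFrame({"htc": [0, 1, 1], "pop": [100, 250, 50]})

# Attribute access resolves to the DataFrame.pop method, not the column.
print(callable(demo.pop))

# Bracket access always returns the column.
totals = demo.groupby("htc")["pop"].sum()
print(totals)
```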

How many tracts are hard to count?

In [29]:
cleaned.htc.value_counts()

0    58045
1    14793
2        7
Name: htc, dtype: int64

How many people live in those tracts?

In [30]:
cleaned.groupby("htc")["pop"].sum()

htc
0    260480905
1     60374543
2        25248
Name: pop, dtype: int64
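The totals are easier to read as shares. This sketch recomputes them from the figures printed above, assuming a flag value of 1 marks a hard-to-count tract (the meaning of the flag values is not documented in this notebook):

```python
import pandas as pd

# Population totals copied from the output above, keyed by htc flag.
pop_by_flag = pd.Series({0: 260480905, 1: 60374543, 2: 25248}, name="pop")

# Each group's share of the total population across all three groups.
share = pop_by_flag / pop_by_flag.sum()
print((share * 100).round(1))
```

On these numbers, the tracts flagged 1 hold a little under a fifth of the population covered by the merged file.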

What is the distribution of the mail-response rate?

In [31]:
cleaned.mrr.describe(percentiles=[.2, .4, .6, .8])

count    72845.000000
mean      1406.622404
std      11441.662340
min          0.000000
20%         73.100000
40%         77.900000
50%         79.900000
60%         81.800000
80%         85.500000
max      99999.000000
Name: mrr, dtype: float64
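A maximum of 99999 and a mean above 1,400 cannot be real percentages, so some rows appear to carry a placeholder code instead of a rate. That reading is an assumption; the CUNY documentation should confirm what the code means. Under that assumption, a sketch of masking out-of-range values before summarizing, on a made-up series:

```python
import pandas as pd

# Hypothetical sample: three plausible rates, one placeholder, one zero.
mrr_demo = pd.Series([73.1, 79.9, 85.5, 99999.0, 0.0], name="mrr")

# Keep values that can be percentages; everything else becomes NaN.
# Assumption: anything above 100 is a placeholder, not a real rate.
valid = mrr_demo.where(mrr_demo <= 100)

# describe() skips NaN, so the placeholder no longer skews the mean.
print(valid.describe())
```

Whether the zeros are genuine rates or another placeholder is a separate question this sketch does not settle.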

Reproject the maps to WGS 84 (EPSG:4326), the coordinate system GeoJSON expects.

In [32]:
reprojected = cleaned.to_crs(epsg=4326)
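It is worth confirming the coordinate system before and after this step. Census TIGER/Line shapefiles are published in NAD83 (EPSG:4269); the sketch below assumes the downloaded file keeps that default and uses a single made-up point in place of the tract polygons:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical one-row frame, declared in NAD83 like a TIGER shapefile.
demo = gpd.GeoDataFrame(
    {"geoid": ["demo"]},
    geometry=[Point(-118.24, 34.05)],
    crs="EPSG:4269",
)

# Same call as the notebook cell above.
reprojected_demo = demo.to_crs(epsg=4326)

# Print the EPSG codes before and after the reprojection.
print(demo.crs.to_epsg(), "->", reprojected_demo.crs.to_epsg())
```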

Write out a GeoJSON file.

In [33]:
reprojected.to_file("data/analysis/hard-to-count-tracts.geojson", driver="GeoJSON")

Create an mbtiles file for Mapbox.

In [34]:
!tippecanoe -zg -o "./data/analysis/hard-to-count-tracts.mbtiles" \
    --coalesce-densest-as-needed \
    --extend-zooms-if-still-dropping \
    --generate-ids \
    --force \
    "data/analysis/hard-to-count-tracts.geojson"

For layer 0, using name "hardtocounttracts"
72845 features, 193195289 bytes of geometry, 2415510 bytes of separate metadata, 1041128 bytes of string pool
Choosing a maxzoom of -z5 for features about 12554 feet (3827 meters) apart
Choosing a maxzoom of -z11 for resolution of about 229 feet (69 meters) within features
tile 1/0/0 size is 587129 with detail 12, >500000    
Going to try keeping the sparsest 76.64% of the features to make it fit
tile 1/0/0 size is 587394 with detail 12, >500000    
Going to try keeping the sparsest 58.72% of the features to make it fit
tile 1/0/0 size is 585687 with detail 12, >500000    
Going to try keeping the sparsest 45.11% of the features to make it fit
tile 1/0/0 size is 577932 with detail 12, >500000    
Going to try keeping the sparsest 35.13% of the features to make it fit
tile 1/0/0 size is 555942 with detail 12, >500000    
Going to try keeping the sparsest 28.43% of the features to make it fit
tile 1/0/0 size is 523616 with detail 12, >500000   
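The `!` prefix only works inside IPython or Jupyter. To run the same step from a plain Python script, the command can be assembled as a list and passed to `subprocess`. This is a sketch, not part of the original notebook; `build_tippecanoe_command` is a hypothetical helper, and the flags are the ones used in the cell above:

```python
import os
import shutil
import subprocess


def build_tippecanoe_command(geojson_path, mbtiles_path):
    """Return the tippecanoe invocation from the notebook as an argument list."""
    return [
        "tippecanoe",
        "-zg",
        "-o",
        mbtiles_path,
        "--coalesce-densest-as-needed",
        "--extend-zooms-if-still-dropping",
        "--generate-ids",
        "--force",
        geojson_path,
    ]


command = build_tippecanoe_command(
    "data/analysis/hard-to-count-tracts.geojson",
    "./data/analysis/hard-to-count-tracts.mbtiles",
)

# Only run when the tool is installed and the input file exists.
if shutil.which("tippecanoe") and os.path.exists(command[-1]):
    subprocess.run(command, check=True)
else:
    print("Skipping run; command would be:", " ".join(command))
```

Passing a list avoids shell quoting problems, and `check=True` raises an error if tippecanoe exits with a failure code.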