# Hard-to-count Census tract maps

By [Ben Welsh](https://palewi.re/who-is-ben-welsh/)

In [1]:
import pandas as pd
import geopandas as gpd
from tracts import USTractDownloader2010

Download a shapefile of all Census tracts from the 2010 Census.

In [None]:
USTractDownloader2010(data_dir="data/tiger/").run()

Read it in.

In [2]:
tracts = gpd.read_file("data/tiger/us.shp")

Read in the hard-to-count estimates from CUNY.

In [5]:
htc = pd.read_excel(
    "./data/cuny/pdb2015tract_2010MRR_2017ACS_US.xlsx",
    skiprows=5,
    dtype={"GEOID": str, "GEOIDtxt": str}
)
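
Reading the GEOID columns as strings matters because tract GEOIDs are 11-digit codes, and those for states with a FIPS code below 10 begin with a zero. A minimal sketch of what goes wrong otherwise, using a made-up two-row CSV in place of the spreadsheet (`read_csv` stands in for `read_excel` so the example runs without the file, and the population figures are invented):

```python
import io
import pandas as pd

# Hypothetical sample rows; the population values are invented for illustration.
sample = "GEOIDtxt,TotPopACS17\n01001020100,1923\n06037101110,4219\n"

# Without a dtype, pandas infers integers and silently drops the leading zero.
as_int = pd.read_csv(io.StringIO(sample))
int_ids = as_int["GEOIDtxt"].astype(str).tolist()

# With dtype=str, the identifiers survive exactly as written.
as_str = pd.read_csv(io.StringIO(sample), dtype={"GEOIDtxt": str})
str_ids = as_str["GEOIDtxt"].tolist()

# If the zero has already been lost, padding back to 11 characters restores it.
restored = [i.zfill(11) for i in int_ids]

print(int_ids)
print(str_ids)
```

An identifier that has lost its leading zero will not match `GEOID10` in the shapefile, so those tracts would quietly drop out of an inner merge.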

Merge the two into a consolidated file.

In [8]:
merged = tracts.merge(
    htc,
    left_on="GEOID10",
    right_on="GEOIDtxt",
    how="inner"
)
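
An inner merge discards any tract that appears in only one of the two files, without reporting it. One way to check for that, sketched here on small made-up frames rather than the real data, is to run an outer merge with `indicator=True` and count the rows on each side. The GEOIDs and rates below are hypothetical, and plain pandas frames stand in for the GeoDataFrame.

```python
import pandas as pd

# Hypothetical stand-ins for the tract shapes and the CUNY table.
left = pd.DataFrame({"GEOID10": ["01001020100", "01001020200", "01001020300"]})
right = pd.DataFrame({
    "GEOIDtxt": ["01001020100", "01001020200", "99999999999"],
    "MRR2010": [80.1, 75.3, 70.0],
})

# An outer merge keeps every row; the _merge column records where each came from.
check = left.merge(
    right,
    left_on="GEOID10",
    right_on="GEOIDtxt",
    how="outer",
    indicator=True,
)
counts = check["_merge"].value_counts()

# Tracts with a shape but no estimate, which an inner merge would drop.
left_only = check.loc[check["_merge"] == "left_only", "GEOID10"].tolist()

print(counts)
print(left_only)
```

If both `left_only` and `right_only` come back as zero on the real data, the inner merge loses nothing.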

Trim down to only the columns we want.

In [10]:
trimmed = merged[[
    'GEOID10',
    'TotPopACS17',
    'MRR2010',
    'MRR20pctthreshold',
    'UE_flag',
    'HTCcomboflag',
    'geometry'
]]

Clean up the headers.

In [12]:
cleaned = trimmed.rename(columns={
    "GEOID10": "geoid",
    "TotPopACS17": "pop",
    "MRR2010": "mrr",
    "MRR20pctthreshold": "mrr_htc",
    "UE_flag": "ue",
    "HTCcomboflag": "htc"
})

How many tracts are hard to count?

In [14]:
cleaned.htc.value_counts()

0    58045
1    14793
2        7
Name: htc, dtype: int64

How many people live in those tracts?

In [15]:
# Use bracket indexing: "pop" is also the name of a DataFrame method.
cleaned.groupby("htc")["pop"].sum()

htc
0    260480905
1     60374543
2        25248
Name: pop, dtype: int64
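
The totals are easier to compare as shares of the overall population. A short sketch that rebuilds the series from the figures printed above, so it runs without the data files:

```python
import pandas as pd

# Population totals by htc flag value, copied from the output above.
pop_by_htc = pd.Series({0: 260480905, 1: 60374543, 2: 25248}, name="pop")

# Divide each group by the grand total to get its share.
share = pop_by_htc / pop_by_htc.sum()

print(share.round(4))
```

In the notebook itself the same division could be applied directly to the result of the groupby.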

What is the distribution of the mail-response rate?

In [22]:
cleaned.mrr.describe(percentiles=[.2, .4, .6, .8])

count    72845.000000
mean      1406.622404
std      11441.662340
min          0.000000
20%         73.100000
40%         77.900000
50%         79.900000
60%         81.800000
80%         85.500000
max      99999.000000
Name: mrr, dtype: float64
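
The maximum of 99999 and a mean well above 100 suggest that some rows hold a placeholder rather than a real response rate. The source does not document that code, so the following is only a sketch under the assumption that anything above 100 is a missing-value marker. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample; 99999 is assumed to be a placeholder, not a real rate.
mrr = pd.Series([73.1, 79.9, 85.5, 99999.0], name="mrr")

# Keep values that can be percentages; everything else becomes NaN.
valid = mrr.where(mrr <= 100)

# describe() skips NaN, so the summary reflects only the real rates.
summary = valid.describe()

print(summary)
```

Masking instead of dropping rows keeps the tracts in the output file while stopping the placeholder from distorting the mean and standard deviation.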

Reproject the maps to WGS 84 (EPSG:4326).

In [30]:
reprojected = cleaned.to_crs(epsg=4326)

Write out a GeoJSON file.

In [31]:
reprojected.to_file("data/analysis/hard-to-count-tracts.geojson", driver="GeoJSON")

Create an MBTiles file for Mapbox.

In [32]:
!tippecanoe -zg -o "./data/analysis/hard-to-count-tracts.mbtiles" \
    --coalesce-densest-as-needed \
    --extend-zooms-if-still-dropping \
    --force \
    "data/analysis/hard-to-count-tracts.geojson"

For layer 0, using name "hardtocounttracts"
72845 features, 192968458 bytes of geometry, 1355526 bytes of separate metadata, 1009964 bytes of string pool
Choosing a maxzoom of -z5 for features about 12554 feet (3827 meters) apart
Choosing a maxzoom of -z11 for resolution of about 229 feet (69 meters) within features
tile 2/1/1 size is 504857 with detail 12, >500000    
Going to try keeping the sparsest 89.13% of the features to make it fit
tile 2/1/1 size is 505100 with detail 12, >500000    
Going to try keeping the sparsest 79.41% of the features to make it fit
tile 2/1/1 size is 504732 with detail 12, >500000    
Going to try keeping the sparsest 70.80% of the features to make it fit
tile 2/1/1 size is 502371 with detail 12, >500000    
Going to try keeping the sparsest 63.42% of the features to make it fit
tile 3/2/3 size is 625084 with detail 12, >500000    
Going to try keeping the sparsest 71.99% of the features to make it fit
tile 3/2/3 size is 596693 with detail 12, >500000   