# Exploring census blocks & joining to 911 data
The 911 calls file is [here](https://app.box.com/file/911911135646) and was provided manually by Jimmy McBroom, a data engineer for the city. In theory, it should also be available from [the open data portal](https://data.detroitmi.gov/datasets/911-calls-for-service/explore).

2010 census blocks can be downloaded [from the Census Bureau](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2010&layergroup=Blocks) or from [Box](https://bloombergdotorg.box.com/s/pzaf2y8u6xmw1rgg8tr4hf7270y3q0rj). This is the vintage Detroit uses. Detroit is in Wayne County, so that's the only county you need to download (its county code is 163; Michigan's state code is 26).  
Wayne County shares Detroit's northern border and most of its eastern border, and extends further south and west.

## General Census block notes
Block IDs are hierarchical, and blocks are quite granular in populated areas.

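The hierarchy looks like this, where each X is a digit:

| State | County | Tract | Block |
|---|---|---|---|
| XX | XXX | XXXXXX | XXXX |
| 2 digits | 3 digits | 6 digits | 4 digits |
| `26` | `163` | `503200` | `3010` |

The last row is block `261635032003010` from the shapefile output below, 15 digits in total.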
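A minimal sketch of splitting a block ID into those fields in plain Python. `split_block_id` is a hypothetical helper (not part of `util_detroit`). The zero-padding is an assumption aimed at IDs stored as integers, which lose leading zeros for states whose code starts with 0. Michigan's 26 doesn't, so Detroit IDs are unaffected.

```python
# Field widths of a 2010 census block GEOID, in hierarchy order (2 + 3 + 6 + 4 = 15)
BLOCK_ID_FIELDS = (("state", 2), ("county", 3), ("tract", 6), ("block", 4))


def split_block_id(block_id):
    """Split a block GEOID (int or str) into state/county/tract/block strings."""
    # Integer ids lose leading zeros, so pad back out to 15 characters
    s = str(block_id).zfill(15)
    if len(s) != 15 or not s.isdigit():
        raise ValueError(f"expected a 15-digit block id, got {block_id!r}")
    parts, start = {}, 0
    for name, width in BLOCK_ID_FIELDS:
        parts[name] = s[start:start + width]
        start += width
    return parts


print(split_block_id(261635032003010))
# {'state': '26', 'county': '163', 'tract': '503200', 'block': '3010'}
```

Because the fields nest, the leading 11 digits identify the tract (`block_id // 10_000` on the integer form), which is handy for rolling calls up a level.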

## 911 calls and census blocks
The block is given by the `block_id` column, a 15-digit ID (`len(block_id) == 15`).
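A quick way to confirm that every non-null `block_id` really has 15 digits. This is sketched on a made-up series that uses the same nullable `Int64` dtype the notebook uses for the calls.

```python
import pandas as pd

# Made-up sample ids in the notebook's nullable Int64 dtype (the null stands in for a missing id)
block_id = pd.Series([261635318001003, 261635339003014, pd.NA], dtype="Int64")

# Drop nulls, then count digits; every remaining id should be exactly 15 long
digit_counts = block_id.dropna().astype("int64").astype(str).str.len()
assert digit_counts.eq(15).all(), digit_counts.value_counts()
print(digit_counts.tolist())  # [15, 15]
```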

The calls dataset does have null block IDs (~3% in the small sample I checked). Each of those rows still has a latitude/longitude, so we can probably recover the block from the point.
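One way to recover the missing blocks is a point-in-polygon spatial join of just the null-`block_id` calls against the block polygons. The sketch below uses `geopandas.sjoin` with the `predicate=` keyword (older geopandas releases called it `op=`). It runs on made-up square "blocks", and `fill_block_ids` is a hypothetical helper. It also assumes both frames use the same coordinate order and CRS, which may not hold yet. The call geometries in the output further down print as `POINT (42.38711 -83.11380)`, i.e. (latitude, longitude), while the block polygons are (longitude, latitude). The points probably need rebuilding with `gpd.points_from_xy(longitude, latitude)` before a real join would find anything.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Made-up planar squares standing in for block polygons (not real Detroit geometry)
blocks_demo = gpd.GeoDataFrame(
    {"block_id": pd.array([261635032003010, 261635032002007], dtype="Int64")},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
)
# Made-up calls: one already has a block_id, two are missing it (one falls outside every block)
calls_demo = gpd.GeoDataFrame(
    {"oid": [1, 2, 3], "block_id": pd.array([261635032003010, pd.NA, pd.NA], dtype="Int64")},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
)


def fill_block_ids(calls, blocks):
    """Fill null block_id values with the id of the block polygon containing each point."""
    missing = calls[calls.block_id.isna()]
    # Left spatial join keeps every missing call, matched or not
    hits = gpd.sjoin(
        missing.drop(columns="block_id"),
        blocks[["block_id", "geometry"]],
        how="left",
        predicate="within",
    )
    # Blocks shouldn't overlap, but drop any duplicate matches defensively
    hits = hits[~hits.index.duplicated(keep="first")]
    out = calls.copy()
    # Unmatched points (outside every block) stay null
    out.loc[hits.index, "block_id"] = hits["block_id"].astype("Int64")
    return out


print(fill_block_ids(calls_demo, blocks_demo).block_id.tolist())
```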

## Census block shp files
The block is given by the `GEOID10` column, which is very well behaved.
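`GEOID10` should be the four hierarchy fields glued together, so a cheap sanity check is to rebuild it and compare. This is sketched on the two rows shown in the shapefile output below. It assumes geopandas reads the code fields as zero-padded strings, and it would need to run before the rename to `block_id`.

```python
import pandas as pd

# The two block rows shown in the shapefile output, restricted to the id fields
block = pd.DataFrame(
    {
        "STATEFP10": ["26", "26"],
        "COUNTYFP10": ["163", "163"],
        "TRACTCE10": ["503200", "503200"],
        "BLOCKCE10": ["3010", "2007"],
        "GEOID10": ["261635032003010", "261635032002007"],
    }
)

# String concatenation of the hierarchy fields should reproduce GEOID10 exactly
rebuilt = block.STATEFP10 + block.COUNTYFP10 + block.TRACTCE10 + block.BLOCKCE10
mismatches = block.loc[rebuilt != block.GEOID10]
print(len(mismatches))  # 0
```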

## Join notes
In the small sample I tested (1,000 rows), the large majority of calls join correctly. About 3% don't because they're missing a block ID, and under 1% (0.6%) have block IDs in adjacent counties, which we could pull in if we really wanted to.
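The two proportions printed at the bottom of this notebook use different denominators, which makes them hard to compare. A small hypothetical helper can instead label every call as matched, null, or not in the block file, and report all three shares over all calls. It is shown on made-up ids; in the notebook you would pass `call.block_id` and `block.block_id`.

```python
import numpy as np
import pandas as pd


def join_breakdown(call_ids: pd.Series, block_ids: pd.Series) -> pd.Series:
    """Share of calls that are null, outside the block file, or matched."""
    is_null = call_ids.isna().to_numpy()
    # na_value=False treats any missing isin result as "not found"
    in_blocks = call_ids.isin(block_ids).to_numpy(dtype=bool, na_value=False)
    # Conditions are checked in order, so nulls are labelled before the outside-county check
    status = np.select(
        [is_null, ~in_blocks],
        ["null block_id", "not in block file"],
        default="matched",
    )
    return pd.Series(status, index=call_ids.index).value_counts(normalize=True)


calls_demo = pd.Series([1, 2, 3, pd.NA, 5], dtype="Int64")
blocks_demo = pd.Series([1, 2, 3], dtype="Int64")
print(join_breakdown(calls_demo, blocks_demo))
```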


## Other data with census blocks
The Census Bureau has tons of block-level datasets [here](https://data.census.gov/cedsci/table?q=United%20States&g=0500000US26163%241000000).
These include:
* Population in occupied units by block (search for `DECENNIALSF12010.H10_data_with_overlays_2022-01-28T162836`)
* Population in various types of households (`DECENNIALSF12010.P30_data_with_overlays_2022-01-28T162836`)
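A sketch of loading one of these tables so it can be joined on `block_id`. The layout here is an assumption, not taken from an actual download: a first header row of variable codes, a second row of human-readable labels, and a `GEO_ID` whose last 15 characters are the block GEOID (e.g. `1000000US261635032003010`). The `H010001` column name and every value in the sample are made up for illustration, and `read_block_table` is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up extract mimicking the assumed layout: codes row, labels row, then data
SAMPLE = """GEO_ID,H010001
id,Total
1000000US261635032003010,12
1000000US261635032002007,30
"""


def read_block_table(path_or_buffer):
    """Read a block-level census CSV and add an Int64 block_id column."""
    # skiprows=[1] drops the second line (the labels row) but keeps the codes row as the header
    df = pd.read_csv(path_or_buffer, skiprows=[1], dtype={"GEO_ID": str})
    # Keep only the trailing 15-digit GEOID, typed like block_id elsewhere in this notebook
    df["block_id"] = df["GEO_ID"].str[-15:].astype("int64").astype("Int64")
    return df


census = read_block_table(io.StringIO(SAMPLE))
print(census[["block_id", "H010001"]])
```

From there, something like `block.merge(census, on="block_id", how="left")` would attach the counts to the block polygons.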


In [13]:
import geopandas as gpd
import pandas as pd
from IPython.display import display
from util_detroit import kml_to_gpd, csv_with_x_y_to_gpd

# Only read in the columns we want
COLS_911 = [
    "calldescription",
    "call_timestamp",
    "block_id",
    "category",
    "officerinitiated",
    "priority",
    "oid",
]

In [151]:
call = (
    csv_with_x_y_to_gpd(
        "calls_for_service_from_jimmy.csv",
        read_csv_args={"nrows": 1000, "usecols": COLS_911 + ["longitude", "latitude"]},
    )
    # block_id has nulls, which numpy ints can't hold, so use pandas' nullable Int64. May factorize later
    .astype({"call_timestamp": "datetime64[ns]", "block_id": pd.Int64Dtype()})
    .loc[:, COLS_911 + ["geometry"]]
)

block = (
    gpd.read_file("/Users/ahakso/Downloads/wayne_county_census_blocks/tl_2010_26163_tabblock10.shp")
    # GEOID10 is already the full 15-digit id (state + county + tract + block); rename to match the calls
    .rename(columns={"GEOID10": "block_id"})
    # Can't go straight from string to Int64Dtype, so convert to int64 first
    .astype({"block_id": "int64"})
    # Use the same nullable type as the 911 calls so the ids compare cleanly
    .astype({"block_id": pd.Int64Dtype()})
)

display(block.head(2))
call.head(2)

| | STATEFP10 | COUNTYFP10 | TRACTCE10 | BLOCKCE10 | block_id | NAME10 | MTFCC10 | UR10 | UACE10 | UATYP10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | 163 | 503200 | 3010 | 261635032003010 | Block 3010 | G5040 | U | 23824 | U | S | 6142 | 0 | 42.4467288 | -83.0047134 | POLYGON ((-83.00482 42.44610, -83.00500 42.446... |
| 1 | 26 | 163 | 503200 | 2007 | 261635032002007 | Block 2007 | G5040 | U | 23824 | U | S | 20130 | 0 | 42.4444119 | -83.0044517 | POLYGON ((-83.00380 42.44354, -83.00503 42.443... |


| | calldescription | call_timestamp | block_id | category | officerinitiated | priority | oid | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | TRAFFIC STOP | 2020-06-25 14:40:55 | 261635318001003 | TRF STOP | Yes | 2 | 3079296 | POINT (42.38711 -83.11380) |
| 1 | START OF SHIFT INFORMATION | 2020-06-25 14:41:21 | 261635339003014 | STRTSHFT | Yes | 3 | 3079297 | POINT (42.36731 -83.08152) |


In [176]:
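null_count = call.block_id.isna().sum()
# isin() is False for null ids, so this flags both nulls and ids outside the Wayne County block file
unmatched = ~call.block_id.isin(block.block_id)
# (nulls + out-of-county) / non-null rows
print(f"proportion unmatched: {unmatched.sum() / (call.shape[0] - null_count):.4f}")
# out-of-county only (non-null but unmatched) / all rows
print(f"proportion unmatched: {(unmatched.sum() - null_count) / call.shape[0]:.4f}")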

proportion unmatched: 0.0403
proportion unmatched: 0.0060
