# Exploring Overlapping Census Boundaries

Evaluating the implications of TELs for debt activity at the local level requires some notion of the demographic characteristics of the jurisdictions in question.  This raises an operational problem.  Debt activity occurs at multiple levels of government, perhaps most often at the municipal or special district level.  By contrast, detailed socioeconomic data are generally collected at the county level (or larger).  We therefore need some method of allocating socioeconomic information to smaller jurisdictions in order to have enough information for legitimate statistical specifications.
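To make the allocation problem concrete, here is a minimal sketch of the simplest approach, areal weighting, which assumes that each county's population is spread uniformly across its area. The helper `area_weighted_allocation` is hypothetical, and the geometries and county populations are made-up toy values rather than Census data.

```python
from shapely.geometry import box


def area_weighted_allocation(place, counties, county_values):
    """Allocate county-level counts to a place in proportion to overlapping area.

    Assumes each county's count is spread uniformly over the county's area.
    """
    total = 0.0
    for county_id, county_geom in counties.items():
        overlap = place.intersection(county_geom).area
        if overlap > 0:
            # Fraction of the county's area that falls inside the place.
            share = overlap / county_geom.area
            total += share * county_values[county_id]
    return total


# Toy geometries: two adjacent 10x10 counties and a place straddling their border.
counties = {"A": box(0, 0, 10, 10), "B": box(10, 0, 20, 10)}
county_pop = {"A": 50_000, "B": 20_000}
place = box(8, 2, 12, 6)  # 4x4 square, half in each county

estimate = area_weighted_allocation(place, counties, county_pop)
print(estimate)
```

The uniform-density assumption is exactly what fails when most of a county's residents live in the portion that a city happens to cover.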

The question of how to allocate this information has non-trivial implications for the analysis that it facilitates.  I have [previously proposed](http://gis.stackexchange.com/questions/157610/harmonizing-data-across-geographic-scopes-layers) some rough adjustment mechanisms, but they generally rely on strong assumptions about the spatial distribution of people within a given jurisdiction.  Fortunately, as Dan has pointed out, the [Missouri Census Data Center](http://mcdc.missouri.edu/) (MCDC) has developed a [tool](http://mcdc.missouri.edu/websas/geocorr12.html) that offers a far more robust solution.  Instead of relying on the assumption that people are uniformly distributed, the tool aggregates up from the block level to measure the extent of overlap across Census geographies.  This approach retains information that blunt assumptions would discard.  The following description of the tool's capabilities is taken directly from [MCDC's docs](http://mcdc.missouri.edu/websas/geocorr90_htmls/geocorr.help.html):
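The core idea of building correspondences up from Census blocks can be sketched in a few lines of pandas. This is not MCDC's implementation. The block records are invented, and the column name `afact` is only chosen to echo Geocorr's allocation-factor output, so treat it as an assumption. Because blocks nest within both places and counties, summing block populations by place and county gives the population of each intersection.

```python
import pandas as pd

# Hypothetical block records: each block lies in exactly one place and one county.
blocks = pd.DataFrame({
    "block":  ["b1", "b2", "b3", "b4", "b5"],
    "place":  ["P1", "P1", "P1", "P2", "P2"],
    "county": ["C1", "C1", "C2", "C2", "C2"],
    "pop":    [300, 500, 200, 400, 600],
})

# Population in each place x county intersection.
xwalk = blocks.groupby(["place", "county"], as_index=False)["pop"].sum()

# Share of each place's population that falls in each county (an allocation factor).
xwalk["afact"] = xwalk["pop"] / xwalk.groupby("place")["pop"].transform("sum")
print(xwalk)
```

Here place `P1` splits 80/20 between counties `C1` and `C2`, while `P2` lies entirely in `C2`. Uniform areal weighting cannot recover that split.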

>*The MABLE/Geocorr geographic correspondence engine generates files and/or reports showing the relationships between a wide variety of geographic coverages for the United States. It can, for example, tell you with which county or counties each ZIP code in the state of California shares population. It can tell you, for each of those ZIP/county intersections, what the size of that intersection is (based on 1990 population or other user-specified variable) and what portion of the ZIP's total population is in that intersection. The application permits the user to specify the geographic scope of the correspondence files (typically, one or more complete states, but with the ability to specify counties, cities, or metropolitan areas within those states), and, of course, the specific geographic coverages to be processed. The latter include virtually all geographic units reported in the 1990 U.S. census summary files, and several special "extension coverages" such as 103rd Congress districts, PUMA areas used in the 1990 PUMS files, Labor Market and Commuting Zone areas, and even hydrological unit codes (watersheds.) The application creates a report file and a comma-delimited ascii file (by default) which the user can then browse and/or save to their local disk.*

It's difficult to overstate how damn useful this is.  In any event, I've taken the liberty of grabbing a [run](http://mcdc.missouri.edu/tmpscratch/22AUG0900580.geocorr12/geocorr12.csv) from the tool that captures the overlap between Census places and counties for all states in the country.  In this Notebook, we will explore these overlaps.  Even with this information, we will still face discretionary choices about how to allocate socioeconomic information that is not countable.  Such information is reported only as summary statistics (think distributions of income or political affiliation, industrial mixtures, etc.), and we have no access to the underlying microdata.  Any decision we make on this front will introduce bias, and the magnitude of that bias will depend on the extent to which Census places fail to nest within Census counties.  Consequently, this Notebook seeks to develop a method for thinking about this discord.  Note that we may very well decide that Census places are not the appropriate unit of analysis.  The method developed here, however, should be applicable to other geographic combinations.
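One simple way to quantify how well places nest within counties is to take the largest allocation factor for each place. This is offered only as a starting point, not as the measure this Notebook ultimately adopts. A value of 1 means the place sits entirely within one county, and lower values signal more fragmentation. The sketch also illustrates one discretionary choice for a non-countable statistic: a population-weighted average of county values. All numbers and column names (`place`, `county`, `afact`, `median_income`) are hypothetical.

```python
import pandas as pd

# Hypothetical place-to-county crosswalk; afact = share of the place's population in that county.
xwalk = pd.DataFrame({
    "place":  ["P1", "P1", "P2", "P3", "P3", "P3"],
    "county": ["C1", "C2", "C2", "C1", "C2", "C3"],
    "afact":  [0.8, 0.2, 1.0, 0.5, 0.3, 0.2],
})

# Hypothetical county-level statistic that cannot simply be summed.
county_stats = pd.DataFrame({
    "county": ["C1", "C2", "C3"],
    "median_income": [50_000, 40_000, 70_000],
})

# Nesting index: share of each place's population living in its dominant county.
nesting = xwalk.groupby("place")["afact"].max().rename("nesting")

# Number of counties each place touches.
n_counties = xwalk.groupby("place")["county"].nunique().rename("n_counties")

# One possible allocation rule: weight each county's value by the place's afact.
merged = xwalk.merge(county_stats, on="county")
merged["weighted"] = merged["afact"] * merged["median_income"]
allocated = merged.groupby("place")["weighted"].sum().rename("income_est")

summary = pd.concat([nesting, n_counties, allocated], axis=1)
print(summary)
```

For a perfectly nested place such as `P2`, the estimate simply equals its county's value. For `P3`, a weighted average of county medians is not the place's median, and the gap between the two is precisely the bias discussed above. It grows as the nesting index falls.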

```python
# Basic data manipulation
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Spatial I/O
import fiona

# Spatial geometry manipulation
from shapely.geometry import shape, LineString, Polygon, MultiPolygon
from shapely.ops import unary_union
from descartes import PolygonPatch
from rtree import index

# Visualization
from matplotlib.collections import PatchCollection
from mpltools import color
```



## Data Input

I am not sure how persistent the link to our run is, so the data are housed in this repository as well.
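Loading the saved run might look like the following sketch. It assumes the Geocorr CSV follows a row of variable names with a second row of descriptive labels, which is why that row is skipped. It also assumes the columns `county`, `placefp`, `pop10`, and `afact`. These layout details and the helper name `load_geocorr` are assumptions to check against the actual file. Codes are read as strings so that leading zeros in FIPS codes survive.

```python
import io

import pandas as pd


def load_geocorr(path_or_buffer):
    """Read a Geocorr-style CSV (hypothetical helper), skipping the assumed label row."""
    df = pd.read_csv(path_or_buffer, skiprows=[1], dtype=str)
    # Populations and allocation factors are numeric; geographic codes stay as strings.
    for col in ("pop10", "afact"):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col])
    return df


# Tiny in-memory stand-in for the real file (values are made up).
sample = io.StringIO(
    "county,placefp,pop10,afact\n"
    "County code,Place code,Total population,Allocation factor\n"
    "01001,00124,1000,0.8\n"
    "01003,00124,250,0.2\n"
)
xwalk = load_geocorr(sample)
print(xwalk)
```

With the repository copy, the call would be something like `load_geocorr("geocorr12.csv")`. That filename is taken from the run's URL, and its location in the repository is an assumption.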