mowilli3 edited this page Dec 18, 2013 · 6 revisions

Data munging

What format did the data have to be in to be used in the crosswalk? What data points were necessary from the Census API?

Crosswalk script

What system requirements or programs were necessary to run the crosswalk script? How many scripts were necessary to perform the crosswalk?

Logic

The crosswalk is a matrix that shows the relationship between census tracts or block groups and DC neighborhood boundaries. This matrix contains the census geography ID, the neighborhood cluster ID, and the proportion of the census area that is contained in the neighborhood. We used this relationship file to aggregate data from the Decennial Census or American Community Survey (ACS) to the neighborhood level.

Crosswalks are commonly used to express relationships between different levels of geography. We can simply and reliably crosswalk county-level data to the state level, for example, because counties nest within states and we know the population characteristics at each level. We have to make more assumptions when crosswalking data from tract to neighborhood because neighborhood boundaries do not follow census boundary lines and we do not have benchmark estimates of population characteristics at the neighborhood level.

The first assumption we make is that the population is uniformly distributed across the tract. For most tracts, this assumption has no effect because the entire tract is contained within one neighborhood cluster. It matters only for tracts that cross neighborhood boundaries. To allocate the population of such a tract, we use the proportion of the tract's land area that overlaps each neighborhood as the apportioning factor, so each neighborhood receives a share of the tract-level population equal to the share of the tract's land area that it covers.

The next assumption we make is that tracts are socioeconomically integrated. When we apportion tract-level data on characteristics such as poverty, we apply the same land-area apportioning factor used for the total population. This implies that poverty, for example, is not concentrated in one part of a tract.
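The apportioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the project: the tract IDs, cluster IDs, counts, and the function name `aggregate_to_clusters` are all hypothetical, and the crosswalk is represented as simple (tract, cluster, land-area share) rows.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (tract ID, cluster ID, share of the tract's
# land area that falls inside the cluster). Shares for a tract sum to 1.
crosswalk = [
    ("tract_A", "cluster_1", 1.0),   # tract entirely inside one cluster
    ("tract_B", "cluster_1", 0.25),  # tract split across two clusters
    ("tract_B", "cluster_2", 0.75),
]

# Hypothetical tract-level counts (illustrative values, not Census data).
tract_data = {
    "tract_A": {"population": 1000, "in_poverty": 100},
    "tract_B": {"population": 2000, "in_poverty": 400},
}

def aggregate_to_clusters(crosswalk, tract_data):
    """Apportion every tract-level count by land-area share and sum by cluster.

    The same share is applied to every variable, which encodes both the
    uniform-distribution and the socioeconomic-integration assumptions.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for tract_id, cluster_id, share in crosswalk:
        for variable, value in tract_data[tract_id].items():
            totals[cluster_id][variable] += value * share
    return {cluster: dict(values) for cluster, values in totals.items()}

cluster_totals = aggregate_to_clusters(crosswalk, tract_data)
```

In this made-up example, a quarter of tract_B's population and a quarter of its residents in poverty are assigned to cluster_1, regardless of where within the tract those residents actually live.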

Describe what steps were taken to make sure our data crosswalk is methodologically sound and accurate.

We received the crosswalk from the DC Office of Planning (DCOP) and compared it with an earlier version of the crosswalk that we created. The DCOP crosswalk is based on a neighborhood shapefile that has 5 additional neighborhood clusters, which cover the population not covered by our 39-cluster neighborhood shapefile. We know that the newer crosswalk is better for aggregating data because it covers the entire city population. We will make other checks to ensure that the results are sound and accurate. See the maps of the two shapefiles overlaid in https://www.dropbox.com/s/64l1ay41v9ls7kh/nbhd_data%2CNeighborhoodClusters_071612%20Map.pdf. The blue areas represent the new clusters.

What steps can be taken to double check the veracity of the crosswalk?

  1. To ensure that the neighborhood boundaries did not change from the 39 to the 44 clusters, we compared the neighborhood shapefiles. We also compared the newer shapefile with other maps of DC to understand where the new clusters were added. We found that the existing boundaries did not change, but new boundaries were added.

We also tested the reliability of the crosswalks that the Office of Planning provided by crosswalking the DC population in the 2010 Census to the neighborhood level from both blocks and tracts. Because blocks are smaller, we assume that the block-level crosswalk is more accurate. We found small to moderate differences in the neighborhood-level population counts estimated by the two crosswalks. See https://www.dropbox.com/s/20h1191ed22r5ey/Crosswalk%20Comparisons.pdf for the absolute and percent differences.
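A comparison of this kind could be computed roughly as follows. This is a minimal sketch with made-up numbers: the cluster names, population values, and the function name `compare_estimates` are hypothetical, and treating the block-based estimate as the reference for the percent difference is an assumption that follows from the reasoning above, not a documented choice.

```python
# Hypothetical neighborhood-level population estimates from each crosswalk.
block_based = {"cluster_1": 1000, "cluster_2": 2000}
tract_based = {"cluster_1": 1050, "cluster_2": 1950}

def compare_estimates(block_based, tract_based):
    """Return {cluster: (difference in count, percent difference)}.

    Differences are tract-based minus block-based; the percent difference
    is relative to the block-based estimate (assumed to be the reference).
    """
    comparison = {}
    for cluster, block_pop in block_based.items():
        difference = tract_based[cluster] - block_pop
        percent = 100.0 * difference / block_pop if block_pop else None
        comparison[cluster] = (difference, percent)
    return comparison

comparison = compare_estimates(block_based, tract_based)
```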

  2. To test our assumptions of uniform distribution of the population and population characteristics across tracts, we compared block-level population counts within selected tracts that crossed neighborhood boundaries. We found that the population is not uniformly distributed across blocks within the tract. See examples of block-level population within tracts that are split across neighborhood clusters in https://www.dropbox.com/s/2kzsl3n5u94w2kl/Block%20Example%20Spatial%20Distribution%20of%20Population.doc. We also evaluated how well the apportioning factors used to crosswalk the data from the tract level agreed with the population distribution aggregated from block-level data when the blocks were assigned to the same tract and neighborhood cluster. In https://www.dropbox.com/s/8src15xz010mokp/Spatial%20Distribution%20of%20Population.doc we show that there are small to moderate differences between the apportioning factor and the actual population distribution, but there does not seem to be systematic bias across the tracts. The average difference was 0.005 across all the tract-cluster combinations and the median was 0. When we compare only across tracts that crossed neighborhood boundaries, the mean and median were -0.031 and 0.035, respectively.
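The second check can be illustrated with a short sketch that compares each land-area apportioning factor with the share of the tract's population that block-level counts place in the same tract-cluster combination. All values and names here are hypothetical, and the sign convention (factor minus population share) is an assumption; the source does not state which direction the reported differences use.

```python
from statistics import mean, median

# Hypothetical tract-cluster pieces: (tract ID, cluster ID, land-area
# apportioning factor, population summed from blocks in that piece).
pieces = [
    ("tract_A", "cluster_1", 1.0, 1000),
    ("tract_B", "cluster_1", 0.25, 800),
    ("tract_B", "cluster_2", 0.75, 1200),
]

def factor_differences(pieces):
    """Return {(tract, cluster): factor - block-based population share}."""
    # Total block population per tract, used as the denominator of the share.
    tract_totals = {}
    for tract, _cluster, _factor, population in pieces:
        tract_totals[tract] = tract_totals.get(tract, 0) + population

    differences = {}
    for tract, cluster, factor, population in pieces:
        population_share = population / tract_totals[tract]
        differences[(tract, cluster)] = factor - population_share
    return differences

differences = factor_differences(pieces)
mean_difference = mean(differences.values())
median_difference = median(differences.values())
```

In this example, cluster_1 covers 25% of tract_B's land area but holds 40% of its block-level population, so the land-area factor understates that piece by about 0.15.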