Linking Tax Parcels to Buildings

The write-up about linking violations to buildings ran into a problem with using geospatial techniques to match violation coordinates to building shapes. One solution that was discussed is to link violations to the tax parcel shapes first, and then link the tax parcel shapes to building shapes.

In doing so, I discovered a new issue. Many building shapes overlap two tax parcel shapes, even though it seems that the building should be fully contained within one tax shape. Here is an example where you can see building shapes (red) that go outside the boundary of the tax shape (blue):

Tax Parcels (blue) and Buildings (red)

Some building records are identified by tax parcel ID (aka PIN), so it is important to solve this issue and link tax parcels to buildings (and buildings to tax parcels). Otherwise, if a building were to be linked with the wrong tax parcel, then the data associated with the building would actually be for the building next door.

It is not clear whether these small overlaps onto the neighboring tax parcel come from errors in the shapes themselves or from buildings that really were built partially on the wrong land. To investigate the problem, I took a close look at the data. First, I created a new set of shapes that divides each building along the tax parcel outlines. Most of the buildings look the same, but there are a few subtle differences.
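The slicing step can be sketched in a few lines. This is only an illustration, not the code used to produce the shapes here: it uses plain Python with Sutherland-Hodgman clipping, which assumes each parcel is a convex polygon with counter-clockwise vertices, and the names `clip_to_parcel`, `polygon_area`, and the `PIN-A`/`PIN-B` rectangles are made up for the example. Real parcel data would more likely go through a GIS library (for instance an intersection overlay in GeoPandas).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # counter-clockwise vertices, first point not repeated


def polygon_area(poly: Polygon) -> float:
    """Unsigned area via the shoelace formula (0.0 for an empty polygon)."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _side(p: Point, a: Point, b: Point) -> float:
    """Cross product: >= 0 when p is left of (or on) the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_to_parcel(building: Polygon, parcel: Polygon) -> Polygon:
    """Sutherland-Hodgman clipping: the part of `building` inside `parcel`.

    Assumption: `parcel` is convex with counter-clockwise vertices.
    """
    output = building
    for i, a in enumerate(parcel):
        b = parcel[(i + 1) % len(parcel)]
        current, output = output, []
        if not current:
            break
        prev = current[-1]
        for cur in current:
            d_prev, d_cur = _side(prev, a, b), _side(cur, a, b)
            if (d_prev >= 0) != (d_cur >= 0):
                # prev->cur crosses this parcel boundary: keep the crossing point.
                t = d_prev / (d_prev - d_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if d_cur >= 0:
                output.append(cur)
            prev = cur
    return output


# Made-up example: two adjacent square parcels and one building that
# pokes half a unit across the shared boundary at x = 10.
parcels = {
    "PIN-A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "PIN-B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
building = [(2, 2), (10.5, 2), (10.5, 8), (2, 8)]

slices = {pin: clip_to_parcel(building, shape) for pin, shape in parcels.items()}
areas = {pin: polygon_area(poly) for pin, poly in slices.items()}
```

In this toy case the building has area 51, split into a large slice of about 48 on `PIN-A` and a sliver of about 3 on `PIN-B`.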

Buildings Sliced by Tax Parcel Lines

If you look closely at the above image, you can see that a few of the buildings have been sliced into two polygons where they cross into the adjoining tax parcel. Here is a zoomed-in example:

Next, I calculated the area of each of these new sliced polygons. Here is a histogram showing a distribution of polygon sizes:

Histogram of Sliced Building Polygon Areas

Also, here is a summary of the area data:

Summary Statistics of Building Area (after being sliced by Tax Parcel)

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.000e+00 2.369e-09 7.786e-09 1.144e-08 1.382e-08 8.797e-07

My hunch was that the hump on the left of the histogram represents the slivers of buildings that are overlying a next-door tax parcel. So I plotted out the divided building polygons again, using colors to show the relative size of the polygons. Sure enough, the smaller polygons (those in the first quartile of area) are mostly the small slivers of buildings (and also small garages).

Color-coded for Shape Area

Red = smallest/first quartile
Blue = second quartile
Yellow = largest half
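For reference, the color assignment can be sketched as below. This is a minimal sketch in plain Python rather than the plotting code behind the map: the function name `color_by_area` and the sample areas are made up, and treating a polygon that sits exactly on a quartile boundary as belonging to the lower class is my assumption.

```python
from statistics import quantiles


def color_by_area(areas):
    """Map each polygon area to a color class based on area quartiles."""
    # "inclusive" treats the smallest and largest values as the 0th and
    # 100th percentiles of the data.
    q1, median, _q3 = quantiles(areas, n=4, method="inclusive")
    colors = []
    for area in areas:
        if area <= q1:
            colors.append("red")     # smallest / first quartile
        elif area <= median:
            colors.append("blue")    # second quartile
        else:
            colors.append("yellow")  # largest half
    return colors


# Made-up areas: two slivers mixed in with ordinary building slices.
sample_areas = [3.0, 48.0, 40.0, 2.0, 55.0, 60.0, 45.0, 52.0]
sample_colors = color_by_area(sample_areas)
```

With these sample values, the two slivers (areas 3.0 and 2.0) are the only polygons that end up red.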

The issue here is that most of the small slivers very likely fall on tax parcels that belong to the property next door, not to the building in question. It would not be a good idea to associate the neighbor's records with this building. One way of handling this would be to not associate records from the sliver parcels with the building at all.
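Dropping the slivers could look something like this. It is a sketch under assumptions of my own: a slice counts as a sliver when it covers less than a cutoff share of its building's total area, and the 5 percent cutoff, the `(building_id, pin, area)` tuple layout, and the sample values are all hypothetical.

```python
from collections import defaultdict

MIN_SHARE = 0.05  # hypothetical cutoff, not derived from the data


def drop_slivers(slices, min_share=MIN_SHARE):
    """Keep only (building_id, pin, area) slices that cover at least
    `min_share` of their building's total sliced area."""
    totals = defaultdict(float)
    for building_id, _pin, area in slices:
        totals[building_id] += area
    return [
        (building_id, pin, area)
        for building_id, pin, area in slices
        if totals[building_id] > 0 and area / totals[building_id] >= min_share
    ]


# Made-up slices: building B1 spills slightly onto parcel PIN-B.
building_slices = [
    ("B1", "PIN-A", 49.5),
    ("B1", "PIN-B", 0.5),
    ("B2", "PIN-B", 40.0),
]
kept = drop_slivers(building_slices)
```

Here the `B1`/`PIN-B` sliver (1 percent of the building) is dropped, while `B2` keeps its link to `PIN-B`.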

On the other hand, it could be that some of the smaller slivers actually do have records that involve the building. For example, see the block below:

Different Block with Small Standalone Buildings

To deal with the possibility that some of the red polygons are buildings that legitimately overlap multiple tax parcel boundaries (or are just really small buildings), I thought of another solution. We could bring in the records associated with every tax parcel that a building overlaps, but give each record a weight based on the share of the building's area that falls on that parcel. If a building overlaps 2 tax parcels, and one contains 99 percent of the building area and the other 1 percent, then the associated records could be weighted at 0.99 and 0.01. That way, we wouldn't lose any data, but the incorrect data would be down-weighted and count for very little in the ultimate classification.
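The weighting idea can be sketched as follows. This is an illustration only: the `(building_id, pin, area)` layout, the function name `parcel_weights`, and the violation counts per parcel are made up, and how the weights feed into the final classification is still open.

```python
from collections import defaultdict


def parcel_weights(slices):
    """Weight each (building_id, pin) pair by the share of the building's
    total sliced area that falls on that parcel."""
    totals = defaultdict(float)
    for building_id, _pin, area in slices:
        totals[building_id] += area
    weights = defaultdict(float)
    for building_id, pin, area in slices:
        if totals[building_id] > 0:  # skip degenerate zero-area buildings
            weights[(building_id, pin)] += area / totals[building_id]
    return dict(weights)


# The 99 percent / 1 percent case from the paragraph above.
example_slices = [("B1", "PIN-A", 99.0), ("B1", "PIN-B", 1.0)]
weights = parcel_weights(example_slices)

# Hypothetical record counts per parcel: the neighbor (PIN-B) has many
# violations, but they barely move the weighted total for building B1.
violations_by_pin = {"PIN-A": 2, "PIN-B": 10}
weighted_violations = sum(
    weights[("B1", pin)] * count for pin, count in violations_by_pin.items()
)
```

In this example the weights come out to 0.99 and 0.01, so the neighbor's ten violations add only about 0.1 to the building's weighted count of roughly 2.08.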
