# Spatial Mapping

This notebook contains both the code for, and the final product of, the active new building construction site scoping project: a geo-map of building lots still under construction in New York City.

## Introduction

### Classifying active construction lots

For the purposes of this project, a lot is considered an active construction site if it has been issued a DOB construction permit that has not yet expired, and the DOB has no certificate of occupancy on record for it granted more recently than that permit's approval date.

Work can begin as soon as a construction permit is issued; permits usually last one year, after which they expire and must be renewed. A building cannot be occupied until all inspections have taken place and a certificate of occupancy (C of O) has been issued, which is why C of O approval is traditionally treated as the end of a construction project.
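The classification rule above can be sketched as a small predicate. This is a minimal sketch, not the notebook's actual code: the argument names (`permit_issued`, `permit_expires`, `latest_co`, `as_of`) are hypothetical, and it assumes the permit and certificate dates have already been parsed into `datetime.date` values.

```python
from datetime import date
from typing import Optional


def is_active_construction(permit_issued: date,
                           permit_expires: date,
                           latest_co: Optional[date],
                           as_of: date) -> bool:
    """Sketch of the project's rule for an active construction lot.

    Hypothetical inputs: permit_issued stands in for the permit approval
    date, latest_co is the most recent C of O on record (None if there is
    none), and as_of is the date the classification is made.
    """
    # The permit must have been issued and must not yet have expired.
    permit_live = permit_issued <= as_of < permit_expires
    # A C of O granted after the permit was approved closes out the project.
    closed_out = latest_co is not None and latest_co > permit_issued
    return permit_live and not closed_out
```

For example, a lot permitted on 2016-01-01 with no C of O counts as active in mid-July 2016, while the same lot with a C of O dated 2016-06-01 does not. An older C of O (say, for a building since demolished) predates the permit and so does not close it out.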

The one caveat is a lot that has received a construction permit but, because of issues on the permittee's side, never actually entered construction. Since permits expire after a year, such a lot would only persist in the data if the permittee kept renewing permits without doing any work, so I expect this error to be small.

It is also something of a philosophical question whether a lot that is effectively an unused hole in the ground is a construction site, an active construction site, or just a hole in the ground.

Within the limits of what the city regulates and records, this classification is as tight as it can be.

### Data processing

I use two data sources for this project. The first is the [DOB permit issuance dataset](https://data.cityofnewyork.us/Housing-Development/DOB-Permit-Issuance/ipu4-2q9a) on NYC Open Data, from which I retrieve a list of lots with non-expired construction permits (as of writing). The second is the DOB [BISweb interface](http://a810-bisweb.nyc.gov/bisweb/bsqpm01.jsp), which among other things provides PDF copies of the certificates of occupancy on DOB's digital record. All recent certificates are there; coverage of older ones is spottier, though some document scans reach back as far as ~1900. These PDFs are scraped using `co_reader`, a Python module that uses a text scanner to parse dates out of issued certificates; it was originally written for the topically similar [NYC Construction Timeline project](https://github.com/ResidentMario/nyc-construction-timeline).
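`co_reader`'s internals are not reproduced here. As a rough illustration of the date-parsing step only, the sketch below pulls `MM/DD/YYYY` dates out of scanned certificate text with a regular expression and keeps the latest one. The date format, the keep-the-latest heuristic, and the function names are all assumptions for illustration, not `co_reader`'s actual behavior or API.

```python
import re
from datetime import date
from typing import List, Optional

# Assumed date format on the certificates: MM/DD/YYYY (hypothetical).
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_dates(text: str) -> List[date]:
    """Return every valid MM/DD/YYYY date found in the scanned text."""
    dates = []
    for month, day, year in DATE_PATTERN.findall(text):
        try:
            dates.append(date(int(year), int(month), int(day)))
        except ValueError:
            # OCR noise can produce impossible dates such as 13/45/2016.
            continue
    return dates


def latest_date(text: str) -> Optional[date]:
    """Hypothetical heuristic: treat the latest date as the grant date."""
    dates = extract_dates(text)
    return max(dates) if dates else None
```

Skipping unparseable matches rather than raising keeps a single noisy scan from aborting a long scraping run.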

This data processing is handled in the preceding `Active New Building Construction Site Data Join.ipynb` notebook.

### Dataset

At this point we have a file, `Active New Building Construction Sites.csv`, containing a unified recordset of all active under-construction BINs in New York City, along with some attributes carried over from the original permit dataset. This file is the source of the map generated by the code in this notebook.

### Map scoping

See `Active New Building Construction Site Spatial Map Scoping.ipynb` for the map build test.

### Further work

The end product of this notebook is a map of active construction sites in New York City as of the processing date, mid-July 2016. However, this is functionally no more than a proof of concept. Building the dataset required reading in 4000 BISweb certificates, a herculean process that probably takes over 24 hours end to end and therefore does not scale to regular operation. A finer-resolution map of active construction sites, regenerated daily, weekly, or monthly, would require an easier way to access this data, and one that doesn't load down the DOB servers every time it is requested.

I suggest an Open Data dataset of certificate of occupancy issuances.

### Small note

BIN `1012420`, at `507 WEST 28 STREET MANHATTAN`, somehow has [65 duplicate certificates of occupancy](http://a810-bisweb.nyc.gov/bisweb/COsByLocationServlet?requestid=4&allbin=1012420). It corresponds to a massive new apartment tower on the High Line ([source](https://newyorkyimby.com/category/507-west-28th-street), [source](http://www.6sqft.com/west-chelseas-tallest-tower-rises-and-finally-reveals-itself/), et al.). An interesting data-shape quirk.
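Under the classification rule, only the most recent certificate per building matters, so duplicates like these should not change the result as long as they are collapsed before the comparison. A minimal pandas sketch, using hypothetical column names (`BIN`, `CO_Date`) and made-up toy records rather than real BISweb data:

```python
import pandas as pd

# Hypothetical toy records: BIN 1000001 has the same certificate scraped twice.
cos = pd.DataFrame({
    "BIN": [1000001, 1000001, 1000001, 2000002],
    "CO_Date": pd.to_datetime(["2016-03-01", "2016-03-01",
                               "2016-05-20", "2015-11-10"]),
})

# drop_duplicates() is not required for max(), but it shrinks the table;
# groupby(...).max() then keeps one latest C of O date per building.
latest_co = (cos.drop_duplicates()
                .groupby("BIN")["CO_Date"]
                .max())
print(latest_co)
```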

## Data Load

In [1]:
import geopandas as gpd
import pandas as pd
import requests
import zipfile
import io
import matplotlib.pyplot as plt
%matplotlib inline
# Use the full option name; "max_columns" alone is ambiguous in newer pandas.
pd.set_option("display.max_columns", 500)

Load all of the shapefiles.

In [2]:
# Download and unpack the 16v1 MapPLUTO shapefiles, one archive per borough.
url_template = 'http://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/{}_mappluto_16v1.zip'
for code, folder in [("bk", "brooklyn"), ("bx", "bronx"), ("mn", "manhattan"),
                     ("qn", "queens"), ("si", "staten_island")]:
    r = requests.get(url_template.format(code))
    r.raise_for_status()  # fail loudly instead of trying to unzip an error page
    with zipfile.ZipFile(io.BytesIO(r.content)) as ar:
        ar.extractall("geospatial/{}/".format(folder))

Combine the geographies into a single `GeoDataFrame`.

In [8]:
# DataFrame.append was removed in pandas 2.0; collect the layers and concatenate once.
frames = []
for uri, layer in [("geospatial/brooklyn/", "BKMapPLUTO"),
                   ("geospatial/bronx/", "BXMapPLUTO"),
                   ("geospatial/manhattan/", "MNMapPLUTO"),
                   ("geospatial/queens/", "QNMapPLUTO"),
                   ("geospatial/staten_island/", "SIMapPLUTO")]:
    frames.append(gpd.read_file(uri, layer=layer))
mappluto = gpd.GeoDataFrame(pd.concat(frames, ignore_index=True))

In [9]:
len(mappluto)

857216

## Merge

Load and shape the data for the merge (see the map scoping notebook for why this code is here and what it does).

In [59]:
active_sites = pd.read_csv("active_construction_sites.csv", index_col=0)
active_sites.columns = [column.title() for column in active_sites.columns]

borough_mapping = {
    'BK': 'BROOKLYN',
    'QN': 'QUEENS',
    'SI': 'STATEN ISLAND',
    'MN': 'MANHATTAN',
    'BX': 'BRONX'
}

# mappluto['Borough'] = mappluto['Borough'].apply(lambda b: borough_mapping[b])

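# Source CRS: New York Long Island State Plane, US feet (equivalent to EPSG:2263).
# set_crs(allow_override=True) avoids the warning recent geopandas raises when
# an existing CRS is overwritten by assigning to .crs.
mappluto = mappluto.set_crs({u'lon_0': -74, u'datum': u'NAD83', u'y_0': 0, u'no_defs': True, u'proj': u'lcc', u'x_0': 300000, u'units': u'us-ft', u'lat_2': 41.03333333333333, u'lat_1': 40.66666666666666, u'lat_0': 40.16666666666666}, allow_override=True)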
mappluto = mappluto.to_crs({'proj':'longlat', 'ellps':'WGS84', 'datum':'WGS84'})

In [61]:
active_new_building_construction_lots = gpd.GeoDataFrame(pd.merge(active_sites, mappluto, how='inner',
                                                                  on=['Borough', 'Block', 'Lot']))

In [55]:
len(active_new_building_construction_lots)

3468
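The inner merge keeps only lots present in both tables, so any permit lot whose borough/block/lot key fails to line up with MapPLUTO is dropped without a trace. A minimal diagnostic sketch follows, assuming (as the `borough_mapping` dict and its commented-out use suggest) that the two sources may spell boroughs differently. The toy frames are hypothetical stand-ins for `active_sites` and `mappluto`, not real records.

```python
import pandas as pd

# Hypothetical toy stand-ins for active_sites and mappluto.
active = pd.DataFrame({"Borough": ["MANHATTAN", "BROOKLYN", "QUEENS"],
                       "Block": [700, 1200, 50],
                       "Lot": [1, 15, 7]})
pluto = pd.DataFrame({"Borough": ["MN", "BK"],
                      "Block": [700, 1200],
                      "Lot": [1, 15]})

# Same mapping as the notebook: two-letter codes -> full borough names.
borough_mapping = {'BK': 'BROOKLYN', 'QN': 'QUEENS', 'SI': 'STATEN ISLAND',
                   'MN': 'MANHATTAN', 'BX': 'BRONX'}
pluto["Borough"] = pluto["Borough"].map(borough_mapping)

# A left merge with indicator=True labels each row 'both' or 'left_only',
# exposing the permit lots that an inner merge would silently drop.
check = pd.merge(active, pluto, how="left", on=["Borough", "Block", "Lot"],
                 indicator=True)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["Borough", "Block", "Lot"]])
```

Running the same check against the real tables would show how many active permit lots have no MapPLUTO geometry.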

## Visualize

In [63]:
import mplleaflet

fig = active_new_building_construction_lots.plot()
mplleaflet.show()

Note: I then swapped to a cleaner basemap and saved the result to `active_construction_sites_map.html`.