Skip to content
This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

Description of tables

kschmidtDCP edited this page Sep 3, 2020 · 32 revisions

Source data

dob_jobapplications

This dataset contains all job applications submitted through the Borough Offices, through eFiling, or through the HUB, which have a "Latest Action Date" since January 1, 2000. This dataset does not include jobs submitted through DOB NOW. See the DOB NOW: Build – Job Application Filings dataset for DOB NOW jobs.

dob_permitissuance

The Department of Buildings (DOB) issues permits for construction and demolition activities in the City of New York. The construction industry must submit an application to DOB with details of the construction job they would like to complete. The primary types of application, aka job type, are: New Building, Demolition, and Alterations Type 1, 2, and 3. Each job type can have multiple work types, such as general construction, boiler, elevator, and plumbing. Each work type will receive a separate permit. (See the DOB Job Application Filings dataset for information about each job application.) Each row/record in this dataset represents the life cycle of one permit for one work type. The dataset is updated daily with new records, and each existing record will be updated as the permit application moves through the approval process to reflect the latest status of the application.

dob_cofos

  • received by email from DOB
  • containing historical (from 2000) records as well (dob_cofos_append)
  • NYC Opendata

A Certificate of Occupancy (CO) states a building’s legal use and/or type of permitted occupancy. New buildings must have a CO, and existing buildings must have a current or amended CO when there is a change in use, egress or type of occupancy. No one may legally occupy a building until the Department has issued a Certificate of Occupancy or Temporary Certificate of Occupancy. The Department issues a final Certificate of Occupancy when the completed work matches the submitted plans for new buildings or major alterations. It issues a Letter of Completion for minor alterations to properties. These documents confirm the work complies with all applicable laws, all paperwork has been completed, all fees owed to the Department have been paid, all relevant violations have been resolved and all necessary approvals have been received from other City Agencies.

dcp_mappluto

  • shoreline clipped version here

doitt_building_footprints

Shapefile of footprint outlines of buildings in New York City. Please see the following link. Previously posted versions of the data are retained to comply with Local Law 106 of 2015 and can be provided upon request made to Open Data.

doitt_building_footprints_historical

Building Footprints Historical Shape Shapefile of historical footprint outlines of buildings in New York City. Please see the following link for additional documentation. Previously posted versions of the data are retained to comply with Local Law 106 of 2015 and can be provided upon request made to Open Data.

hpd_hny_units_by_building

Housing New York Units by Building The Department of Housing Preservation and Development (HPD) reports on buildings, units, and projects that began after January 1, 2014 and are counted towards the Housing New York plan. The Housing New York Units by Building file presents this data by building, and includes building-level data, such as house number, street name, BBL, and BIN for each building in a project. The unit counts are provided by building. For additional documentation, including a data dictionary, review the attachments in the β€œAbout this Dataset” section of the Primer landing page.

census_units10

census_units10_adj

  • provided by NYC DCP Population Division
  • count of total housing units by NYC census tract, adjusted for Census under-count of housing units in Queens

Lookups

LOOKUP_occ

  • maps occupancy from DOB codes to descriptive names
  • maintained in a csv

LOOKUP_ownership

  • maps ownership from three DOB fields (cityowned, ownertype, nonprofit) to a single descriptive ownership field
  • maintained in a csv

LOOKUP_geo

  • maps census tracts to the larger administrative units they nest into -- puma, nta, borough
  • maintained in a csv

council_members

  • maps council district to the name of the current council member
  • from NYC Open Data

Spatial boundary data

dco_boroboundaries_wi

  • borough boundaries including water

dcp_cd_boundaries

  • community district boundaries

dcp_census_blocks

  • census block boundaries (2010)

dcp_census_tracts

  • census tract boundaries (2010)

dcp_councildistricts

  • council district boundaries

dcp_firecompanies

  • fire company boundaries

dcp_policeprecincts

  • police precinct boundaries

dcp_school_districts

  • school district boundaries

doe_eszones

  • elementary school zone boundaries

doe_mszones

  • middle school zone boundaries

doe_school_subdistricts

  • school sub-districts

doitt_zipcodeboundaries

  • zip code boundaries

dof_shoreline

  • DOF's shoreline file

dof_shoreline_subdivide

  • created in _function.sql
  • a simplified version of dof_shoreline

Initial attribute mapping

_INIT_devdb

  • created in _init.sql
  • It creates an initial field mapping from dob_jobapplications to devdb
  • fields columns that are 1-to-1 mapping and temporary fields noted with _ (such as _occ_prop) for upcoming calculation

INIT_devdb

  • created in init.sql
  • _INIT_devdb + SPATIAL_devdb = INIT_devdb
  • The idea is that INIT_devdb is _INIT_devdb with spatial columns (geo_bbl, geo_bin ...)

Geocoding and spatial joins

_GEO_devdb

  • created in running geocode.py
  • this is generated by geosupport, taking address info and uid from _INIT_devdb

GEO_devdb

  • created in running _geo.sql
  • Since we have logic assigning geom, this table contains all the spatial columns from _GEO_devdb but with geom coming from centroid based on hierarchy (building foot print centroid using bin -> building foot print centroid using geo_bin -> geosupport lot centroid -> mappluto lot centroid & etc)
  • So this table is basically the same as _GEO_devdb but different geom and latlon
  • this table contains all the spatial attributes extracted through geosupport

_SPATIAL_devdb

  • created in _spatial.sql
  • Taking newly assigned geom from GEO_devdb, we do spatial joins to extract spatial boundaries (cd, nta, zipcode & etc)
  • this table contains all the spatial attributes extracted through spatial join

SPATIAL_devdb

  • created in spatial.sql
  • this table is creating a consolidated spatial attribute table from _SPATIAL_devdb and GEO_devdb. Depends on the scenario, there's logic to pick spatial attributes from either _SPATIAL_devdb or GEO_devdb.
  • it has the same schema as GEO_devdb, _SPATIAL_devdb

Occupancy and units

OCC_lookup

  • created in _lookup.sql
  • containing dob_occ and their translation to devdb occ

_OCC_devdb

  • created in _occ.sql
  • this table translates dob_occ to devdb occ using OCC_lookup
  • occ_prop and occ_init are assigned and corrected

OCC_devdb

  • created in _occ.sql
  • this table assigns occ_category and makes corrections on occ_category
  • this table contains occ_prop, occ_init, and occ_category

_UNITS_devdb

  • created in _units.sql
  • units_prop and units_init are assigned and corrected

UNITS_devdb

  • created in _units.sql
  • assigning units_net using finalized units_prop and units_init
  • contains units_prop, units_init and units_net

Assigning status

CO_devdb

  • created in _co.sql
  • assigns co related columns

_MID_devdb

  • created in _mid.sql
  • A temporary table that consolidates fields from UNITS_devdb, OCC_devdb, CO_devdb, STATUS_Q_devdb for the purpose of computing the status field

MID_devdb

  • created in mid.sql
  • _MID_devdb + STATUS_devdb -> MID_devdb

STATUS_devdb

  • created in _status.sql
  • computed from _MID_devdb

Merging with Non-DOB attributes

HNY_geocode_results

  • contains results of running hpd_hny_units_by_building through Geosupport 1B
  • created in geocode_hny.py

HNY_geo

  • contains relevant columns from hpd_hny_units_by_building, joined with spatial information from HNY_geocode_results

HNY_matches

  • created in _hny.sql
  • contains matches between hny building-level records and dob jobs, where the match_priority is the lowest number of
    • 1: Residential new building matched on both BIN & BBL
    • 2: Residential new building matched only on BBL
    • 3: Residential new building matched spatially
    • 4: Alteration or non-residential non-demolition matched on both BIN & BBL
    • 5: Alteration or non-residential non-demolition matched only on BBL
    • 6: Alteration or non-residential non-demolition matched spatially

HNY_no_match

  • created _hny.sql
  • contains the subset of HNY_geo that did not match with a DOB job using any of the above methods. This is the opposite subset to HNY_matches

CORR_hny_matches

  • contains corrections to HNY_matches. This file is populated by manual review. Matches are either added or removed.
  • applied in _hny.sql

HNY_devdb

  • created in _hny.sql
  • contains fields from MID_devdb, along with hny fields
  • join method includes logic to handle one-to-many, many-to-one, and many-to-many cases

PLUTO_devdb

  • created in _pluto.sql
  • contains all the pluto fields

Converting to final schema for export

FINAL_devdb

  • created in final.sql
  • takes columns for final output from MID_devdb, PLUTO_devdb, HNY_devdb and puts them in desired order.
  • for boro, bin, bbl, address_numbr, address_st, and address, data comes from the geosupport fields (geo_ prefixed) where possible, otherwise from source data.

EXPORT_devdb

  • contains a subset of FINAL_devdb based on the desired capture date

EXPORT_housing

  • contains a subset of FINAL_devdb based on the desired capture date, where resid_flag indicates residential

QAQC tables

_INIT_qaqc

  • created in qaqc_init.sql
  • contains flags for invalid dates

GEO_qaqc

  • created in qaqc_geo.sql
  • contains flags for jobs that are in water, that don't fall inside a tax lot, or that are missing coordinates

STATUS_qaqc

  • created in qaqc_status.sql
  • contains a flag for jobs that have manual corrections setting them to inactive, but have had an update since the previous release

UNITS_qaqc

  • created in qaqc_units.sql
  • contains flags for alterations with large reductions in units, new buildings with 500+ units, demolitions of 20+ unit buildings, and the 20 alterations resulting in the largest net changes in units

MID_qaqc

  • created in qaqc_mid.sql
  • contains flags for:
    • demolitions/alterations with null initial units,
    • new buildings/alterations with null proposed units,
    • potential duplicates,
    • jobs that are non-residential but have residential units,
    • jobs that are likely garages or gazebos,
    • jobs that are likely class B,
    • jobs with a mismatch between CO unit count and the proposed number of units,
    • jobs that are tract homes but are labeled as incomplete

FINAL_qaqc

  • created in qaqc_final.sql
  • contains all QAQC flags with columns in alphabetical order

MATCH_dem_nb

  • created in qaqc_mid.sql
  • contains pairwise matches between demolitions and new buildings on bbl

Aggregation tables

YEARLY_devdb

  • created in yearly.sql
  • contains classa_net sorted into columns -- year-by-year breakdowns for complete jobs, and status breakdowns for incomplete jobs

_AGGREGATE_block

  • created in aggregate.sql
  • contains fields from YEARLY_devdb aggregated by census block, along with census unit counts from census_units10

_AGGREGATE_tract

  • created in aggregate.sql
  • contains fields from YEARLY_devdb aggregated by census tract, along with census unit counts from census_units10 and adjusted census unit counts from census_units10adj

AGGREGATE_block

  • created in aggregate.sql
  • a subset of relevant columns from _AGGREGATE_block

AGGREGATE_tract

  • created in aggregate.sql
  • a subset of relevant columns from _AGGREGATE_tract

AGGREGATE_nta

  • created in aggregate.sql
  • contains of relevant columns from _AGGREGATE_tract, further aggregated to the level of nta

AGGREGATE_puma

  • created in aggregate.sql
  • contains of relevant columns from _AGGREGATE_tract, further aggregated to the level of puma

AGGREGATE_comunitydist

  • created in aggregate.sql
  • contains of relevant columns from _AGGREGATE_tract, further aggregated to the level of community district

AGGREGATE_councildist

  • created in aggregate.sql
  • contains of relevant columns from _AGGREGATE_tract, further aggregated to the level of council district
  • also contains a field for council member name

Logs

CORR_devdb

  • contains a running log of changes to the attributes of each job, which come from the application of the manual corrections file
Before Refactor

Input tables

dob_cofos

dob_jobapplications

hpd_hny_units_by_building

Intermediate tables

developments

  • Created in create.sql, but not populated until jobnumber.sql (these two scripts should be combined)

developments_hny

  • Created in dob_hny_create.sql. Initially contains a copy of developments.

hny

  • Created in hny_create.sql
  • Initially contains content from hpd_hny_units_by_building merged with geocoded records from hpd_hny_units_by_building which aren't flagged as confidential, and where reporting_construction_type is "New Construction". Also merged with geocoded records from housing_input_hny_job_manual.

hny_job_lookup

  • Created as an empty table in hny_job_lookup.sql (CHANGE THIS)
  • Contains hny_id (a hash) from hny, and job_number, job_type from developments_hny NB records where they are merged on:
    • geo_bbl, geo_bin, having total_units within 5 of units_prop: match_method = 'BINandBBL'
    • geo_bbl, having total_units within 5 of units_prop: match_method = 'BBLONLY'
    • hny geom within developments geom: match_method = 'Spatial'
  • Also populated in hny_manual_match.sql, w

hny_geocode_results

  • Created in geocode_hny.py
  • Contains geocoded records from hpd_hny_units_by_building which aren't flagged as confidential, and where reporting_construction_type is "New Construction".

hny_manual_geocode_results

  • Created in geocode_hny.py
  • Contains geocoded records from housing_input_hny_job_manual

hny_manual

  • Created in hny_manual_geomerge.sql
  • Contains housing_input_hny_job_manual fields merged with hny_manual_geocode_results, along with a hash hny_id

QAQC tables

dev_qc_water

  • Created in latlong.sql

qc_millionbinresearch

  • Created in dropmillionbin.sql
  • Contains job_number,job_type, dob_bin, geo_bin from developments for records with million bins

developments_co

  • Created in cotable.sql
  • Contains a subset of fields from dob_cofos where the job numbers exist in developments

Output tables (exports)

developments_export

  • Created in export.sql
  • The only purpose of this table, as distinct from developments_hny, is to rename and reorder columns. This was a last-step way of changing schema and should be built in to the process upstream.
  • Other last-step fixes update developments_export in export.sql, including
    • Forcing occ_category to be residential for records where corrections resulted in non-null unit fields, or there are keyword matches in occ_category, occ_proposed, occ_initial, excluding HNY hotels or dorms that are not mixed-use. Every other record has the occ_category forced to "Other." This should move upstream.
    • Overwriting units_initial, units_net, and units_prop to zero for non-residential rows (after the forced correction above). This should be moved upstream.

devdb_export

  • Created in export.sql
  • This is a cut of developments_export where dates are in the desired range and the outlier flag is FALSE (this logic isn't relevant anymore)

housing_export

  • Created in export.sql
  • Same time cut as devdb_export, but only includes records where occ_category is "Other." Currently, the time filter logic gets repeated, rather than extracting housing_export as a subset of devdb_export.
Clone this wiki locally