
Product: KPDB

Damon McCullough edited this page Jan 4, 2024 · 6 revisions

Known Projects Database (KPDB)

  • The Known Projects DB contains information on future housing starts. It gathers data from 9-11 data sources (depending on use case, see below), compares the information in each data source to ensure that units existing in multiple sources are not counted more than once, and then aggregates these sources. It incorporates granular information at the project-level, including project statuses, estimated project phasing, and borough planner input on projects where available.
  • The Known Projects DB presents housing information at the project-level. Therefore, it can be used for growth planning citywide as well as granular looks at neighborhood-level expected growth.
  • The Known Projects DB is currently used to create the School Construction Authority's (SCA) Housing Pipeline.

Disclaimer: This information does NOT represent a housing projection produced by DCP, nor can DCP attest to the certainty that each of these developments will lead to future housing starts.
The Known Projects DB does not represent future as-of-right growth nor future growth from projects that have not yet materialized in the data sources below.


Data Sources

Primary Data Sources

  • DCP Housing Developments Database - This database is created by DCP using DOB permit and certificate of occupancy data. It includes permits and applications for new buildings, building alterations, and demolitions.
  • HPD Projected Closings - This is confidential data on HPD's pipeline of closings on privately-owned sites. Provided by HPD.
  • HPD Request for Proposals - City-owned sites for which HPD has issued or designated RFPs for residential development. Provided by HPD.
  • EDC Projected Projects - Projected projects expected by EDC to develop residential units with reasonable certainty. Provided by EDC.
  • DCP Applications - Discretionary actions through DCP that facilitate residential development. This data is generated by DCP. Several processing steps are required to identify projects facilitating residential development.
  • Empire State Development Projected Projects - Known residential development projects by Empire State Development. These projects are collected by DCP.
  • Neighborhood study affordable housing commitments - Affordable housing commitments made by the current administration in neighborhood rezoning areas. These are collected from each adopted neighborhood study's Points of Agreement.
  • Future City-Sponsored RFPs/RFEIs - Additional future Requests for Proposals or Requests for Expressions of Interest for affordable housing, provided by City Hall. Note that many of these can be highly speculative.
  • DCP Planner-Added Projects - Additional projects identified by Department of City Planning borough planners that are not yet in the above data sources. Note that many of these can be highly speculative conversations with developers, and this list is not comprehensive.

Secondary Data Sources

  • Neighborhood study projected developments - The count of units each adopted neighborhood study from the current administration is expected to develop. These projections are not site-specific and are highly speculative. Depending on need, these projects should not necessarily be included for planning purposes.
  • Future neighborhood studies - The count of units each future neighborhood study is projected to develop. These projections are not site-specific and are highly speculative. Because these rezonings have not yet been adopted, we include a certainty discount factor, and we do not deduplicate. These projects should not necessarily be included for planning purposes.


KPDB Schema

| Field | Long form name | Definition |
| --- | --- | --- |
| project_id | Project ID | ID that groups records together into a single project |
| source | Source | The name of the dataset the project came from |
| record_id | Record ID | The ID of the project from the source dataset |
| record_name | Record name | The name of the project as provided by the source |
| borough | Borough | The borough the project is located within |
| status | Status | The current phase of the project |
| type | Type | The type of project. For DOB jobs this would be NB, Alt, etc. |
| date | Date | The date of the project |
| date_type | Date type | The type of date for the project based on the project source (Build Year, Certified Referred, Date Filed, Date Permitted, Effective Date, Month Closed, Projected Fiscal, Year Range) |
| units_gross | Units gross | The proposed number of units associated with the project as reported in the source data |
| units_net | Units net | The number of future units that should be counted as part of the project after removing completed DOB units and deduplication |
| prop_within_5_years | Proportion built within 5 years | Proportion of units that should be completed within the next 5 years |
| prop_5_to_10_years | Proportion built within 5-10 years | Proportion of units that should be completed between 5 and 10 years from now |
| prop_after_10_years | Proportion built after 10 years | Proportion of units that should be completed more than 10 years from now |
| within_5_years | Units built within 5 years | The number of units that should be completed within the next 5 years |
| from_5_to_10_years | Units built within 5-10 years | The number of units that should be completed between 5 and 10 years from now |
| after_10_years | Units built after 10 years | The number of units that should be completed more than 10 years from now |
| phasing_rationale | Phasing assumption rationale | The planner's rationale for the timeline of a project |
| phasing_known | Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
| nycha | NYCHA | Boolean. True if this is at least partially a NYCHA project |
| classb | Group Quarters | Boolean. True if this is a project with Class B units (can be partial) |
| senior_housing | Senior Housing | Boolean. True if this is at least partially a Senior Housing project |
| inactive | Inactive | Flag indicating whether the project is inactive. If we do not include DOB inactive jobs, this field is excluded |
| geom | Geometry | geometry(Geometry,4326) |
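As an illustration of how the phasing fields above relate, here is a minimal sketch in Python. The field names come from the schema; the numbers, and the assumption that each phased count is simply units_net scaled by its proportion, are hypothetical.

```python
# Hypothetical values for a single project; field names from the KPDB schema.
units_net = 120
prop_within_5_years = 0.50
prop_5_to_10_years = 0.25
prop_after_10_years = 0.25

# Assumption: each phased unit count is units_net scaled by its proportion.
within_5_years = units_net * prop_within_5_years
from_5_to_10_years = units_net * prop_5_to_10_years
after_10_years = units_net * prop_after_10_years

# The three phases should partition the net units.
assert within_5_years + from_5_to_10_years + after_10_years == units_net
```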

Description of tables in the knownprojects build environment

Source tables

The following source tables are pre-processed and loaded from DigitalOcean. Raw data for all non-ZAP sources, along with their preprocessed versions, are available in the private repo db-knownprojects-data.

  • dcp_projects: ZAP projects data
  • dcp_projectactions: ZAP project actions data
  • dcp_projectbbls: ZAP project BBLs data
  • dcp_housing: A copy of the housing database
  • dcp_knownprojects: KPDB from previous release
  • dcp_n_study: Neighborhood study rezoning commitment input data
  • dcp_n_study_future: Input data for future neighborhood studies
  • dcp_n_study_projected: Input data for Neighborhood Study Projected Development Sites
  • dcp_planneradded: Contains projects manually identified by planners. For more information, see the planner-added project instructions.
  • edc_projects: EDC projected projects source data
  • esd_projects: Empire State Development projected projects source data
  • hpd_pc: HPD projected closings source data
  • hpd_rfp: HPD Requests For Proposals source data. Has one record per BBL.

Intermediate tables

  • dcp_application: Pulls fields from the three ZAP input tables to create all fields necessary for KPDB. Also includes several flags to aid in the ZAP review process. ZAP records are excluded if they have a status of "record closed," "terminated," or "withdrawn." Includes projects from after 2010 or with an unspecified date. Projects with a clear indication of residential components are flagged for review. Other flags indicate whether a project existed in the previous version of KPDB, and whether the project was explicitly added/removed via the corrections_zap table.

  • dcp_housing_poly: Housing database with formatted dates, BBL geometries, phasing, and boolean flags. Created in dcp_housing.sql.

    • Excludes demolitions and withdrawn jobs.
    • For class A, excludes jobs with no residential unit change, alterations removing units, and jobs where the proposed residential units are 0.
    • For class B, any record with positive initial or proposed units is included. These records only provide context for DOB match review, and get excluded from the final KPDB output.
    • Polygon geometries come from a join with mappluto on BBL, primarily, and a spatial join secondarily.
  • combined: Contains a combination of input data with a shared schema. For several data sources, this involves joining on BBL with dcp_mappluto_wi to get lot-level polygon geometry. For some input datasets, rows get collapsed to the level of a project. In these cases, the unique IDs of the input data rows get stored in the field record_id_input as an array. If no collapsing was necessary (i.e., each project is a single record in a given source data table), the record_id_input array only contains the record_id. In cases where there is no unique project-level ID, the record_id gets assigned from a hash of the uids in record_id_input. This table contains standardized statuses, initial phasing assumptions, and boolean flags. Created in combine.sql.

  • _project_record_ids: This is the initial table of project inputs (groups of record_ids that refer to the same project). They are identified by finding spatial overlaps between non-DOB records, forming arrays of individual record_ids, called project_record_ids. Two of the neighborhood study sources are not included, as units from these sources do not deduplicate with other sources. The table is initially created in _project_record_ids.sql. If there are corrections in corrections_project, these get applied to move record_ids between projects after the initial identification of spatial overlaps in correct_projects.sql. This table then gets used to create review_project (see below) for review.

  • project_record_ids: This is the final table of project_record_ids. It includes everything in _project_record_ids, with DOB records added. The dcp_housing record_ids get added to the arrays in _project_record_ids, as long as they haven't been tagged for removal in corrections_dob. Additional DOB to non-DOB matches get added if they're flagged as add in the correction_dob table. This table also includes stand-alone projects (an array consisting of a single record_id). These include dcp_housing records and combined records that did not match with other records in the two stages of identifying spatial overlaps. Created in project_record_ids.sql

  • deduped_units: This table contains results from the automatic deduplication of units in dedup_units.py. Units get subtracted within a project based on the following hierarchy, where units from lower-numbered sources get subtracted from units of higher-numbered sources.

    hierarchy = {
        "DOB": 1,
        "HPD Projected Closings": 2,
        "HPD RFPs": 3,
        "EDC Projected Projects": 4,
        "DCP Application": 5,
        "Empire State Development Projected Projects": 6,
        "Neighborhood Study Rezoning Commitments": 7,
        "Neighborhood Study Projected Development Sites": 8,
        "DCP Planner-Added Projects": 9,
    }
    
  • _kpdb: Contains all records from combined except for DOB records containing only class B units. Joined onto these records are the deduplicated units_net from deduped_units. This field gets multiplied by phasing proportions to calculate phased unit counts. Created in create_kpdb.sql.
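As a minimal sketch of how the hierarchy might drive the subtraction within a single project: units already counted by more-certain (lower-numbered) sources are removed from less-certain sources. This is illustrative only; the helper below is hypothetical, and the actual logic lives in dedup_units.py.

```python
hierarchy = {
    "DOB": 1,
    "HPD Projected Closings": 2,
    "HPD RFPs": 3,
    "EDC Projected Projects": 4,
    "DCP Application": 5,
    "Empire State Development Projected Projects": 6,
    "Neighborhood Study Rezoning Commitments": 7,
    "Neighborhood Study Projected Development Sites": 8,
    "DCP Planner-Added Projects": 9,
}

def dedup_units(project_records):
    """Hypothetical helper: project_records is a list of
    (source, units_gross) pairs for one project. Returns
    (source, units_net) pairs, subtracting units already counted
    by more-certain (lower-numbered) sources."""
    counted = 0
    result = []
    for source, units in sorted(project_records, key=lambda r: hierarchy[r[0]]):
        net = max(units - counted, 0)  # never count the same units twice
        result.append((source, net))
        counted += net
    return result
```

For example, a 200-unit DCP Application overlapping a 50-unit DOB job and a 100-unit HPD Projected Closing would net out to 100, 50, and 50 units respectively under this sketch.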

Review tables

  • review_project: Contains records from combined, excluding those from the sources DOB, Neighborhood Study Rezoning Commitments, and Future Neighborhood Studies. Joined onto these records are the project_record_ids identified in the table _project_record_ids. This table has a boolean flag indicating cases where records are part of a multi-record project (identified from spatial overlap). Also includes a boolean flag indicating missing geometry, because geometries do not get exported in the CSV versions of this table. Created in review_project.sql.
  • _review_dob: Contains spatially-overlapping DOB and non-DOB records from combined (excluding those from Neighborhood Study Rezoning Commitments and Future Neighborhood Studies). In cases where the non-DOB record is part of a multi-record project, all other records associated with that project are also included. There are several boolean flags in this table to aid in review. The field dob_multimatch is 1 for DOB records that overlap with more than one project. All non-DOB records in these projects get flagged with project_has_dob_multi. Created in review_dob.sql.
  • review_dob: This is the same as _review_dob, with the array values transformed to strings for export.
  • corrections_applied: Contains all successfully-applied corrections from corrections_main.
  • corrections_not_applied: Contains corrections from corrections_main that could not get applied, either because the field wasn't valid or the old_value did not match any existing records.

Corrections tables

  • corrections_zap: Corrections table for adding and removing ZAP records from KPDB. For more information, see the ZAP review instructions.
  • corrections_main: Corrections table for modifying any value other than units_net. This table is intended for all corrections that do not rely on the relationships between records. For more information, see main review instructions.
  • corrections_project: Corrections table for modifying which records refer to the same project. The default is to assume that spatially overlapping records refer to the same project. For more information, see the project review instructions.
  • corrections_dob: Corrections table either adding or removing matches between DOB and non-DOB records. This table is necessary because a DOB record cannot be associated with more than one project. For more information, see the DOB review instructions.

Output tables

  • kpdb: Final output of KPDB, created in _export.sql. Only difference from _kpdb is which fields are included.

Corrections - Non DOB Records to Each Other

Overview:

This is the process to identify related projects from different source tables using spatial overlap. Project relations are created in _project_record_ids.sql.

Note: DOB, Neighborhood Study Rezoning Commitments, and Future Neighborhood Studies are excluded from the spatial-overlap process and will be added later.

File Review

The corrections for non-DOB records are recorded in corrections_project.csv, found in the corrections folder of the db-knownprojects-data repo. The files to review in the KPDB output are:

  • review_project.shp and review_project.csv - take a close look at the dummy_id, which refers to the cluster ID of the projects, and the record_id, which refers to the record that will need to be reassigned or combined.
  • corrections_project.csv - the file in which any necessary corrections are recorded and applied in the pipeline.

Manual Review Methodology

The purpose of the manual cluster review is to determine whether records from multiple data sources indeed refer to the same project and are complementary in the information they provide, or whether the spatial overlap between the records is merely coincidental, in which case the records should be separated into different clusters (or split out into new clusters of their own). In the second case, we say a cluster reassignment has occurred. In some cases, the overlapping geometry is due to an incorrect geometry or BBL; this requires an additional step of correcting the geometry, either by fixing the source files or by updating the record in ZAP. The cluster review is a crucial step: the later deduplication of units in the KPDB process relies on it to avoid over-counting units.

Review Strategies

Redundant information should also be removed in this process. Possible redundancies include, but are not limited to: (1) a planner-added project for which an official application has now been filed; (2) multiple DCP applications coexisting for the same project, in which case the outdated record should often be removed.

  • Remove projects that do NOT have any residential component (e.g., a parking rezoning).
  • The overlapping geometry may be a larger rezoning that envelops multiple independent projects. In this case, the projects should be separated, i.e., 'declustered'.

Instructions:

Expected corrections file format

| record_id | action | record_id_match |
| --- | --- | --- |
| 9 | reassign | 11 |
| 7 | reassign | NULL |
| 8 | combine | 12 |
| 3, 4 | reassign | NULL |
| 1, 2 | reassign | 10 |

Schema

| Field | Value |
| --- | --- |
| record_id | record_id or comma-separated list of record_ids to get reassigned to or combined with another project |
| action | combine/reassign |
| record_id_match | any record_id from the project that you want to either combine with or reassign to (can be NULL) |

Note: if record_id_match is NULL (left blank in the CSV or Excel file), this indicates that a new stand-alone project will be created.

How does it work?

Example of matched records (_project_record_ids table):

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6,7,8} |
| {9,10,11} |

Reassign: removing a record from one group of matched records to another.

For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | reassign | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,7} |
| {5,6,8} |
| {9,10,11} |

This can also be used to separate out a record into a stand-alone project. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | reassign | NULL |

results in:

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6,8} |
| {9,10,11} |
| {7} |

This action can also reassign multiple record_ids from a given project simultaneously. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7, 8 | reassign | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,7,8} |
| {5,6} |
| {9,10,11} |

If record_id_match is NULL, this action can also be used to separate out multiple records into a stand-alone project. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7, 8 | reassign | NULL |

results in:

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6} |
| {9,10,11} |
| {7,8} |

Combine: Merging two distinct groups of matched records, indicating that they refer to the same project.

For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | combine | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,5,6,7,8} |
| {9,10,11} |
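The reassign and combine actions above can be sketched as list operations in Python. This is a simplified illustration of the behavior, not the pipeline's implementation (the real corrections are applied in correct_projects.sql).

```python
def reassign(projects, record_ids, match):
    """Move record_ids out of their current project(s). If match is None,
    they form a new stand-alone project; otherwise they join the
    project containing match."""
    projects = [sorted(set(p) - set(record_ids)) for p in projects]
    projects = [p for p in projects if p]  # drop emptied projects
    if match is None:
        projects.append(sorted(record_ids))
    else:
        for p in projects:
            if match in p:
                p.extend(record_ids)
                p.sort()
    return projects

def combine(projects, record_id, match):
    """Merge the project containing record_id with the project
    containing match: they refer to the same project."""
    a = next(p for p in projects if record_id in p)
    b = next(p for p in projects if match in p)
    merged = sorted(set(a) | set(b))
    return [p for p in projects if p is not a and p is not b] + [merged]

# The matched-record groups from the examples above.
projects = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
```

With these groups, reassign(projects, [7], 2) yields [[1, 2, 3, 4, 7], [5, 6, 8], [9, 10, 11]], matching the first worked example; the order of the groups themselves carries no meaning.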

Corrections - Project to DOB Matching

Overview:

There are two kinds of reviews that we are asking you to do in this phase. The review_dob table contains every match we found between a DOB record and a non-DOB record (or a project containing multiple non-DOB records). These matches are based on where a DOB record's BBL geometry overlaps with the non-DOB record geometry, as long as the dates are close enough.

Note: Neighborhood Study Rezoning Commitments and Future Neighborhood Studies are excluded from this process and will be added later

Sometimes, a DOB record will spatially overlap with only one project. In this case, we can assume that this match is accurate and move forward with the unit adjustments. In these cases, we set project_has_dob_multi to 0. Because we're still developing our workflow, though, we don't want to make too many assumptions. That is why we are including these matches for you to verify.

In other cases, a DOB record will spatially overlap with more than one non-DOB record or verified project. In these cases, we cannot make any assumptions about how to automatically count the units, since we don't know which match is the accurate one (remember that a DOB record cannot accurately match with more than one project). Because there is no automated way for us to move forward, all projects with project_has_dob_multi = 1 need review. In these cases, we need you to do more than verify that the match we found is correct. We need you to tell us which of the matches is correct, and which are incorrect.

The difference between project_has_dob_multi and dob_multimatch

If a DOB record matched with more than one project,

  • project_has_dob_multi will be 1 for all records in that project. This is to make it easy for you to identify projects with conflicts. For example, the following query will identify every project that matched with a DOB record, but that DOB record also matched with a different project. This would give you the context you need to tell us which match is the correct one.
SELECT * FROM review_dob
WHERE project_has_dob_multi = 1
  • dob_multimatch will be 1 for just the DOB record in that project. Because the DOB records are the only ones you are changing, this makes it easy to identify where we need you to make edits. The following query would give you every DOB match where the DOB record matched with more than one project. You should be editing every record returned by this query.

It does not include the other records from the project, however, so refer to the previous query to get context information.

SELECT * FROM review_dob
WHERE dob_multimatch = 1

Corrections table format and instructions

You will resolve all cases where dob_multimatch = 1 through the corrections_dob table. In this table, you can either add or remove matches between a DOB record and a project consisting of one or more non-DOB records. As in the corrections_project table, you only need to specify one record_id from a group of record_ids that make up a project in order to either add or remove a match.

Expected corrections file format

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | add | 54321 |
| 1 | remove | 12345 |

Types of corrections

In the following example, the DOB record with ID 12345 matched with two different projects. One project consists of non-DOB records with IDs 1, 2, and 3. The second project consists of non-DOB records with IDs 4 and 5.

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 1 | {1,2,3} | 12345 |
| 2 | {1,2,3} | 12345 |
| 3 | {1,2,3} | 12345 |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |

Remove a match: A DOB record matched with a verified group of non-DOB records, but they are not referring to the same project.

If the DOB record with ID 12345 should be part of the same project as the non-DOB records with IDs 4 and 5 (and not the ones with IDs 1, 2, and 3), you would specify that by removing the incorrect match, with a correction that looks like:

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | remove | 12345 |

This results in:

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |

Add a match: A DOB record did not match with a verified group of non-DOB records, but it is actually referring to the same project.

If a DOB record does not overlap with a non-DOB record but they refer to the same project, you can also add a match. For example, if the DOB record with ID 54321 does not overlap with records 1, 2, and 3, but still refers to the same project, you would add a correction like:

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | add | 54321 |

This results in:

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 1 | {1,2,3} | 12345 |
| 2 | {1,2,3} | 12345 |
| 3 | {1,2,3} | 12345 |
| 1 | {1,2,3} | 54321 |
| 2 | {1,2,3} | 54321 |
| 3 | {1,2,3} | 54321 |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |
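A sketch of how these add/remove corrections might be applied, assuming matches are stored as (record_id, record_id_dob) pairs. The helper below is hypothetical and illustrative only; naming one record_id from a project adds or removes the match for the whole project, as described above.

```python
def apply_dob_corrections(matches, projects, corrections):
    """matches: set of (non-DOB record_id, DOB record_id) pairs.
    projects: groups of non-DOB record_ids. Naming any one record_id
    from a project adds/removes the match for the whole project."""
    matches = set(matches)
    for c in corrections:
        project = next(p for p in projects if c["record_id"] in p)
        pairs = {(r, c["record_id_dob"]) for r in project}
        if c["action"] == "remove":
            matches -= pairs
        elif c["action"] == "add":
            matches |= pairs
    return matches

# The example above: DOB record 12345 matched with both projects.
projects = [[1, 2, 3], [4, 5]]
matches = {(r, 12345) for p in projects for r in p}
corrections = [
    {"record_id": 1, "action": "remove", "record_id_dob": 12345},
    {"record_id": 1, "action": "add", "record_id_dob": 54321},
]
result = apply_dob_corrections(matches, projects, corrections)
```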

Corrections - Main Process

Table of Contents

About

Sometimes source data will be incorrect or incomplete. Before publishing KPDB, research teams conduct an extensive manual research effort to review, verify, and correct data.

Records are added to the corrections table to edit an attribute of an existing record. The corrections_main table is also used to exclude records. By capturing the old attribute value, it is easy to keep track of the original, pre-edited data. Additionally, a value is only overwritten if the old value matches the current value for the specified field. Therefore, if the source data is updated between versions, erroneous corrections won't be applied.

Schema - corrections_main.csv

| Field | Description |
| --- | --- |
| record_id | Unique ID of the project |
| field | Field where the value needs to be updated |
| old value | Value that needs to be changed |
| new value | New value |
| editor | Name of person who made the edit |
| date | Date the edit was made |
| record_name | Name of the record |
| Notes | Notes from the editor |

CREATE TABLE corrections_main(
    record_id text,
    field text,
    old_value text,
    new_value text,
    editor text,
    date text,
    record_name text,
    notes text
);

It is important that the editor input the record_id, field, and old value values exactly as they appear in KPDB, including any capitalization and special characters. KPDB field names and descriptions can be found here. Additionally, record_id + field must be unique within the Correction table. If there are instances where record_id + field are not unique, we will not ingest the data.
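The old-value guard and the applied/not-applied split described above can be sketched as follows. This is a hypothetical helper, not the pipeline's actual code.

```python
def apply_corrections(records, corrections):
    """records: dict of record_id -> dict of field values (as text).
    A correction is applied only when the field exists and the stored
    value still equals old_value; otherwise it is set aside, mirroring
    corrections_applied / corrections_not_applied."""
    applied, not_applied = [], []
    for c in corrections:
        rec = records.get(c["record_id"], {})
        if c["field"] in rec and rec[c["field"]] == c["old_value"]:
            rec[c["field"]] = c["new_value"]
            applied.append(c)
        else:
            not_applied.append(c)
    return applied, not_applied

records = {"P2015R0339": {"units_gross": "200"}}
corrections = [
    {"record_id": "P2015R0339", "field": "units_gross",
     "old_value": "200", "new_value": "180"},  # old value matches: applied
    {"record_id": "P2015R0339", "field": "units_gross",
     "old_value": "999", "new_value": "0"},    # stale old value: set aside
]
applied, not_applied = apply_corrections(records, corrections)
```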

Types of Corrections

Below describes what types of corrections can be made via the corrections_main table, and gives an example of how to make each correction. If you'd like to make a correction that is not described, please reach out to Data Engineering.

Update the value of an existing record

A value is incorrect or missing and you know the correct information, so you'd like it to be reflected in the final output.

| record_id | field | old value | new value | editor | date | record_name | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| P2015R0339 | units_gross | 200 | 180 | baiyue | 4/20/2020 | | some comments |
| P2015R0339 | units_gross | | 180 | baiyue | 4/20/2020 | | filling in missing unit count |

Remove a record

You do not want the record to be included in KPDB or any clustering logic.

Note: as of 2021, all ZAP additions/removals should be recorded in corrections_zap instead of corrections_main

| record_id | field | old value | new value | editor | date | record_name | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 12123424 | remove | | | baiyue | 4/20/2020 | Stanley Avenue and Glen Avenue | some comments |

Dependent variables

Sometimes the value in one field is dependent on another value. It's important to keep these dependencies in mind as you're updating values. For related fields that aren't automatically adjusted, you will want to also provide corrections to these fields so that they're in-sync.

| Corrected field | Fields affected downstream | Related fields that aren't automatically adjusted |
| --- | --- | --- |
| Date | Phasing proportions | |
| Status | Phasing proportions | |
| Phasing proportions | Counts of units in each phase | |
| Geometry | Assignment of record to a project (see note), borough | |
| Gross units | Net units, counts of units in each phase | |

NOTE: If a geometry correction results in a new overlap between records, the default assumption is that these records are part of the same project. If you correct a geometry in a way that causes an overlap, but the record refers to a different project than the overlapping records, you will need to account for this exception in the corrections_project table. For example, let's say you modify geometry in a way that causes an overlap, but the record getting the new geometry is actually a stand-alone project. You would add a row to corrections_project where record_id is the record with the new geometry, action is reassign, and record_id_match is NULL.

Correcting geometries

If the spatial data for a record is incorrect or missing, you can correct or create the geometry by supplying a shapefile. The shapefile should have two components: record_id and corrected geometry.

There are several tools that you can use to create this shapefile, including: geojson.io, Carto, and ArcMap.

The geometry from this file will overwrite the geometry of the record with a matching record_id in the original file, regardless of whether the geometry has changed in the source data between versions. If, after editing the geometry of a record, the record should NOT be assigned to the same project as overlapping records, do not forget to specify this exception in corrections_project.

Incorporating the research

The table corrections_main is just one of the five research tables; the others are described in the Corrections tables section above.

After the research is complete and the five tables are returned to Data Engineering, we will rerun the KPDB build-process with the new corrections data. This will:

  1. Include any new planner-added projects or ZAP projects newly flagged as add in corrections_zap
  2. Remove any records flagged as "remove" in either corrections_main or corrections_zap
  3. Apply the new values and geometries from corrections_main and shapefile
  4. Identify projects by finding overlapping non-DOB records, then making adjustments to the lists of overlapping records using corrections_project. In cases where the geometries got corrected in a way that creates overlaps, we reassign records to the same project as overlapping records unless otherwise specified in corrections_project
  5. Identify overlaps between DOB records and non-DOB records, then use corrections_dob to assign these DOB records to projects. DOB records will get assigned to a project if a match is not flagged as remove in corrections_dob, or if a new match is specified as an add in corrections_dob.
  6. Calculate borough field based on corrected geometries.
  7. Run the units_net calculation based on corrected units_gross and corrected project assignments.
  8. Overwrite units_net calculation using any records in corrections_main where field = 'units_net'
  9. Calculate the phased units by multiplying the corrected phasing proportions with the corrected units net.

zap_record_ids List

The new workflow deprecates the existing method of adding and removing records with the corrections_zap table, instead giving the research team more control over the ZAP projects included in KPDB. The zap_record_ids list is the exhaustive list of included records: any record not on the list gets dropped after the filtering logic is applied. Under the current logic, any record dropped by the filtering logic gets added back if it is part of this list.

How to add to the list?

The Housing team should receive a list of new ZAP projects added since the last KPDB update. This is currently done manually but could be automated. The filtering is based on the createdon date and should normally look back to the last build, typically one year. For example, if the 2021 version of KPDB was built in November 2021 and a new build is happening in November 2022, then the new records created between November 2021 and November 2022 should be provided to Housing.
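The look-back window described above can be sketched as a simple date filter. The createdon field name comes from the text; the rest of the helper is a hypothetical illustration.

```python
from datetime import date

def new_zap_records(records, last_build, this_build):
    """Return ZAP records created after the last KPDB build
    and up to the current build date."""
    return [r for r in records if last_build < r["createdon"] <= this_build]

records = [
    {"record_id": "A", "createdon": date(2021, 6, 1)},   # before last build
    {"record_id": "B", "createdon": date(2022, 3, 15)},  # new since last build
]
new = new_zap_records(records, date(2021, 11, 1), date(2022, 11, 1))
```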


Planner added projects table schema & instructions

Planner-Added Projects

There are instances when a Planner knows about a future housing project that is not captured in any of the input sources. When this happens, a record is created within the Planner-Added Projects table. The layout of the Planner-Added Projects table is outlined below. It is important that the record_id is unique throughout the table. If there is a corresponding geometry in the geometry corrections shapefile, make sure that the geometry record and the Planner-Added Projects record have the same record_id so we can link them. Additionally, it is important that every relevant component is filled out for a record.

Only new records should be added to and edited within the Planner-Added Projects table. Planner-Added Project records that are already within KPDB should be considered as "locked in." Edits to these existing records should be done in the corrections_main table. Only new records in the Planner-Added Projects table will be appended to the final KPDB.

| Field | Definition |
| --- | --- |
| Record ID | The unique ID of the record |
| Name | The name of the project as provided by the source |
| Borough | The borough the project is located within |
| Status | The current phase of the project |
| Units gross | The proposed number of units associated with the project as reported in the source data |
| Portion built within 5 years | The portion of the project that should be completed within the next 5 years |
| Portion built within 5-10 years | The portion of the project that should be completed between 5 and 10 years from now |
| Portion built after 10 years | The portion of the project that should be completed more than 10 years from now |
| Phasing assumption rationale | The planner's rationale for the timeline of a project |
| Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
| NYCHA | Boolean. True if this is a NYCHA project |
| Group Quarters | Boolean. True if this is a Group Quarters project |
| Senior Housing | Boolean. True if this is a Senior Housing project |
| Assisted Living | Boolean. True if this is an Assisted Living project |
| Inactive | Flag indicating whether the project is inactive. If we do not include DOB inactive jobs, this field is excluded |
| Project ID | ID that groups records together into a single project |
| Omit | Boolean. True if this project should not be included in KPDB |
| Omission reason | Reason for omitting the record |
| Lead planner name | Name of the lead planner |
| Editor | Name of person who added the record into the Planner-Added Projects table |
| Planner notes | Additional notes from the planner |
| Date | Date the project was added |

Manual Corrections Ingestion

Create an issue

First, navigate to the KPDB data repo main page.

Then click on the Issues tab and click "New issue".

Do not use an issue template; instead, use the "Open a blank issue" link.

Give your issue a title and a short description, then click "Submit new issue".

You should see the new issue created immediately. The key piece of information here is the issue number (e.g. #206), which is the number your pull request should reference.

Create a Pull Request

Once your issue is created and you have its number, go back to the KPDB data repo home page. The home page should automatically put you on the main branch, indicated just below the Issues tab; this happens automatically, but it is good to confirm. Then navigate to the corrections folder.

Inside the corrections folder, open the "Add file" dropdown and choose the "Upload files" option.

Choose the new corrections spreadsheet to upload, making sure its name exactly matches the file it will replace. Add a commit message, then select the option to "Create a new branch for this commit and start a pull request." We highly recommend including the issue number (in this case, 206) in the branch name; this helps connect the issue to the pull request that addresses it. When everything looks good, click "Propose changes", which takes you to the final pull request creation page.

Before finalizing the pull request, use the Reviewers panel on the right to tag the appropriate EDM members to review the changes, which are visible as you scroll toward the bottom of the page. Once reviewers are tagged, click "Create pull request".

You now have a pull request for the changes you want to make, and the tagged EDM members will be notified by GitHub with a review request. Note: at least early on in the process, do not click "Merge pull request" until at least one member of the Data Engineering team has approved it.


Build KPDB and Get Output on Github

You can build KPDB without any software (or data) in your local environment by using GitHub Actions.

The "Build" workflow

From the GitHub Actions page for KPDB, navigate to the Build workflow using the panel on the left, which brings you to the view below.

Use the dropdown menu to choose the branch to run the production build from. The workflow then asks whether to use the latest datasets (default "no"), which include any source data updates and manual corrections merged into the main branch of the db-knownproject-data repo, and whether to send the output to db-knownproject-data (default "yes"; accessing this output is covered in the next section). Click "Run workflow" to kick off a production run.
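The same run can also be kicked off from a script via the GitHub REST API's workflow_dispatch endpoint, rather than the UI. A sketch; the repo path, workflow file name, and input names ("latest", "upload") are assumptions, so check the Actions page and workflow file for the real values:

```python
import json
import urllib.request

def dispatch_request(branch, latest="no", upload="yes",
                     repo="NYCPlanning/db-knownprojects",  # assumed repo path
                     workflow="build.yml"):                # assumed workflow file
    """Build (url, payload) for a workflow_dispatch POST."""
    url = (f"https://api.github.com/repos/{repo}"
           f"/actions/workflows/{workflow}/dispatches")
    payload = {"ref": branch, "inputs": {"latest": latest, "upload": upload}}
    return url, payload

def dispatch(token, branch, **kwargs):
    """Send the dispatch request; returns the HTTP status (204 on success)."""
    url, payload = dispatch_request(branch, **kwargs)
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 method="POST", headers={
                                     "Authorization": f"Bearer {token}",
                                     "Accept": "application/vnd.github+json",
                                 })
    return urllib.request.urlopen(req).status
```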

When a run is kicked off, it shows up in the list of runs like below. Click on a specific run to check its status and details.

Output in db-knownproject-data

Remember that db-knownproject-data is a private repo, so you may need to request access to it.

If the option to send output to db-knownproject-data was set to "yes" in the last step, a new batch of output is pushed to db-knownproject-data, timestamped with the date and hour the production run finished. Click on the output branch name you want to access.


Phasing Assumptions

| Source | Project type | Phasing provided/assumed | Proposed phasing 2020 |
| --- | --- | --- | --- |
| DOB | Complete | Assumed | All counted as complete |
| DOB | Permit issued | Assumed | Next 5 years |
| DOB | Application filed | Assumed | Next 5 years |
| DOB | In progress/In progress (last disapproved) | Assumed | Next 5 years |
| DOB | Inactive (not withdrawn) | Assumed | 5-10 years |
| DOB | Withdrawn | Assumed | Exclude |
| HPD Projected Closings | n/a | Provided | 2.5 years after financing close (assumption that financing closes around the same time a permit is issued) |
| HPD RFPs | n/a | Provided | Next 5 years |
| EDC | n/a | Provided | EDC-provided build year |
| DCP Application | All non-ULURP | Assumption | 5-10 years |
| DCP Application | Status = on-hold | Assumption | 5-10 years |
| DCP Application | Status = active, Pre-PAS | Assumption | 5-10 years |
| DCP Application | Status = active, certified in or before 2019 | Assumption | 5-10 years |
| DCP Application | Status = active, certifying after 2019 | Assumption | 5-10 years |
| DCP Application | Status = active, Public review | Assumption | 5-10 years |
| DCP Application | Status = complete | Assumption | 5-10 years |
| DCP Application | Status = Record Closed | Assumption | Exclude |
| Empire State Development Projects | n/a | n/a | One project: Atlantic Yards, provided by planner input |
| Neighborhood Study Rezoning Commitments | Downtown Far Rockaway/Inwood | Provided | Estimates provided by EDC |
| Neighborhood Study Rezoning Commitments | Jerome and Bay Street Corridor | Provided | Estimates provided by planner input |
| Neighborhood Study Rezoning Commitments | ENY and East Harlem | Assumption | Next 5 years (based on rezoning effective date) |
| Neighborhood Study Projected Development Sites | n/a | Assumption | Remaining units evenly distributed across each year of the estimated buildout timeline |
| Future Neighborhood Studies | n/a | Assumption | Units evenly distributed across each year of the estimated buildout timeline |
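The DOB rows of the phasing assumptions above amount to a simple status-to-bucket lookup. A sketch (the bucket labels here are shorthand for illustration, not official field values):

```python
# DOB phasing assumptions from the table above; bucket labels are shorthand.
DOB_PHASING = {
    "Complete": "complete",
    "Permit issued": "next 5 years",
    "Application filed": "next 5 years",
    "In progress": "next 5 years",
    "In progress (last disapproved)": "next 5 years",
    "Inactive": "5-10 years",
    "Withdrawn": "exclude",
}

def dob_phasing(status):
    """Return the assumed phasing bucket for a DOB status."""
    return DOB_PHASING.get(status, "unknown")
```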

KPDB status normalization

| Original status | Source | Proposed status |
| --- | --- | --- |
| Active, Initiation | DCP Applications | DCP 1: Expression of interest |
| On-Hold, Initiation | DCP Applications | DCP 1: Expression of interest |
| Active, Pre-Cert | DCP Applications | DCP 2: Application in progress |
| Active, Pre-PAS | DCP Applications | DCP 2: Application in progress |
| On-Hold, Pre-Cert | DCP Applications | DCP 2: Application in progress |
| On-Hold, Pre-PAS | DCP Applications | DCP 2: Application in progress |
| Active, Certified | DCP Applications | DCP 3: Certified/Referred |
| Complete | DCP Applications | DCP 4: Zoning Implemented |
| 1. Filed Application | DOB | DOB 1. Filed Application |
| 2. Approved Application | DOB | DOB 2. Approved Application |
| 3. Permitted for Construction | DOB | DOB 3. Permitted for Construction |
| 4. Partially Completed Construction | DOB | DOB 4. Partially Completed Construction |
| 5. Completed Construction | DOB | DOB 5. Completed Construction |
| Potential | DCP Planner-Added Projects | Potential |
| Projected | EDC Projected Projects | Potential |
| Projected | Empire State Development Projected Projects | Potential |
| Rezoning Commitment | Neighborhood Study Rezoning Commitments | Potential |
| Projected Development | Neighborhood Study Projected Development Sites | Potential |
| Future Rezoning | Future Neighborhood Studies | Potential |
| RFP issued; financing not closed | HPD RFPs | HPD 1: RFP Issued |
| RFP designated; financing not closed | HPD RFPs | HPD 2: RFP Designated |
| RFP designated; financing closed | HPD RFPs | HPD 4: Financing Closed |
| Projected | HPD Projected Closings | HPD 3: Projected Closing |
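In code, this normalization is a lookup keyed on (original status, source), since the same original status can map differently depending on the source. A partial sketch covering a few rows of the table above:

```python
# Partial (original status, source) -> normalized KPDB status mapping,
# covering a few rows of the normalization table above.
STATUS_MAP = {
    ("Active, Certified", "DCP Applications"): "DCP 3: Certified/Referred",
    ("Complete", "DCP Applications"): "DCP 4: Zoning Implemented",
    ("Projected", "EDC Projected Projects"): "Potential",
    ("Projected", "HPD Projected Closings"): "HPD 3: Projected Closing",
    ("RFP issued; financing not closed", "HPD RFPs"): "HPD 1: RFP Issued",
}

def normalize_status(original, source):
    """Map a source-specific status to the normalized KPDB status."""
    return STATUS_MAP.get((original, source), original)
```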

SCA Geographies Join

Carto

The current implementation of the SCA join is a series of SQL queries run with Carto Batch. This requires both the username for the CP Carto account and an API key.

Credentials and API key

The credentials to log on to the CP division Carto account are stored in the DE secrets file.

Use those credentials to log on to the portal and retrieve the API key needed to run batch jobs.

longform_csd_output, Longform_es_zone_output, and longform_subdist_output_cp_assumptions_2021

Producing these three outputs is the next step. The four inputs required to run the SQL queries are the final KPDB table, nyc_school_districts, doe_schoolsubdistricts, and doe_school_zones_es. Make sure those datasets are present in CP's Carto account and updated to their latest versions.
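Queries can be submitted to the Carto Batch SQL API from a script rather than the web console. A sketch against the legacy Batch API endpoint; the username, API key, and query are placeholders, and the endpoint should be confirmed against the account's Carto version:

```python
import json
import urllib.request

def batch_job_request(username, api_key, query):
    """Build (url, json body) for a Carto Batch SQL API job submission."""
    url = f"https://{username}.carto.com/api/v2/sql/job?api_key={api_key}"
    return url, {"query": query}

def submit_batch_job(username, api_key, query):
    """POST the job; the response includes a job_id that can be polled."""
    url, body = batch_job_request(username, api_key, query)
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(req))
```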

Follow the documentation by Emily to produce the three CSVs, and QAQC these outputs by comparing them with previous versions in a Jupyter notebook.

2021_Known_Projects_DB_SCAGeographies_vF_20220214

The final deliverable to SCA is created from the three tables produced in the previous step. It is generated with Excel formulas and combined into a single xlsx file before sending to SCA.

The spreadsheets used to perform the calculations for the previous years are in this folder.

The "READ ME for HED" tab in Emily's KPDB OneNote contains the most detailed step-by-step instructions for the calculation that outputs the final ES zone and MS zone allocations.


Deprecated repos

- db-knownprojects
- Known-Projects-Database
