
Product: KPDB

Damon McCullough edited this page Jan 4, 2024 · 6 revisions

Known Projects Database (KPDB)

  • The Known Projects DB contains information on future housing starts. It gathers data from 9-11 data sources (depending on use case, see below), compares the information in each data source to ensure that units existing in multiple sources are not counted more than once, and then aggregates these sources. It incorporates granular information at the project-level, including project statuses, estimated project phasing, and borough planner input on projects where available.
  • The Known Projects DB presents housing information at the project-level. Therefore, it can be used for growth planning citywide as well as granular looks at neighborhood-level expected growth.
  • The Known Projects DB is currently used to create the School Construction Authority's (SCA) Housing Pipeline.

Disclaimer: This information does NOT represent a housing projection produced by DCP, nor can DCP attest to the certainty that each of these developments will lead to future housing starts.
The Known Projects DB does not represent future as-of-right growth nor future growth from projects that have not yet materialized in the data sources below.


Data Sources

Primary Data Sources

  • DCP Housing Developments Database - This database is created by DCP using DOB permit and certificate of occupancy data. It includes permits and applications for new buildings, building alterations, and demolitions.
  • HPD Projected Closings - This is confidential data on HPD's pipeline of closings on privately-owned sites. Provided by HPD.
  • HPD Request for Proposals - City-owned sites for which HPD has issued or designated RFPs for residential development. Provided by HPD.
  • EDC Projected Projects - Projected projects expected by EDC to develop residential units with reasonable certainty. Provided by EDC.
  • DCP Applications - Discretionary actions through DCP that facilitate residential development. This data is generated by DCP. Several processing steps are required to identify projects facilitating residential development.
  • Empire State Development Projected Projects - Known residential development projects by Empire State Development. These projects are collected by DCP.
  • Neighborhood study affordable housing commitments - Affordable housing commitments made by the current administration in neighborhood rezoning areas. These are collected from each adopted neighborhood study's Points of Agreement.
  • Future City-Sponsored RFPs/RFEIs - Additional future Requests for Proposals or Requests for Expressions of Interest for affordable housing, provided by City Hall. Note that many of these can be highly speculative.
  • DCP Planner-Added Projects - Additional projects identified by Department of City Planning borough planners that are not yet in the above data sources. Note that many of these can be highly speculative conversations with developers, and this list is not comprehensive.

Secondary Data Sources

  • Neighborhood study projected developments - The count of units each adopted neighborhood study from the current administration is expected to develop. These projections are not site-specific and are highly speculative. Depending on need, these projects should not necessarily be included for planning purposes.
  • Future neighborhood studies - The count of units each future neighborhood study is projected to develop. These projections are not site-specific and are highly speculative. Because these rezonings have not yet been adopted, we include a certainty discount factor, and we do not deduplicate. These projects should not necessarily be included for planning purposes.


KPDB Schema

| Field | Long form name | Definition |
| --- | --- | --- |
| project_id | Project ID | ID that groups records together into a single project |
| source | Source | The name of the dataset the project came from |
| record_id | Record ID | The ID of the project from the source dataset |
| record_name | Record name | The name of the project as provided by the source |
| borough | Borough | The borough the project is located within |
| status | Status | The current phase of the project |
| type | Type | The type of project. For DOB jobs this would be NB, Alt, etc. |
| date | Date | The date of the project |
| date_type | Date type | The type of date for the project based on the project source (Build Year, Certified Referred, Date Filed, Date Permitted, Effective Date, Month Closed, Projected Fiscal, Year Range) |
| units_gross | Units gross | The proposed number of units associated with the project as reported in the source data |
| units_net | Units net | The number of future units that should be counted as part of the project after removing completed DOB units and deduplication |
| prop_within_5_years | Proportion built within 5 years | Proportion of units that should be completed within the next 5 years |
| prop_5_to_10_years | Proportion built within 5-10 years | Proportion of units that should be completed between 5 and 10 years from now |
| prop_after_10_years | Proportion built after 10 years | Proportion of units that should be completed more than 10 years from now |
| within_5_years | Units built within 5 years | The number of units that should be completed within the next 5 years |
| from_5_to_10_years | Units built within 5-10 years | The number of units that should be completed between 5 and 10 years from now |
| after_10_years | Units built after 10 years | The number of units that should be completed more than 10 years from now |
| phasing_rationale | Phasing assumption rationale | The planner's rationale for the timeline of a project |
| phasing_known | Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
| nycha | NYCHA | Boolean. True if this is at least partially a NYCHA project |
| classb | Group Quarters | Boolean. True if this is a project with Class B units (can be partial) |
| senior_housing | Senior Housing | Boolean. True if this is at least partially a Senior Housing project |
| inactive | Inactive | Flag indicating whether the project is inactive. If we do not include DOB inactive jobs, this field is excluded |
| geom | Geometry | geometry(Geometry,4326) |
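As an illustration of how the phasing fields above relate, here is a minimal sketch in Python. The field names come from the schema; the numbers, and the assumption that each phased count is simply units_net scaled by its proportion, are hypothetical.

```python
# Hypothetical values for a single project; field names from the KPDB schema.
units_net = 120
prop_within_5_years = 0.50
prop_5_to_10_years = 0.25
prop_after_10_years = 0.25

# Assumption: each phased unit count is units_net scaled by its proportion.
within_5_years = units_net * prop_within_5_years
from_5_to_10_years = units_net * prop_5_to_10_years
after_10_years = units_net * prop_after_10_years

# The three phases should partition the net units.
assert within_5_years + from_5_to_10_years + after_10_years == units_net
```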

Description of tables in the knownprojects build environment

Source tables

The following source tables are pre-processed and loaded from DigitalOcean. Raw data for all non-ZAP sources, along with their preprocessed versions, are available in the private repo db-knownprojects-data.

  • dcp_projects: ZAP projects data
  • dcp_projectactions: ZAP project actions data
  • dcp_projectbbls: ZAP project BBLs data
  • dcp_housing: A copy of the housing database
  • dcp_knownprojects: KPDB from previous release
  • dcp_n_study: Neighborhood study rezoning commitment input data
  • dcp_n_study_future: Input data for future neighborhood studies
  • dcp_n_study_projected: Input data for Neighborhood Study Projected Development Sites
  • dcp_planneradded: Contains projects manually identified by planners. For more information, see the planner-added project instructions.
  • edc_projects: EDC projected projects source data
  • esd_projects: Empire State Development projected projects source data
  • hpd_pc: HPD projected closings source data
  • hpd_rfp: HPD Requests For Proposals source data. Has one record per BBL.

Intermediate tables

  • dcp_application: Pulls fields from the three ZAP input tables to create all fields necessary for KPDB. Also includes several flags to aid in the ZAP review process. ZAP records are excluded if they have a status of "record closed," "terminated," or "withdrawn." Includes projects from after 2010 or with an unspecified date. Projects with a clear indication of residential components are flagged for review. Other flags indicate whether a project existed in the previous version of KPDB, and whether the project was explicitly added/removed via the corrections_zap table.

  • dcp_housing_poly: Housing database with formatted dates, BBL geometries, phasing, and boolean flags. Created in dcp_housing.sql.

    • Excludes demolitions and withdrawn jobs.
    • For class A, excludes jobs with no residential unit change, alterations removing units, and jobs where the proposed residential units are 0.
    • For class B, any record with positive initial or proposed units is included. These records only provide context for DOB match review, and get excluded from the final KPDB output.
    • Polygon geometries come from a join with mappluto on BBL, primarily, and a spatial join secondarily.
  • combined: Contains a combination of input data with a shared schema. For several data sources, this involves joining on BBL with dcp_mappluto_wi to get lot-level polygon geometry. For some input datasets, rows get collapsed to the level of a project. In these cases, the unique IDs of the input data rows get stored in the field record_id_input as an array. If no collapsing was necessary (i.e., each project is a single record in a given source data table), the record_id_input array only contains the record_id. In cases where there is no unique project-level ID, the record_id gets assigned from a hash of the uids in record_id_input. This table contains standardized statuses, initial phasing assumptions, and boolean flags. Created in combine.sql.

  • _project_record_ids: This is the initial table of project inputs (groups of record_ids that refer to the same project). They are identified by finding spatial overlaps between non-DOB records, forming arrays of individual record_ids, called project_record_ids. Two of the neighborhood study sources are not included, as units from these sources do not deduplicate with other sources. The table is initially created in _project_record_ids.sql. If there are corrections in corrections_project, these get applied to move record_ids between projects after the initial identification of spatial overlaps in correct_projects.sql. This table then gets used to create review_project (see below) for review.

  • project_record_ids: This is the final table of project_record_ids. It includes everything in _project_record_ids, with DOB records added. The dcp_housing record_ids get added to the arrays in _project_record_ids, as long as they haven't been tagged for removal in corrections_dob. Additional DOB to non-DOB matches get added if they're flagged as add in the correction_dob table. This table also includes stand-alone projects (an array consisting of a single record_id). These include dcp_housing records and combined records that did not match with other records in the two stages of identifying spatial overlaps. Created in project_record_ids.sql

  • deduped_units: This table contains results from the automatic deduplication of units in dedup_units.py. Units get subtracted within a project based on the following hierarchy, where units from lower-numbered sources get subtracted from units of higher-numbered sources.

    hierarchy = {
        "DOB": 1,
        "HPD Projected Closings": 2,
        "HPD RFPs": 3,
        "EDC Projected Projects": 4,
        "DCP Application": 5,
        "Empire State Development Projected Projects": 6,
        "Neighborhood Study Rezoning Commitments": 7,
        "Neighborhood Study Projected Development Sites": 8,
        "DCP Planner-Added Projects": 9,
    }
    
  • _kpdb: Contains all records from combined except for DOB records containing only class B units. Joined onto these records are the deduplicated units_net from deduped_units. This field gets multiplied by phasing proportions to calculate phased unit counts. Created in create_kpdb.sql.
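As a minimal sketch of how the hierarchy might drive the subtraction within a single project: units already counted by more-certain (lower-numbered) sources are removed from less-certain sources. This is illustrative only; the helper below is hypothetical, and the actual logic lives in dedup_units.py.

```python
hierarchy = {
    "DOB": 1,
    "HPD Projected Closings": 2,
    "HPD RFPs": 3,
    "EDC Projected Projects": 4,
    "DCP Application": 5,
    "Empire State Development Projected Projects": 6,
    "Neighborhood Study Rezoning Commitments": 7,
    "Neighborhood Study Projected Development Sites": 8,
    "DCP Planner-Added Projects": 9,
}

def dedup_units(project_records):
    """Hypothetical helper: project_records is a list of
    (source, units_gross) pairs for one project. Returns
    (source, units_net) pairs, subtracting units already counted
    by more-certain (lower-numbered) sources."""
    counted = 0
    result = []
    for source, units in sorted(project_records, key=lambda r: hierarchy[r[0]]):
        net = max(units - counted, 0)  # never count the same units twice
        result.append((source, net))
        counted += net
    return result
```

For example, a 200-unit DCP Application overlapping a 50-unit DOB job and a 100-unit HPD Projected Closing would net out to 100, 50, and 50 units respectively under this sketch.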

Review tables

  • review_project: Contains records from combined, excluding those from the sources DOB, Neighborhood Study Rezoning Commitments, and Future Neighborhood Studies. Joined onto these records are the project_record_ids identified in the table _project_record_ids. This table has a boolean flag indicating cases where records are part of a multi-record project (identified from spatial overlap). Also includes a boolean flag indicating missing geometry, because geometries do not get exported in the CSV versions of this table. Created in review_project.sql.
  • _review_dob: Contains spatially-overlapping DOB and non-DOB records from combined (excluding those from Neighborhood Study Rezoning Commitments and Future Neighborhood Studies). In cases where the non-DOB record is part of a multi-record project, all other records associated with that project are also included. There are several boolean flags in this table to aid in review. The field dob_multimatch is 1 for DOB records that overlap with more than one project. All non-DOB records in these projects get flagged with project_has_dob_multi. Created in review_dob.sql.
  • review_dob: This is the same as _review_dob, with the array values transformed to strings for export.
  • corrections_applied: Contains all successfully-applied corrections from corrections_main.
  • corrections_not_applied: Contains corrections from corrections_main that could not get applied, either because the field wasn't valid or the old_value did not match any existing records.

Corrections tables

  • corrections_zap: Corrections table for adding and removing ZAP records from KPDB. For more information, see the ZAP review instructions.
  • corrections_main: Corrections table for modifying any value other than units_net. This table is intended for all corrections that do not rely on the relationships between records. For more information, see main review instructions.
  • corrections_project: Corrections table for modifying which records refer to the same project. The default is to assume that spatially overlapping records refer to the same project. For more information, see the project review instructions.
  • corrections_dob: Corrections table either adding or removing matches between DOB and non-DOB records. This table is necessary because a DOB record cannot be associated with more than one project. For more information, see the DOB review instructions.

Output tables

  • kpdb: Final output of KPDB, created in _export.sql. Only difference from _kpdb is which fields are included.

Corrections - Non DOB Records to Each Other

Overview:

This is the process to identify related projects from different source tables using spatial overlap. Project relations are created in _project_record_ids.sql.

Note: DOB, Neighborhood Study Rezoning Commitments, and Future Neighborhood Studies are excluded from the spatial-overlap process and will be added later.

File Review

The corrections for non-DOB records are recorded in corrections_project.csv, found in the corrections folder of the db-knownprojects-data repo. The files to review in the KPDB output are:

  • review_project.shp and review_project.csv - take a close look at the dummy_id, which refers to the cluster ID of the projects, and the record_id, which refers to the record that will need to be reassigned or combined.
  • corrections_project.csv - the file in which any necessary corrections are recorded and applied in the pipeline.

Manual Review Methodology

The purpose of the manual cluster review is to determine whether records from multiple data sources indeed refer to the same project and are complementary in the information they provide, or whether the spatial overlap between the records is merely coincidental, in which case the records should be separated into different clusters (or split out into new clusters of their own). In the second case, we say a cluster reassignment has occurred. In some cases, the overlapping geometry is due to an incorrect geometry or BBL; this requires an additional step of correcting the geometry, either by fixing the source files or by updating the record in ZAP. The cluster review is a crucial step: the later deduplication of units in the KPDB process relies on it to avoid over-counting units.

Review Strategies

Redundant information should also be removed in this process. Possible redundancies include, but are not limited to: (1) a planner-added project for which an official application has now been filed; (2) multiple DCP applications coexisting for the same project, in which case the outdated record should often be removed.

  • Remove projects that do NOT have any residential component (e.g., a parking rezoning).
  • The overlapping geometry may be a larger rezoning that envelops multiple independent projects. In this case, the projects should be separated, i.e., 'declustered'.

Instructions:

Expected corrections file format

| record_id | action | record_id_match |
| --- | --- | --- |
| 9 | reassign | 11 |
| 7 | reassign | NULL |
| 8 | combine | 12 |
| 3, 4 | reassign | NULL |
| 1, 2 | reassign | 10 |

Schema

| Field | Value |
| --- | --- |
| record_id | record_id or comma-separated list of record_ids to get reassigned to or combined with another project |
| action | combine/reassign |
| record_id_match | any record_id from the project that you want to either combine with or reassign to (can be NULL) |

Note: if record_id_match is NULL (left blank in the CSV or Excel file), this indicates that a new stand-alone project will be created.

How does it work?

Example of matched records (_project_record_ids table):

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6,7,8} |
| {9,10,11} |

Reassign: removing a record from one group of matched records to another.

For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | reassign | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,7} |
| {5,6,8} |
| {9,10,11} |

This can also be used to separate out a record into a stand-alone project. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | reassign | NULL |

results in:

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6,8} |
| {9,10,11} |
| {7} |

This action can also reassign multiple record_ids from a given project simultaneously. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7, 8 | reassign | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,7,8} |
| {5,6} |
| {9,10,11} |

If record_id_match is NULL, this action can also be used to separate out multiple records into a stand-alone project. For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7, 8 | reassign | NULL |

results in:

| project_record_ids |
| --- |
| {1,2,3,4} |
| {5,6} |
| {9,10,11} |
| {7,8} |

Combine: Merging two distinct groups of matched records, indicating that they refer to the same project.

For example

| record_id | action | record_id_match |
| --- | --- | --- |
| 7 | combine | 2 |

results in:

| project_record_ids |
| --- |
| {1,2,3,4,5,6,7,8} |
| {9,10,11} |
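The reassign and combine actions above can be sketched as list operations in Python. This is a simplified illustration of the behavior, not the pipeline's implementation (the real corrections are applied in correct_projects.sql).

```python
def reassign(projects, record_ids, match):
    """Move record_ids out of their current project(s). If match is None,
    they form a new stand-alone project; otherwise they join the
    project containing match."""
    projects = [sorted(set(p) - set(record_ids)) for p in projects]
    projects = [p for p in projects if p]  # drop emptied projects
    if match is None:
        projects.append(sorted(record_ids))
    else:
        for p in projects:
            if match in p:
                p.extend(record_ids)
                p.sort()
    return projects

def combine(projects, record_id, match):
    """Merge the project containing record_id with the project
    containing match: they refer to the same project."""
    a = next(p for p in projects if record_id in p)
    b = next(p for p in projects if match in p)
    merged = sorted(set(a) | set(b))
    return [p for p in projects if p is not a and p is not b] + [merged]

# The matched-record groups from the examples above.
projects = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
```

With these groups, reassign(projects, [7], 2) yields [[1, 2, 3, 4, 7], [5, 6, 8], [9, 10, 11]], matching the first worked example; the order of the groups themselves carries no meaning.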

Corrections - Project to DOB Matching

Overview:

There are two kinds of reviews that we are asking you to do in this phase. The review_dob table contains every match we found between a DOB record and a non-DOB record (or a project containing multiple non-DOB records). These matches are based on where a DOB record's BBL geometry overlaps with the non-DOB record geometry, as long as the dates are close enough.

Note: Neighborhood Study Rezoning Commitments and Future Neighborhood Studies are excluded from this process and will be added later

Sometimes, a DOB record will spatially overlap with only one project. In this case, we can assume that this match is accurate and move forward with the unit adjustments. In these cases, we set project_has_dob_multi to 0. Because we're still developing our workflow, though, we don't want to make too many assumptions. That is why we are including these matches for you to verify.

In other cases, a DOB record will spatially overlap with more than one non-DOB record or verified project. In these cases, we cannot make any assumptions about how to automatically count the units, since we don't know which match is the accurate one (remember that a DOB record cannot accurately match with more than one project). Because there is no automated way for us to move forward, all projects with project_has_dob_multi = 1 need review. In these cases, we need you to do more than verify that the match we found is correct. We need you to tell us which of the matches is correct, and which are incorrect.

The difference between project_has_dob_multi and dob_multimatch

If a DOB record matched with more than one project,

  • project_has_dob_multi will be 1 for all records in that project. This is to make it easy for you to identify projects with conflicts. For example, the following query will identify every project that matched with a DOB record, but that DOB record also matched with a different project. This would give you the context you need to tell us which match is the correct one.
SELECT * FROM review_dob
WHERE project_has_dob_multi = 1
  • dob_multimatch will be 1 for just the DOB record in that project. Because the DOB records are the only ones you are changing, this makes it easy to identify where we need you to make edits. The following query would give you every DOB match where the DOB record matched with more than one project. You should be editing every record returned by this query.

It does not include the other records from the project, however, so refer to the previous query to get context information.

SELECT * FROM review_dob
WHERE dob_multimatch = 1

Corrections table format and instructions

You will resolve all cases where dob_multimatch = 1 through the corrections_dob table. In this table, you can either add or remove matches between a DOB record and a project consisting of one or more non-DOB records. As in the corrections_project table, you only need to specify one record_id from a group of record_ids that make up a project in order to either add or remove a match.

Expected corrections file format

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | add | 54321 |
| 1 | remove | 12345 |

Types of corrections

In the following example, the DOB record with ID 12345 matched with two different projects. One project consists of non-DOB records with IDs 1, 2, and 3. The second project consists of non-DOB records with IDs 4 and 5.

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 1 | {1,2,3} | 12345 |
| 2 | {1,2,3} | 12345 |
| 3 | {1,2,3} | 12345 |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |

Remove a match: A DOB record matched with a verified group of non-DOB records, but they are not referring to the same project.

If the DOB record with ID 12345 should be part of the same project as the non-DOB records with IDs 4 and 5 (and not the ones with IDs 1, 2, and 3), you would specify that by removing the incorrect match, with a correction that looks like:

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | remove | 12345 |

This results in:

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |

Add a match: A DOB record did not match with a verified group of non-DOB records, but it is actually referring to the same project.

If a DOB record does not overlap with a non-DOB record but they refer to the same project, you can also add a match. For example, if the DOB record with ID 54321 does not overlap with records 1, 2, and 3, but still refers to the same project, you would add a correction like:

| record_id | action | record_id_dob |
| --- | --- | --- |
| 1 | add | 54321 |

This results in:

| record_id | project_record_ids | record_id_dob |
| --- | --- | --- |
| 1 | {1,2,3} | 12345 |
| 2 | {1,2,3} | 12345 |
| 3 | {1,2,3} | 12345 |
| 1 | {1,2,3} | 54321 |
| 2 | {1,2,3} | 54321 |
| 3 | {1,2,3} | 54321 |
| 4 | {4,5} | 12345 |
| 5 | {4,5} | 12345 |
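A sketch of how these add/remove corrections might be applied, assuming matches are stored as (record_id, record_id_dob) pairs. The helper below is hypothetical and illustrative only; naming one record_id from a project adds or removes the match for the whole project, as described above.

```python
def apply_dob_corrections(matches, projects, corrections):
    """matches: set of (non-DOB record_id, DOB record_id) pairs.
    projects: groups of non-DOB record_ids. Naming any one record_id
    from a project adds/removes the match for the whole project."""
    matches = set(matches)
    for c in corrections:
        project = next(p for p in projects if c["record_id"] in p)
        pairs = {(r, c["record_id_dob"]) for r in project}
        if c["action"] == "remove":
            matches -= pairs
        elif c["action"] == "add":
            matches |= pairs
    return matches

# The example above: DOB record 12345 matched with both projects.
projects = [[1, 2, 3], [4, 5]]
matches = {(r, 12345) for p in projects for r in p}
corrections = [
    {"record_id": 1, "action": "remove", "record_id_dob": 12345},
    {"record_id": 1, "action": "add", "record_id_dob": 54321},
]
result = apply_dob_corrections(matches, projects, corrections)
```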

Corrections - Main Process

Table of Contents

About

Sometimes source data will be incorrect or incomplete. Before publishing KPDB, research teams conduct an extensive manual research effort to review, verify, and correct data.

Records are added to the corrections table to edit an attribute of an existing record. The corrections_main table is also used to exclude records. By capturing the old attribute value, it is easy to keep track of the original, pre-edited data. Additionally, a value is only overwritten if the old value matches the current value for the specified field. Therefore, if the source data is updated between versions, erroneous corrections won't be applied.

Schema - corrections_main.csv

| Field | Description |
| --- | --- |
| record_id | Unique ID of the project |
| field | Field where the value needs to be updated |
| old value | Value that needs to be changed |
| new value | New value |
| editor | Name of person who made the edit |
| date | Date the edit was made |
| record_name | Name of the record |
| Notes | Notes from the editor |

CREATE TABLE corrections_main(
    record_id text,
    field text,
    old_value text,
    new_value text,
    editor text,
    date text,
    record_name text,
    notes text
);

It is important that the editor input the record_id, field, and old value values exactly as they appear in KPDB, including any capitalization and special characters. KPDB field names and descriptions can be found here. Additionally, record_id + field must be unique within the Correction table. If there are instances where record_id + field are not unique, we will not ingest the data.
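The old-value guard and the applied/not-applied split described above can be sketched as follows. This is a hypothetical helper, not the pipeline's actual code.

```python
def apply_corrections(records, corrections):
    """records: dict of record_id -> dict of field values (as text).
    A correction is applied only when the field exists and the stored
    value still equals old_value; otherwise it is set aside, mirroring
    corrections_applied / corrections_not_applied."""
    applied, not_applied = [], []
    for c in corrections:
        rec = records.get(c["record_id"], {})
        if c["field"] in rec and rec[c["field"]] == c["old_value"]:
            rec[c["field"]] = c["new_value"]
            applied.append(c)
        else:
            not_applied.append(c)
    return applied, not_applied

records = {"P2015R0339": {"units_gross": "200"}}
corrections = [
    {"record_id": "P2015R0339", "field": "units_gross",
     "old_value": "200", "new_value": "180"},  # old value matches: applied
    {"record_id": "P2015R0339", "field": "units_gross",
     "old_value": "999", "new_value": "0"},    # stale old value: set aside
]
applied, not_applied = apply_corrections(records, corrections)
```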

Types of Corrections

Below describes what types of corrections can be made via the corrections_main table, and gives an example of how to make each correction. If you'd like to make a correction that is not described, please reach out to Data Engineering.

Update the value of an existing record

A value is incorrect or missing and you know the correct information, so you'd like it to be reflected in the final output.

| record_id | field | old value | new value | editor | date | record_name | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| P2015R0339 | units_gross | 200 | 180 | baiyue | 4/20/2020 | | some comments |
| P2015R0339 | units_gross | | 180 | baiyue | 4/20/2020 | | filling in missing unit count |

Remove a record

You do not want the record to be included in KPDB or any clustering logic.

Note: as of 2021, all ZAP additions/removals should be recorded in corrections_zap instead of corrections_main

| record_id | field | old value | new value | editor | date | record_name | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 12123424 | remove | | | baiyue | 4/20/2020 | Stanley Avenue and Glen Avenue | some comments |

Dependent variables

Sometimes the value in one field is dependent on another value. It's important to keep these dependencies in mind as you're updating values. For related fields that aren't automatically adjusted, you will want to also provide corrections to these fields so that they're in-sync.

| Corrected field | Fields affected downstream | Related fields that aren't automatically adjusted |
| --- | --- | --- |
| Date | Phasing proportions | |
| Status | Phasing proportions | |
| Phasing proportions | Counts of units in each phase | |
| Geometry | Assignment of record to a project (see note), borough | |
| Gross units | Net units, counts of units in each phase | |

NOTE: If a geometry correction results in a new overlap between records, the default assumption is that these records are part of the same project. If you correct a geometry in a way that causes an overlap, but the record refers to a different project than the overlapping records, you will need to account for this exception in the corrections_project table. For example, let's say you modify geometry in a way that causes an overlap, but the record getting the new geometry is actually a stand-alone project. You would add a row to corrections_project where record_id is the record with the new geometry, action is reassign, and record_id_match is NULL.

Correcting geometries

If the spatial data for a record is incorrect or missing, you can correct or create the geometry by supplying a shapefile. The shapefile should have two components: record_id and corrected geometry.

There are several tools that you can use to create this shapefile, including: geojson.io, Carto, and ArcMap.

The geometry from this file will overwrite the geometry of the record with a matching record_id in the original file, regardless of whether the geometry has changed in the source data between versions. If, after editing the geometry of a record, the record should NOT be assigned to the same project as overlapping records, do not forget to specify this exception in corrections_project.

Incorporating the research

The table corrections_main is just one of the five research tables; the others are described in the Corrections tables section above.

After the research is complete and the five tables are returned to Data Engineering, we will rerun the KPDB build-process with the new corrections data. This will:

  1. Include any new planner-added projects or ZAP projects newly flagged as add in corrections_zap
  2. Remove any records flagged as "remove" in either corrections_main or corrections_zap
  3. Apply the new values and geometries from corrections_main and shapefile
  4. Identify projects by finding overlapping non-DOB records, then making adjustments to the lists of overlapping records using corrections_project. In cases where the geometries got corrected in a way that creates overlaps, we reassign records to the same project as overlapping records unless otherwise specified in corrections_project
  5. Identify overlaps between DOB records and non-DOB records, then use corrections_dob to assign these DOB records to projects. DOB records will get assigned to a project if a match is not flagged as remove in corrections_dob, or if a new match is specified as an add in corrections_dob.
  6. Calculate borough field based on corrected geometries.
  7. Run the units_net calculation based on corrected units_gross and corrected project assignments.
  8. Overwrite units_net calculation using any records in corrections_main where field = 'units_net'
  9. Calculate the phased units by multiplying the corrected phasing proportions with the corrected units net.

zap_record_ids List

The new workflow deprecates the existing method of adding and removing records with the corrections_zap table, instead giving the research team more control over the ZAP projects included in KPDB. The zap_record_ids list is the exhaustive list of included records: any record not on the list gets dropped after the filtering logic is applied. Under the current logic, any record dropped by the filtering logic gets added back if it is part of this list.

How to add to the list?

The Housing team should receive a list of new ZAP projects added since the last KPDB update. This is currently done manually but could be automated. The filtering is based on the createdon date and should normally look back to the last build, typically one year. For example, if the 2021 version of KPDB was built in November 2021 and a new build is happening in November 2022, then the new records created between November 2021 and November 2022 should be provided to Housing.
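The look-back window described above can be sketched as a simple date filter. The createdon field name comes from the text; the rest of the helper is a hypothetical illustration.

```python
from datetime import date

def new_zap_records(records, last_build, this_build):
    """Return ZAP records created after the last KPDB build
    and up to the current build date."""
    return [r for r in records if last_build < r["createdon"] <= this_build]

records = [
    {"record_id": "A", "createdon": date(2021, 6, 1)},   # before last build
    {"record_id": "B", "createdon": date(2022, 3, 15)},  # new since last build
]
new = new_zap_records(records, date(2021, 11, 1), date(2022, 11, 1))
```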


Planner added projects table schema & instructions

Planner-Added Projects

There are instances when a Planner knows about a future housing project that is not captured in any of the input sources. When this happens, a record is created within the Planner-Added Projects table. The layout of the Planner-Added Projects table is outlined below. It is important that the record_id is unique throughout the table. If there is a corresponding geometry in the geometry corrections shapefile, make sure that the geometry record and the Planner-Added Projects record have the same record_id so we can link them. Additionally, it is important that every relevant component is filled out for a record.

Only new records should be added to and edited within the Planner-Added Projects table. Planner-Added Project records that are already within KPDB should be considered as "locked in." Edits to these existing records should be done in the corrections_main table. Only new records in the Planner-Added Projects table will be appended to the final KPDB.

| Field | Definition |
| --- | --- |
| Record ID | The unique ID of the record |
| Name | The name of the project as provided by the source |
| Borough | The borough the project is located within |
| Status | The current phase of the project |
| Units gross | The proposed number of units associated with the project as reported in the source data |
| Portion built within 5 years | The portion of the project that should be completed within the next 5 years |
| Portion built within 5-10 years | The portion of the project that should be completed between 5 and 10 years from now |
| Portion built after 10 years | The portion of the project that should be completed more than 10 years from now |
| Phasing assumption rationale | The planner's rationale for the timeline of a project |
| Phasing: Assumed or Known | Boolean. True if the phasing assumptions are known |
| NYCHA | Boolean. True if this is a NYCHA project |
| Group Quarters | Boolean. True if this is a Group Quarters project |
| Senior Housing | Boolean. True if this is a Senior Housing project |
| Assisted Living | Boolean. True if this is an Assisted Living project |
| Inactive | Flag indicating whether the project is inactive. If we do not include DOB inactive jobs, this field is excluded |
| Project ID | ID that groups records together into a single project |
| Omit | Boolean. True if this project should not be included in KPDB |
| Omission reason | Reason for omitting the record |
| Lead planner name | Name of the lead planner |
| Editor | Name of person who added the record into the Planner-Added Projects table |
| Planner notes | Additional notes from the planner |
| Date | Date the project was added |

Manual Corrections Ingestion

Create an issue

First, navigate to the KPDB data repo main page.

Then click on the Issues tab and click "New issue".

Do not use an issue template; instead, use the "Open a blank issue" link.

Give your issue a title and a short description, then click "Submit new issue".

You should see the new issue created immediately. The key piece of information here is the issue number (e.g. #206), which is the number your pull request should reference.

Create a Pull Request

Once your issue is created and you have its number, go back to the KPDB data repo home page. The home page should automatically put you on the main branch, indicated just below the Issues tab; this happens automatically, but it is good to confirm. Then navigate to the corrections folder.

Inside the corrections folder, open the "Add file" dropdown and choose the "Upload files" option.

Choose the new corrections spreadsheet to upload, making sure its name exactly matches the file it will replace. Add a commit message, then select the option to "Create a new branch for this commit and start a pull request." We highly recommend including the issue number (in this case, 206) in the branch name; this helps connect the issue to the pull request that addresses it. When everything looks good, click "Propose changes", which takes you to the final pull request creation page.

Before finalizing the pull request, use the Reviewers panel on the right to tag the appropriate EDM members to review the changes, which are visible as you scroll toward the bottom of the page. Once reviewers are tagged, click "Create pull request".

You now have a pull request for the changes you want to make, and the tagged EDM members will be notified by GitHub with a review request. Note: at least early on in the process, do not click "Merge pull request" until at least one member of the Data Engineering team has approved it.


Build KPDB and Get Output on Github

You can build KPDB without any software (or data) in your local environment by using GitHub Actions.

The "Build" workflow

From the GitHub Actions page for KPDB, navigate to the Build workflow using the panel on the left, which brings you to the view below.

Use the dropdown menu to choose the branch to run the production build from. The workflow then asks whether to use the latest datasets (default "no"), which include any source data updates and manual corrections merged into the main branch of the db-knownproject-data repo, and whether to send the output to db-knownproject-data (default "yes"; accessing this output is covered in the next section). Click "Run workflow" to kick off a production run.
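The same run can also be kicked off from a script via the GitHub REST API's workflow_dispatch endpoint, rather than the UI. A sketch; the repo path, workflow file name, and input names ("latest", "upload") are assumptions, so check the Actions page and workflow file for the real values:

```python
import json
import urllib.request

def dispatch_request(branch, latest="no", upload="yes",
                     repo="NYCPlanning/db-knownprojects",  # assumed repo path
                     workflow="build.yml"):                # assumed workflow file
    """Build (url, payload) for a workflow_dispatch POST."""
    url = (f"https://api.github.com/repos/{repo}"
           f"/actions/workflows/{workflow}/dispatches")
    payload = {"ref": branch, "inputs": {"latest": latest, "upload": upload}}
    return url, payload

def dispatch(token, branch, **kwargs):
    """Send the dispatch request; returns the HTTP status (204 on success)."""
    url, payload = dispatch_request(branch, **kwargs)
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 method="POST", headers={
                                     "Authorization": f"Bearer {token}",
                                     "Accept": "application/vnd.github+json",
                                 })
    return urllib.request.urlopen(req).status
```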

When a run is kicked off, it shows up in the list of runs like below. Click on a specific run to check its status and details.

Output in db-knownproject-data

Remember that db-knownproject-data is a private repo, so you may need to request access to it.

If the option to send output to db-knownproject-data was set to "yes" in the last step, a new batch of output is pushed to db-knownproject-data, timestamped with the date and hour the production run finished. Click on the output branch name you want to access.


Phasing Assumptions

| Source | Project type | Phasing provided/assumed | Proposed phasing 2020 |
| --- | --- | --- | --- |
| DOB | Complete | Assumed | All counted as complete |
| DOB | Permit issued | Assumed | Next 5 years |
| DOB | Application filed | Assumed | Next 5 years |
| DOB | In progress/In progress (last disapproved) | Assumed | Next 5 years |
| DOB | Inactive (not withdrawn) | Assumed | 5-10 years |
| DOB | Withdrawn | Assumed | Exclude |
| HPD Projected Closings | n/a | Provided | 2.5 years after financing close (assumption that financing closes around the same time a permit is issued) |
| HPD RFPs | n/a | Provided | Next 5 years |
| EDC | n/a | Provided | EDC-provided build year |
| DCP Application | All non-ULURP | Assumption | 5-10 years |
| DCP Application | Status = on-hold | Assumption | 5-10 years |
| DCP Application | Status = active, Pre-PAS | Assumption | 5-10 years |
| DCP Application | Status = active, certified in or before 2019 | Assumption | 5-10 years |
| DCP Application | Status = active, certifying after 2019 | Assumption | 5-10 years |
| DCP Application | Status = active, Public review | Assumption | 5-10 years |
| DCP Application | Status = complete | Assumption | 5-10 years |
| DCP Application | Status = Record Closed | Assumption | Exclude |
| Empire State Development Projects | n/a | n/a | One project: Atlantic Yards, provided by planner input |
| Neighborhood Study Rezoning Commitments | Downtown Far Rockaway/Inwood | Provided | Estimates provided by EDC |
| Neighborhood Study Rezoning Commitments | Jerome and Bay Street Corridor | Provided | Estimates provided by planner input |
| Neighborhood Study Rezoning Commitments | ENY and East Harlem | Assumption | Next 5 years (based on rezoning effective date) |
| Neighborhood Study Projected Development Sites | n/a | Assumption | Remaining units evenly distributed across each year of the estimated buildout timeline |
| Future Neighborhood Studies | n/a | Assumption | Units evenly distributed across each year of the estimated buildout timeline |
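The DOB rows of the phasing assumptions above amount to a simple status-to-bucket lookup. A sketch (the bucket labels here are shorthand for illustration, not official field values):

```python
# DOB phasing assumptions from the table above; bucket labels are shorthand.
DOB_PHASING = {
    "Complete": "complete",
    "Permit issued": "next 5 years",
    "Application filed": "next 5 years",
    "In progress": "next 5 years",
    "In progress (last disapproved)": "next 5 years",
    "Inactive": "5-10 years",
    "Withdrawn": "exclude",
}

def dob_phasing(status):
    """Return the assumed phasing bucket for a DOB status."""
    return DOB_PHASING.get(status, "unknown")
```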

KPDB status normalization

| Original status | Source | Proposed status |
| --- | --- | --- |
| Active, Initiation | DCP Applications | DCP 1: Expression of interest |
| On-Hold, Initiation | DCP Applications | DCP 1: Expression of interest |
| Active, Pre-Cert | DCP Applications | DCP 2: Application in progress |
| Active, Pre-PAS | DCP Applications | DCP 2: Application in progress |
| On-Hold, Pre-Cert | DCP Applications | DCP 2: Application in progress |
| On-Hold, Pre-PAS | DCP Applications | DCP 2: Application in progress |
| Active, Certified | DCP Applications | DCP 3: Certified/Referred |
| Complete | DCP Applications | DCP 4: Zoning Implemented |
| 1. Filed Application | DOB | DOB 1. Filed Application |
| 2. Approved Application | DOB | DOB 2. Approved Application |
| 3. Permitted for Construction | DOB | DOB 3. Permitted for Construction |
| 4. Partially Completed Construction | DOB | DOB 4. Partially Completed Construction |
| 5. Completed Construction | DOB | DOB 5. Completed Construction |
| Potential | DCP Planner-Added Projects | Potential |
| Projected | EDC Projected Projects | Potential |
| Projected | Empire State Development Projected Projects | Potential |
| Rezoning Commitment | Neighborhood Study Rezoning Commitments | Potential |
| Projected Development | Neighborhood Study Projected Development Sites | Potential |
| Future Rezoning | Future Neighborhood Studies | Potential |
| RFP issued; financing not closed | HPD RFPs | HPD 1: RFP Issued |
| RFP designated; financing not closed | HPD RFPs | HPD 2: RFP Designated |
| RFP designated; financing closed | HPD RFPs | HPD 4: Financing Closed |
| Projected | HPD Projected Closings | HPD 3: Projected Closing |
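In code, this normalization is a lookup keyed on (original status, source), since the same original status can map differently depending on the source. A partial sketch covering a few rows of the table above:

```python
# Partial (original status, source) -> normalized KPDB status mapping,
# covering a few rows of the normalization table above.
STATUS_MAP = {
    ("Active, Certified", "DCP Applications"): "DCP 3: Certified/Referred",
    ("Complete", "DCP Applications"): "DCP 4: Zoning Implemented",
    ("Projected", "EDC Projected Projects"): "Potential",
    ("Projected", "HPD Projected Closings"): "HPD 3: Projected Closing",
    ("RFP issued; financing not closed", "HPD RFPs"): "HPD 1: RFP Issued",
}

def normalize_status(original, source):
    """Map a source-specific status to the normalized KPDB status."""
    return STATUS_MAP.get((original, source), original)
```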

SCA Geographies Join

Carto

The current implementation of the SCA join is a series of SQL queries run with Carto Batch. This requires both the username for the CP Carto account and an API key.

Credentials and API key

The credentials to log on to the CP division Carto account are stored in the DE secrets file.

Use those credentials to log on to the portal and retrieve the API key needed to run batch jobs.

longform_csd_output, Longform_es_zone_output, and longform_subdist_output_cp_assumptions_2021

Producing these three outputs is the next step. The four inputs required to run the SQL queries are the final KPDB table, nyc_school_districts, doe_schoolsubdistricts, and doe_school_zones_es. Make sure those datasets are present in CP's Carto account and updated to their latest versions.
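Queries can be submitted to the Carto Batch SQL API from a script rather than the web console. A sketch against the legacy Batch API endpoint; the username, API key, and query are placeholders, and the endpoint should be confirmed against the account's Carto version:

```python
import json
import urllib.request

def batch_job_request(username, api_key, query):
    """Build (url, json body) for a Carto Batch SQL API job submission."""
    url = f"https://{username}.carto.com/api/v2/sql/job?api_key={api_key}"
    return url, {"query": query}

def submit_batch_job(username, api_key, query):
    """POST the job; the response includes a job_id that can be polled."""
    url, body = batch_job_request(username, api_key, query)
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(req))
```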

Follow the documentation by Emily to produce the three CSVs, and QAQC these outputs by comparing them with previous versions in a Jupyter notebook.

2021_Known_Projects_DB_SCAGeographies_vF_20220214

The final deliverable to SCA is created from the three tables produced in the previous step. It is generated with Excel formulas and combined into a single xlsx file before sending to SCA.

The spreadsheets used to perform the calculations for the previous years are in this folder.

The "READ ME for HED" tab in Emily's KPDB OneNote contains the most detailed step-by-step instructions for the calculation that outputs the final ES zone and MS zone allocations.


Deprecated repos

- db-knownprojects
- Known-Projects-Database
