Skip to content
This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

Housing New York workflow

Te Du edited this page Dec 28, 2022 · 15 revisions

Overview

Goal

Match records from hpd_hny_units_by_building with the DOB records in developments, in order to merge on columns related to the number of affordable units.

Challenge

  • The relationship between hpd_hny_units_by_project records and developments records are not one-to-one
  • There are multiple ways to match records
    • BIN & BBL
    • BBL
    • Spatially

General outline of steps

  1. Geocode hny data
  2. Find matches using the three different methods, excluding confidential hny projects and demolitions. All match methods also have the constraint that the total units in the hny record cannot be more than 5 units different than the units proposed for the developments record.
  3. Identify cases where the matches are not one-to-one
  4. Resolve ambiguous matches
    • Using logic:
      • Matches with residential new buildings take precedence over matches with non-residential or A1
      • There is a hierarchy of match type: BIN & BBL, then BBL, then spatial
      • An associative join is performed where in a cluster of HNY and DevDB records every single HNY records will be matched with the other records in the cluster
      • Identify records both HNY and DevDB that are also matched with multiple counterpart records and create flags.

Two output tables for HNY and DevDB join

  1. DevDB_hny_lookup
    • Even after applying this hierarchy, not all matches are one-to-one, and so we need to resolve this.
      • In cases where one development record matches with many hny records, affordable units get aggregated
      • In cases where one hny record matches with many developments records, the units get assigned to the DOB job that has the lowest job number
      • Resolving many-to-one and one-to-many matches using the above two techniques also resolves many-to-many matches
  2. HNY_devdb_lookup
    • this is the table where HNY attributes can be then linked back to DevDB records via job number. It relies on the matches and relationship tables created in step 4 but resolve the relationship with somewhat different logics. Please note the descriptor below is always DevDB first and HNY second e.g. one-to-many means one DevDB records is identified as associated with many HNY records.
      • one-to-one: no need for resolution
      • one-to-many: the HNY ids will be joined together as an array and their units would be summed up and earliest dates will be set to the whole group e.g. project_start_date.
      • many-to-one: the DevDB record with the least job number will be picked to be associated with HNY record
      • many-to-many: first combine the HNY ids by grouping by DevDB record to create the HNY ids array. Then since the same array of HNY ids would be associated with multiple different DevDB record, pick the DevDB with the least job number to be associated with the entire group of HNY ids.

What is the roll of manual research?

  1. Add matches that didn't exist
  2. Remove matches

Corrections process:

We output matches we find automatically, prior to the automatic treatment of one-to-many and many-to-one cases.

job_number hny_id hny_project_id all_counted_units total_units ...
1234567 abcdef a 5 10 ...

We also automatically populate a table with all HNY records that did not match to a DOB record.

Corrections would involve filling in the action column, and adding rows if necessary.

job_number hny_id hny_project_id action
1234567 abcdef a remove
1234567 ghijklm g add
7654321 ghijklm g remove
1234567 nopqrst n add

We apply these changes to the matches file, then use the logic above to

  1. Sum up all hny records that match with a single developments record
  2. Find the developments record with the smallest job_number to assign to a hny record