Skip to content
This repository has been archived by the owner on Oct 10, 2023. It is now read-only.

Overview of Build Process

fvankrieken edited this page Mar 13, 2023 · 4 revisions

Building CPDB

Creating table views

Create the table views for the three different reporting levels: Commitment, Project, Budget

Creating the DCP attributes table

This is where DCP adds value to the information derived from the Capital Commitment Plan

  1. Build the table

  2. Categorize each project into one of three categories based on keywords found in the project description

    • ITT, Vehicles, and Equipment: projects that fund assets not tied to a fixed location, such as ITT, vehicles, or other equipment assets
    • Lump Sum: project lines that may be drawn down upon to create / fund new discrete projects
    • Fixed Asset: projects associated with one or many discrete locations
    • Unknown: if a project cannot be classified into one of the three categories above based on keywords in the project's description then the type category is left blank

Projects are categorized in the order above, meaning that if a keyword is found in a project's description that classifies that project as 'ITT, Vehicles, and Equipment' it cannot later be classified as 'Lump Sum' or 'Fixed Asset,' even if the project's description has other keywords that could classify it into 'Lump Sum' or 'Fixed Asset.' Example project descriptions and their classifications:

Description Type category
Urban Resource Institute: DV Shelter Vans ITT, Vehicles, and Equipment
DATA ANALYTICS SYSTEM ITT, Vehicles, and Equipment
SURVEYS FOR SEWERS IN BRONX Lump Sum
SCA26 Lump Sum
MTC - Renovation of Friedman Theater Fixed Asset
Storm Swr ext 119 ave b/t 192 & 195 st, QNS. Fixed Asset

The primary purpose of these classifications is to help prioritize any manual mapping efforts. Reports should not be made off of these classifications. The list of keywords used to classify projects could be refined.

  1. Append geometries in the following order
  • Geometries that were created manually

    • By bbl (attributes/sprints.sql)
    • By geojson files (attributes/json2sql.py)
    • By bin (attributes/geom_from_id_bin_map.sql)
  • Spatial data from agencies These may overwrite existing geometries

    • DOT
    • DPR
    • EDC
    • DDC
    • Summer 2017 agency verified data
  • Fuzzy string matching

    • We used DCP's Facilities Database and DPR's Parks Properties as base datasets to match project descriptions with either the facility or park name, and then assign the geometry from the reference dataset to the Capital project. None of these geometries can overwrite any existing geometries.
  • Clean geometries

    • Transform lines to polygons and make all geoms multi
  • Remove faulty geometries

    • DPC identified a geometry as incorrect and added it to this list: 'capitalprojects_build/cpdb_geomsremove.csv'
    • An agency indicated that the project cannot be mapped now or ever

Create administrative boundaries relational table

A series of spatial joins are run generating the 'cpdb_adminbounds' table.

Export data

Exports two shapefiles and five csvs

Loading data onto Carto

When you're ready to update the data driving the Capital Projects Explorer load the following tables into the Carto production server (download via DigitalOcean):

  1. cpdb_projects
  2. cpdb_commitments
  3. cpdb_budgets
  4. cpdb_dcpattributes_pts (the .shp.zip extension file)
  5. cpdb_dcpattributes_poly (the .shp.zip extension file)
  6. cpdb_projects_combined
  7. cpdb_adminbounds

Once the latest cpdb output is uploaded to Carto, rename the files with the suffix corresponding to the release cycle (i.e. _22executive). Next copy the script from db-cpdb/sql/cpdb_projects_combined.sql and run the script directly in carto (best practice is to run this via cpdb_projects - making sure to update the table names with the appropriate suffix for this release). Save the output of the script as a new table named cpdb_projects_combined (overwriting the existing table) - to save the script press the 3 dots on the upper left hand corner and press create dataset from query. Additional steps will be needed to grant appropriate accounts with the proper read/write access to these files (speak with @amandadoyle or @mbh329 for clarification).

Inform digital services of updated data so that they can update the data within the Capital Projects Explorer.

Spending data

Each night at midnight the latest Capital spending data is pulled from Checkbook NYC

Analysis

5 analyses that used by the business owners for reporting as well as QA/QC.

The analyses are outputted into 'analysis/output/'

Share the following tables with CPDB's business owners:

  • cpdb_summarystats_magency.csv
    • The number of projects and total planned commitments that are mapped in each type category by managing agency
  • cpdb_summarystats_sagency.csv
    • The number of projects and total planned commitments that are mapped in each type category by sponsor agency
  • projects_by_communitydistrict_commitments.csv
    • Projects and each of the community districts that the project intersects, along with the apportioned commitments associated with the part of the project within that community district.
  • projects_by_communitydistrict_spending.csv
    • Projects and each of the community districts that the project intersects, along with the apportioned commitment and spending associated with the part of the project within that community district.
  • projects_by_dprcdboundaries_spending_2014.csv
    • Projects and each of the community districts that the project intersects, along with the apportioned commitment and spending associated with the part of the project within that community district. Only includes check written beginning in 2014.
  • agency_validated_geoms_summary_table.csv
    • Summary stats of the Summer 2017 exercise, which required agencies to validate existing spatial data and make new spatial data where possible.

Making spatial data

If it's determined that more spatial data should be generated manually for 'mappable' projects within CPDB geometries can be generated by:

  1. geojson.io See Amanda Doyle to determine the best way to incorporate data from this tool into CPDB

  2. simple-geom-editor Right now, this tool can only be used to create point or polygon geometries for single site projects

    • Copy the FMS ID (maprojid) for the project you want to map
    • Paste the FMS ID (maprojid) into the small box at the top of the page
    • Use the tools to draw the desired geometry
    • A text sting will appear in the large box
    • Copy the text string (which is an echo command) and paste the command into the terminal to generate a new .json file within the folder 'capitalprojects_build/attributes/geometries'
  3. id_bin_map If the project is taking place within a single building and you know the bin of the building, then add the FMS ID (maprojid) and the associated bin to capitalprojects_build/attributes/id_bin_map.csv