This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

2023 data update for schools data #153

Merged: damonmcc merged 31 commits into main from dm-2023-update on May 1, 2023

Conversation

damonmcc

@damonmcc damonmcc commented Apr 20, 2023

resolves #152

changes

  • modify build scripts to use the latest data
  • for build scripts that had issues, pass data via csv files rather than piping it through stdin
  • simplify GitHub Actions: delete an unused action and temporarily disable the OSE review portions of the build action
  • modify all relevant bash scripts to raise bash and sql errors so that a build stops as soon as a step fails
  • rename flags in bash scripts to be more descriptive
  • remove the unused environment variable CEQR_DATA from example.env
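For reviewers less familiar with the error-handling change, here is a minimal sketch of what "raise bash and sql errors" usually means in practice. It is not copied from the repo's runner.sh scripts: the `run_step` helper, the `DATABASE_URL` variable, and the `build.sql` file name are all hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the repository's actual build script.
# Exit on command failure, on unset variables, and on a failed pipeline stage.
set -euo pipefail

# run_step (hypothetical helper): run a command string in a child shell with
# strict mode enabled and print whether it succeeded.
run_step() {
    bash -c "set -euo pipefail; $1" >/dev/null 2>&1 && echo "ok" || echo "failed"
}

# With pipefail, a failing producer makes the whole pipeline fail.
strict_result=$(run_step 'false | cat')

# Without pipefail, only the last command's status counts, so the failure is hidden.
lax_result=$(bash -c 'false | cat' >/dev/null 2>&1 && echo "ok" || echo "failed")

echo "with pipefail: $strict_result"
echo "without pipefail: $lax_result"

# For SQL run from a file, psql keeps going after a failed statement unless
# ON_ERROR_STOP is set. DATABASE_URL and build.sql are assumed names:
#   psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f build.sql
```

The pipeline case matters here because several scripts piped data into later commands; without `pipefail`, a failed download or export upstream of the pipe would not stop the build.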

outputs

  • DigitalOcean outputs are in edm-publishing/ceqr-app-data-staging
  • All outputs also exist in our persistent database in their own schemas (e.g. edm-data.sca_capacity_projects)

notes

  • relies on the new data library templates and postgres archiving code added in db-data-library#388 ("modify postgres archiving and add ceqr school template")
    • new source data was given to us as excel files in a zip file
    • most runner.sh build scripts seem to expect input data to be versioned in schemas in the edm-data.recipe database
    • the best way to go from excel to the recipes DB seemed to be via data library with modifications (example in test_archive.py)
  • source data which was updated:
    • sca_e_projections_by_sd
    • sca_e_projections_by_boro
    • sca_capacity_projects (with subd)
  • OK to exclude OSE data review from this update
  • OK to push to EDM publishing. There are two folders, ceqr-app-data and ceqr-app-data-staging; the former is only updated via the Publish cli function from config.sh

@damonmcc damonmcc added the data update Related to a data product update label Apr 20, 2023
@damonmcc damonmcc self-assigned this Apr 20, 2023
@damonmcc damonmcc marked this pull request as ready for review April 25, 2023 14:17
@mbh329

mbh329 commented Apr 25, 2023

@damonmcc I don't know how sensitive the data is, but we might not want to have links to it. I am able to access it via edm-data. I'm not sure whether just anyone can access it, but let's be safe.


@mbh329 mbh329 left a comment


Code looks good to me, and I took a gander at the outputs and they seem to make sense to me. As an example, I looked at the borough projections, and it looks like (generally) the capacity has peaked or is going to peak within the next few years and then drop as we reach the end of the decade. The exception is Staten Island, which stays relatively flat. That makes sense because, historically, there hasn't been a lot of development in Staten Island that would prompt a significant increase in school capacity. The data we have suggests that school capacity is more dynamic in the other boroughs. Is that what everyone else is taking away from this?

@mbh329

mbh329 commented Apr 25, 2023

I guess that's for the SMEs to figure out 🖖

@damonmcc damonmcc merged commit 5eba13a into main May 1, 2023
1 check passed
@damonmcc damonmcc deleted the dm-2023-update branch May 1, 2023 19:11
Development

Successfully merging this pull request may close these issues.

Update CEQR schools data tables