
chriswhong/nyc-capital-commitment-scrape-old

nyc-capital-commitment-scrape

Node.js scraping script for the NYC Office of Management and Budget (OMB) Capital Commitment Plans. The NYC Capital Commitment Plan is a detailed budget document that complements the Capital Budget, showing sub-project committed costs and expected commitment dates. OMB publishes the Capital Commitment Plan as a four-part PDF but does not publish machine-readable data at the commitment level.

Disclaimer

I have not thoroughly QC'd the output CSVs in this repo and cannot vouch for their accuracy. If you plan to use this dataset, I recommend spot-checking individual commitments against the source PDFs. Please open an issue in this repo if you find discrepancies, or submit a pull request if you can help with the scraping code.
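To make spot-checking easier, it can help to pull every commitment for one project out of the CSV and compare those rows with the matching page of the PDF. The sketch below is a hypothetical helper, not part of this repo: it uses a small dependency-free CSV parser (quoted fields and doubled quotes only, not a full RFC 4180 implementation), and the column name "projectid" is an assumption, so check the actual header row of commitments.csv before using it.

```javascript
// Minimal CSV parser: handles quoted fields, escaped quotes ("") and
// commas/newlines inside quotes. Returns one object per data row, keyed by header.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false; // closing quote
      } else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === ",") { row.push(field); field = ""; }
    else if (c === "\n") { row.push(field); rows.push(row); row = []; field = ""; }
    else if (c !== "\r") field += c; // drop CR from CRLF line endings
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  if (rows.length === 0) return [];
  const [header, ...body] = rows;
  return body.map((r) => Object.fromEntries(header.map((h, j) => [h, r[j] ?? ""])));
}

// Hypothetical helper: all commitments for one project ID.
// ASSUMPTION: the ID column is called "projectid"; pass the real name if it differs.
function commitmentsForProject(rows, projectId, idColumn = "projectid") {
  return rows.filter((r) => r[idColumn] === projectId);
}
```

For example, after reading the file with fs.readFileSync(path, "utf8"), pass the text to parseCsv and filter the result with commitmentsForProject for a project ID you can find in the PDF.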

Get Data

October 2016 Capital Commitment Plan - Individual Commitments (csv) - 26,432 commitments, $84.3B

October 2016 Capital Commitment Plan - Grouped by Project ID (csv) - 9,207 Capital Projects

January 2017 Capital Commitment Plan - Individual Commitments (csv) - 29,616 commitments, $99.6B

January 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,543 Capital Projects

April 2017 Capital Commitment Plan - Individual Commitments (csv) - 33,259 commitments, $105.8B

April 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,983 Capital Projects

Agency Code Lookup (csv)
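The commitment counts and dollar totals listed above, and the per-project grouping, can be re-derived from an individual-commitments CSV as a sanity check. The sketch below is a hypothetical helper that works on rows already parsed into objects. The column names "projectid" and "amount" are placeholders, and it assumes amounts are plain numbers, optionally with "$" and thousands separators; whether the CSV stores dollars or thousands of dollars should be confirmed against the source PDFs.

```javascript
// Summarize commitment rows: number of commitments, total amount,
// and per-project totals (in first-seen order).
// ASSUMPTION: column names are placeholders; check the actual CSV header.
function summarize(rows, { idColumn = "projectid", amountColumn = "amount" } = {}) {
  const byProject = new Map();
  let total = 0;
  for (const row of rows) {
    // Strip "$", commas and whitespace before parsing, e.g. "$1,250" -> 1250.
    // Note: an empty cell parses as 0.
    const raw = String(row[amountColumn]);
    const amount = Number(raw.replace(/[$,\s]/g, ""));
    if (Number.isNaN(amount)) throw new Error(`Unparseable amount: ${raw}`);
    total += amount;
    const id = row[idColumn];
    const entry = byProject.get(id) ?? { projectId: id, commitments: 0, amount: 0 };
    entry.commitments += 1;
    entry.amount += amount;
    byProject.set(id, entry);
  }
  return { commitments: rows.length, total, projects: [...byProject.values()] };
}
```

The commitments and projects.length fields correspond to the "commitments" and "Capital Projects" counts in the list above, so a mismatch is a quick signal that rows were dropped or duplicated during scraping.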

How to Use

Install dependencies with npm install.

Run scrape.js, passing a directory of Capital Commitment Plan PDFs as an argument.

For example, if your Capital Commitment Plan PDFs are in /pdf/2017-Jan, run node scrape /pdf/2017-Jan.

The script will create a directory of the same name in /csv, with a new file called commitments.csv containing the data.
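The input-to-output mapping described above can be sketched as follows. This is a hypothetical illustration, not the code in scrape.js: the helper names listPdfs, outputPathFor and prepareOutput are invented here, it assumes "/csv" means a csv folder relative to a base directory (defaulting to the current working directory), and it does not parse any PDFs.

```javascript
const fs = require("fs");
const path = require("path");

// List the PDF files in an input directory (case-insensitive ".pdf"), sorted by name.
function listPdfs(inputDir) {
  return fs.readdirSync(inputDir)
    .filter((name) => path.extname(name).toLowerCase() === ".pdf")
    .sort()
    .map((name) => path.join(inputDir, name));
}

// Map an input directory such as /pdf/2017-Jan to <baseDir>/csv/2017-Jan/commitments.csv.
// ASSUMPTION: "csv" is resolved relative to baseDir (default: current working directory).
function outputPathFor(inputDir, baseDir = process.cwd()) {
  const name = path.basename(path.resolve(inputDir)); // trailing slashes are ignored
  return path.join(baseDir, "csv", name, "commitments.csv");
}

// Create the output directory if needed and return the CSV path to write to.
function prepareOutput(inputDir, baseDir) {
  const outFile = outputPathFor(inputDir, baseDir);
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  return outFile;
}
```

Using the directory's base name keeps one output folder per plan release (for example 2016-Oct, 2017-Jan, 2017-Apr), so re-running the scrape for one release does not overwrite the others.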
