nyc-capital-commitment-scrape

A Node.js scraping script for NYC OMB Capital Commitment Plans. The NYC Capital Commitment Plan is a detailed budget document that complements the Capital Budget, showing committed costs and expected commitment dates for each sub-project. The Capital Commitment Plan is published as a four-part PDF, but machine-readable data at the commitment level is not published.

Disclaimer

I have not thoroughly QC'd the output CSV files in this repo and cannot vouch for their accuracy. If you plan to use this dataset, I recommend spot-checking individual commitments against the source PDFs. Please open an issue in this repo if you find discrepancies, or submit a pull request if you can help with the scraping code.

Get Data

October 2016 Capital Commitment Plan - Individual Commitments (csv) - 26,432 commitments, $84.3B

October 2016 Capital Commitment Plan - Grouped by Project ID (csv) - 9,207 Capital Projects

January 2017 Capital Commitment Plan - Individual Commitments (csv) - 29,616 commitments, $99.6B

January 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,543 Capital Projects

April 2017 Capital Commitment Plan - Individual Commitments (csv) - 33,259 commitments, $105.8B

April 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,983 Capital Projects

Agency Code Lookup (csv)
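For a quick spot-check of a downloaded file against the commitment counts and dollar totals listed above, or to roll individual commitments up by project yourself, something like the following dependency-free Node.js sketch could work. It is not code from this repo. The column names "projectId" and "amount" are hypothetical placeholders: check the header row of the actual CSV and substitute the real names. The parser handles quoted fields containing commas but assumes no line breaks inside fields.

```javascript
// Parse one CSV line, honoring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Turn CSV text into an array of objects keyed by the header row.
// Assumption: no newlines inside quoted fields.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== '');
  const header = parseCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = parseCsvLine(line);
    const row = {};
    header.forEach((name, i) => { row[name] = values[i]; });
    return row;
  });
}

// Count commitments, sum their amounts (stripping "$" and thousands
// separators), and total the amounts per project ID.
// idKey and amountKey are the (hypothetical) column names to use.
function summarize(rows, idKey, amountKey) {
  const byProject = new Map();
  let total = 0;
  for (const row of rows) {
    const amount = Number(String(row[amountKey]).replace(/[$,]/g, '')) || 0;
    total += amount;
    byProject.set(row[idKey], (byProject.get(row[idKey]) || 0) + amount);
  }
  return { count: rows.length, total, byProject };
}
```

Usage, with a hypothetical path and column names: read the file with fs.readFileSync(path, 'utf8'), pass the text to parseCsv, then call summarize(rows, 'projectId', 'amount') and compare count, total and byProject.size with the figures above.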

How to Use

Install dependencies: npm install

Run scrape.js with a directory of Capital Commitment Plan PDFs as an argument.

For example, if you have Capital Commitment Plan PDFs in /pdf/2017-Jan, run node scrape /pdf/2017-Jan.

The script will create a directory of the same name in /csv, with a new file called commitments.csv containing the data.
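The input-to-output mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not scrape.js's actual code: the helper names outputPathFor and listPdfs are hypothetical, and it assumes the script takes the last path component of the input directory and writes to csv/<that name>/commitments.csv relative to the working directory.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: map an input PDF directory (e.g. /pdf/2017-Jan)
// to the output file described above (csv/2017-Jan/commitments.csv).
// path.resolve drops any trailing slash before basename is taken.
function outputPathFor(inputDir, csvRoot = 'csv') {
  const name = path.basename(path.resolve(inputDir));
  return path.join(csvRoot, name, 'commitments.csv');
}

// Hypothetical helper: list the .pdf files (case-insensitive) in the
// input directory, sorted by file name, as full paths.
function listPdfs(inputDir) {
  return fs.readdirSync(inputDir)
    .filter((f) => f.toLowerCase().endsWith('.pdf'))
    .sort()
    .map((f) => path.join(inputDir, f));
}
```

For example, outputPathFor('/pdf/2017-Jan') yields csv/2017-Jan/commitments.csv on POSIX systems, and listPdfs('/pdf/2017-Jan') returns the four plan PDFs in name order, assuming that is all the directory contains.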
