nyc-capital-commitment-scrape

Node.js scraping script for NYC OMB Capital Commitment Plans. The NYC Capital Commitment Plan is a detailed budget document that complements the Capital Budget, showing the committed costs of individual sub-projects and their expected commitment dates. The plan is published as a four-part PDF, but machine-readable data at the commitment level is not published.

Disclaimer

I have not thoroughly QC'd the output CSVs in this repo and cannot vouch for their accuracy. If you plan to use this dataset, I recommend spot-checking individual commitments against the source PDFs. Please open an issue in this repo if you find discrepancies, or submit a pull request if you can help with the scraping code.

Get Data

October 2016 Capital Commitment Plan - Individual Commitments (csv) - 26,432 commitments, $84.3B

October 2016 Capital Commitment Plan - Grouped by Project ID (csv) - 9,207 Capital Projects

January 2017 Capital Commitment Plan - Individual Commitments (csv) - 29,616 commitments, $99.6B

January 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,543 Capital Projects

April 2017 Capital Commitment Plan - Individual Commitments (csv) - 33,259 commitments, $105.8B

April 2017 Capital Commitment Plan - Grouped by Project ID (csv) - 9,983 Capital Projects

Agency Code Lookup (csv)
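The "Grouped by Project ID" files summarize the individual commitments at the capital project level. The following minimal sketch shows one way to produce such a roll-up and join it against the agency code lookup. It is not necessarily how the grouped files were built: the field names `projectId`, `agencyCode`, and `amount` are hypothetical stand-ins for the real CSV columns, and `amount` is assumed to be already parsed as a number.

```javascript
// Hypothetical roll-up: one output row per capital project, with a count of
// its commitments, their summed amount, and the agency name from a lookup.
function groupByProject(commitments, agencyNames) {
  const projects = new Map(); // keeps projects in first-seen order
  for (const c of commitments) {
    let p = projects.get(c.projectId);
    if (!p) {
      p = {
        projectId: c.projectId,
        // Fall back to the raw code when the lookup has no entry for it.
        agency: agencyNames.get(c.agencyCode) || c.agencyCode,
        commitmentCount: 0,
        totalAmount: 0,
      };
      projects.set(c.projectId, p);
    }
    p.commitmentCount += 1;
    p.totalAmount += c.amount;
  }
  return [...projects.values()];
}
```

Here `agencyNames` is a `Map` from agency code to agency name, which could be built from the Agency Code Lookup CSV.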

How to Use

Install dependencies: npm install

Run scrape.js with a directory of Capital Commitment Plan PDFs as its argument.

For example, if your Capital Commitment Plan PDFs are in /pdf/2017-Jan, run node scrape /pdf/2017-Jan.

The script will create a directory with the same name in /csv and write the scraped data to a new file in it called commitments.csv.
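The following minimal sketch shows how the output location can be derived from the input directory name and how rows can be written with CSV quoting. It assumes the output root is a relative `csv` folder and that rows are arrays of values. `outputPathFor`, `writeCommitments`, and `csvField` are hypothetical names that are not taken from scrape.js.

```javascript
// Hypothetical output step: /pdf/2017-Jan -> csv/2017-Jan/commitments.csv
const fs = require('fs');
const path = require('path');

// Quote a value only when it contains a comma, quote, or line break,
// doubling any embedded quotes (RFC 4180 style).
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Mirror the input directory's name under csvRoot.
function outputPathFor(inputDir, csvRoot = 'csv') {
  return path.join(csvRoot, path.basename(path.resolve(inputDir)), 'commitments.csv');
}

// Write a header row plus data rows; returns the path that was written.
function writeCommitments(inputDir, header, rows, csvRoot = 'csv') {
  const outFile = outputPathFor(inputDir, csvRoot);
  fs.mkdirSync(path.dirname(outFile), { recursive: true }); // create csv/<name>/ if missing
  const lines = [header, ...rows].map((r) => r.map(csvField).join(','));
  fs.writeFileSync(outFile, lines.join('\n') + '\n');
  return outFile;
}
```

Using `path.basename` on the resolved input path means a trailing slash (as in `/pdf/2017-Jan/`) still yields the same output directory name.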

chriswhong/nyc-capital-commitment-scrape-old

Folders and files

Latest commit

History

Repository files navigation

nyc-capital-commitment-scrape

Disclaimer

Get Data

How to Use

About

Resources

License

Stars

Watchers

Forks

Languages