City Record Online parsing libraries and supporting files

City Record Online Workgroup (CROW) - Parsing

This is the main repository for CROW's parsing work. For Notice Schema development, see https://github.com/CityOfNewYork/CROL-Schema.

Disclaimer: If document versions conflict, treat the documents referenced on GitHub as the latest version.

Important Docs

Open Standard Links

Community Links

About

As the City begins implementing Intro 363-2014 and opening up the record of its daily actions, we are working with the Department of Citywide Administrative Services (DCAS) to publish the City Record as open, clean, and structured data. At the same time, we are unlocking decades of historical information and making it accessible to all, at no charge.

Our goal is to make City Record content as useful as possible by making the data accessible and structured: addresses, dates, persons, subjects, agencies, contract types, and more are parsed and made available as individual objects. This way, residents, organizations, and businesses small and large will be able to access the data, interact with it, and stay informed, whether through notifications, visualizations, or other easy-to-use community tools.
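To illustrate what "parsed and made available as individual objects" could look like, here is a minimal Python sketch that pulls dates and street addresses out of a notice's free text. Everything in it is an assumption made for illustration: the `ParsedNotice` type, its field names, the `parse_notice` helper, the regular expressions, and the sample notice text are hypothetical and are not taken from the CROW parsers or the CROL schema. Real notices would need far more robust patterns.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical record type: field names are illustrative only,
# not the actual Public Notice Data Standard schema.
@dataclass
class ParsedNotice:
    agency: str
    dates: list = field(default_factory=list)
    addresses: list = field(default_factory=list)

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches dates written like "March 12, 2015".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b"
)

# Deliberately simple pattern: a house number, one or more capitalized
# words, then a street suffix (e.g. "1 Centre Street").
ADDRESS_RE = re.compile(
    r"\b\d+\s+(?:[A-Z][a-z]+\s)+(?:Street|Avenue|Place|Road)\b"
)

def parse_notice(text: str, agency: str) -> ParsedNotice:
    """Extract dates and street addresses from a notice's free text."""
    dates = [
        date(int(year), MONTHS[month], int(day))
        for month, day, year in DATE_RE.findall(text)
    ]
    addresses = ADDRESS_RE.findall(text)
    return ParsedNotice(agency=agency, dates=dates, addresses=addresses)

if __name__ == "__main__":
    # Made-up sample text, not an actual City Record notice.
    sample = (
        "The Department of Sanitation will hold a public hearing on "
        "March 12, 2015 at 1 Centre Street, New York, NY."
    )
    print(parse_notice(sample, agency="Department of Sanitation"))
```

Keeping each extracted value as a typed field, rather than leaving it embedded in the notice text, is what would let downstream tools filter by date or map notices by address.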

Project Partners

  • City of New York
  • BetaNYC
  • Commune
  • Citizens Union
  • Dev Bootcamp
  • Ontodia
  • Socrata
  • Sunlight Foundation

Achieved Milestones

  • Came together to form a CROW parsing and scraping volunteer team
  • Set up collaboration framework with DCAS
  • Scraped PDFs from 2008 - 2014
  • Proposed public notice schema
  • Added “addresses” and “time & dates” fields to the City’s input workflow

Tasks

For a list of current tasks, please see Issues.

Phase 1: Parsers and Schema

  • Develop a set of collaboratively produced, open source parsing libraries that populate the Public Notice Data Standard schema using the DCAS pipeline

  • Work with DCAS to integrate the pipeline into the City’s workflow by August 1, so that it becomes DCAS’s method of publishing City Record data

  • Publish a Public Notice Data Standard and documentation on an interactive website

Phase 2: PDF Scraping

  • Scrape the archival PDFs
  • Apply the parsers to the PDFs, modifying them as needed to parse and structure the archival data

Press Releases, Blog Posts, and News Articles

Blog Posts

Press Releases / News Articles