CityOfNewYork/CROL-Overview


City Record Online Workgroup (CROW) - Parsing

This is the main repository for CROW's parsing efforts. For Notice Schema development, see https://github.com/CityOfNewYork/CROL-Schema.

Disclaimer: If document versions conflict, treat the documents linked from GitHub as the latest version.

Important Docs

Open Standard Links

Community Links

About

As the City embarks on implementing Intro 363-2014 and unlocking its daily actions, we are working together with the Department of Citywide Administrative Services (DCAS) to publish the City Record as open, clean, and structured data. At the same time, we are unlocking decades of historical information and making it accessible to all, at no charge.

Our goal is to maximize the utility of City Record content by making the data accessible and structured: addresses, dates, persons, subjects, agencies, contract types, and more are parsed and made available as individual objects. This way, residents, organizations, and businesses small and large will be able to access the data, interact with it, and stay informed, whether through notifications, visualizations, or other easy-to-use community tools.
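
As a rough illustration of what "individual objects" could look like, here is a minimal Python sketch of a single parsed notice. The `Notice` class, its field names, and the sample values are hypothetical and chosen only to mirror the kinds of data listed above; they are not the Public Notice Data Standard, which is developed in the CROL-Schema repository.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Notice:
    """Hypothetical structured notice; field names are illustrative only."""

    agency: str
    subject: str
    contract_type: Optional[str] = None
    addresses: List[str] = field(default_factory=list)
    dates: List[date] = field(default_factory=list)
    persons: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the notice, writing dates as ISO 8601 strings."""
        record = asdict(self)
        record["dates"] = [d.isoformat() for d in self.dates]
        return json.dumps(record, sort_keys=True)


# Made-up sample values, not taken from any real notice.
notice = Notice(
    agency="Example Agency",
    subject="Public hearing",
    addresses=["1 Example Street"],
    dates=[date(2014, 8, 1)],
)
print(notice.to_json())
```

Keeping each field as its own typed value, rather than leaving it embedded in free text, is what would let downstream tools filter by agency, map addresses, or send date-based notifications.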

Project Partners

  • City of New York
  • BetaNYC
  • Commune
  • Citizens Union
  • Dev Bootcamp
  • Ontodia
  • Socrata
  • Sunlight Foundation

Achieved Milestones

  • Came together to form a CROW parsing and scraping volunteer team
  • Set up collaboration framework with DCAS
  • Scraped PDFs from 2008 through 2014
  • Proposed public notice schema
  • Added “addresses” and “time & dates” fields to the City’s input workflow

Tasks

For a list of current tasks, please see Issues.

Phase 1: Parsers and Schema

  • Develop a collaboratively produced, open source library of parsers that populate the Public Notice Data Standard schema through the DCAS pipeline

  • Work with DCAS to integrate the pipeline into the City’s workflow by August 1, so that it becomes DCAS's method of publishing City Record data

  • Publish a Public Notice Data Standard and documentation on an interactive website

Phase 2: PDF Scraping

  • Scrape the archival PDFs
  • Adapt the parsers to parse and structure the data in the PDFs
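
To make the parsing step above more concrete, here is a minimal sketch of one small piece of such a parser: pulling dates out of notice text. It assumes the text has already been extracted from a PDF and that dates appear in the form "Month D, YYYY"; both are assumptions made for illustration. The `extract_dates` helper is hypothetical and is not part of the project's parsers.

```python
import re
from datetime import date, datetime
from typing import List

MONTHS = (
    "January|February|March|April|May|June|"
    "July|August|September|October|November|December"
)
# Matches dates written like "August 1, 2014" (assumed format).
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def extract_dates(text: str) -> List[date]:
    """Return every valid 'Month D, YYYY' date found in text, in order."""
    found = []
    for match in DATE_RE.finditer(text):
        try:
            found.append(datetime.strptime(match.group(0), "%B %d, %Y").date())
        except ValueError:
            # Skip impossible dates such as "February 30, 2014".
            continue
    return found


# Made-up sample text, not taken from any real notice.
sample = "A hearing will be held on August 1, 2014. Comments are due February 30, 2014."
print(extract_dates(sample))
```

Real archival PDFs would likely need more tolerant patterns (abbreviated months, line breaks inside a date, OCR noise), so a pattern like this is a starting point rather than a complete solution.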

Press Releases, Blog Posts, and News Articles

Blog Posts

Press Releases / News Articles

About

City Record Online parsing libraries and supporting files
