@TIFScrapingOrg

TIF data scraping and digitization team

This organization is set up to handle the scraping, digitization, and online maintenance of Chicago TIF data.

This repo is part of the TIFrific project for IPRO 497 at the Illinois Institute of Technology.


Improving Chicago’s Fiscal Transparency By Extracting Historical Financial Data Through a Machine Learning Pipeline

Since 1986, approximately $25 billion of property tax revenue collected from Chicago property owners has been diverted into Chicago's district-based Tax Increment Financing (TIF) program [1]. TIF districts currently cover roughly 33% of the City of Chicago's land area [2]. Transparency is imperative for good governance and for civic involvement in decisions about TIF funds, since residents' property taxes may be spent on projects that have little community support. The lack of accessible data has drawn scrutiny to TIFs over how investments are allocated. Prior to 2010, TIF district data was archived as digital scans rather than entered directly into computers, which limits its usefulness for analysis. Although Phillip Yates has compiled an existing dataset, our project outlines an algorithmic approach to extracting, organizing, and hosting Chicago TIF records dating back to 1997. We developed an automated “document to data” pipeline that combines Optical Character Recognition (OCR), Machine Learning (ML), and parsing algorithms to extract data directly from the scanned documents. We currently host the existing Chicago TIF data on AWS RDS, which allows newly extracted data to be integrated seamlessly and keeps the records comprehensive and up to date. Through this work, we hope to improve transparency, accountability, and informed civic engagement in Chicago, thereby supporting equitable urban growth.
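The parsing stage is where OCR errors matter most: scanners often read digits in dollar amounts as look-alike letters (`O` for `0`, `l` for `1`). The sketch below illustrates one way a parser might recover a `(label, amount)` pair from a single OCR'd report line. It is not code from the project's repositories; the function names, the look-alike substitution table, and the "label, dot leaders, amount" line format are all assumptions made for illustration.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical table of OCR look-alike errors inside numeric fields
# (an assumption; a real table would be tuned against actual scans).
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def parse_amount(token: str) -> Optional[Decimal]:
    """Convert one OCR'd money token such as '$1,2O4' or '(350.00)' to a Decimal."""
    # Only treat the token as money if OCR kept at least one real digit;
    # this keeps ordinary words like 'Bills' from being "repaired" into numbers.
    if not any(ch.isdigit() for ch in token):
        return None
    # Accounting convention: parentheses mark a negative amount.
    negative = token.startswith("(") and token.endswith(")")
    cleaned = token.strip("()").lstrip("$").translate(OCR_DIGIT_FIXES).replace(",", "")
    if not re.fullmatch(r"\d+(\.\d{2})?", cleaned):
        return None
    value = Decimal(cleaned)  # Decimal avoids float rounding on currency values
    return -value if negative else value


def parse_line_item(line: str) -> Optional[Tuple[str, Decimal]]:
    """Split a hypothetical report line like 'Label ..... $1,234' into (label, amount)."""
    parts = line.strip().rsplit(maxsplit=1)
    if len(parts) != 2:
        return None
    label, token = parts
    amount = parse_amount(token)
    if amount is None:
        return None
    # Drop dot leaders and colons left between the label and the amount.
    return label.rstrip(" .:"), amount


if __name__ == "__main__":
    print(parse_line_item("Property Tax Increment ..... $1,234,5O6"))
    # → ('Property Tax Increment', Decimal('1234506'))
```

Using `Decimal` rather than `float` keeps cent values exact, which matters when extracted amounts are later summed and compared against published totals.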

Popular repositories

  1. DataTools (Public)

    Python

  2. DataScraping (Public)

    This repo contains programs that download old Chicago TIF documents and read their text.

    Jupyter Notebook

  3. DataOrganization (Public)

    This repo transforms the .txt files produced by OCR into .csv files that can be loaded into an accessible database.

    Jupyter Notebook

  4. WebHosting (Public)

    This is a series of tools for managing the S3 buckets and RDS service used for hosting.

    Python

  5. .github (Public)
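Taken together, the DataOrganization and WebHosting descriptions describe a text → CSV → database flow. The following is a minimal sketch of that flow under stated assumptions: the line format, the column names, the `txt_to_csv`/`load_csv` helpers, and the `line_items` table are all hypothetical, and Python's built-in `sqlite3` stands in for the project's AWS RDS database so the example runs locally. It is not the code in these repositories.

```python
import csv
import io
import re
import sqlite3

# Hypothetical line format: a text label, optional dot leaders, then an amount.
LINE_ITEM_RE = re.compile(
    r"^(?P<label>[A-Za-z][A-Za-z &'/-]*?)[ .]*\$?(?P<amount>\d[\d,]*(?:\.\d{2})?)$"
)


def txt_to_csv(txt: str, source_file: str) -> str:
    """Convert OCR text into CSV text with one row per recognized line item."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["source_file", "line_item", "amount"])
    for raw in txt.splitlines():
        m = LINE_ITEM_RE.match(raw.strip())
        if m:  # skip lines that do not look like "label amount"
            writer.writerow([source_file, m["label"].strip(), m["amount"].replace(",", "")])
    return out.getvalue()


def load_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert CSV rows into a hypothetical line_items table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items "
        "(source_file TEXT, line_item TEXT, amount NUMERIC)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["source_file"], r["line_item"], r["amount"]) for r in reader]
    conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    text = "Statement of Revenues\nProperty Tax Increment ..... $1,234,506\nInterest Income 2,500.00\n"
    conn = sqlite3.connect(":memory:")  # local stand-in for the RDS database
    print(load_csv(conn, txt_to_csv(text, "sample.txt")))
```

Keeping the CSV as an intermediate file mirrors the repository split: the CSV can be reviewed or corrected by hand before anything is written to the hosted database.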

Repositories

  • DataScraping (Public)
    Jupyter Notebook · Updated Jul 11, 2024
  • .github (Public)
    Updated May 6, 2024
  • DataTools (Public)
    Python · Updated Apr 18, 2024
  • DataOrganization (Public)
    Jupyter Notebook · Updated Mar 18, 2024
  • WebHosting (Public)
    Python · Updated Feb 21, 2024
