abqsmartcasa/imr-scrape

imrscrape

This Python module scrapes IMR (Independent Monitoring Report) PDFs and extracts CASA (Court Approved Settlement Agreement) paragraph compliance and page information into a tabular format.

imrscrape is available as an importable Python module and as a CLI tool.

Installation

Clone this repo:

git clone https://github.com/apd-forward/imr-scrape

Run setup.py with the install command:

python setup.py install

CLI usage

Example

imrscrape -i ./imr-8-final.pdf -o ./imr-8-data.csv
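Once the command above has written its CSV, the results can be loaded back into Python with the standard library. The following is a minimal sketch, not part of imrscrape itself: the helper name `load_rows` is hypothetical, and it assumes the output file is UTF-8 encoded and begins with a header row. The column names are not documented here, so the sketch does not rely on any of them.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Return the CSV rows as a list of dicts keyed by the header row.

    Hypothetical helper: assumes the first line of the file holds the
    column names, which this README does not confirm.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `rows = load_rows("./imr-8-data.csv")` followed by `print(rows[0].keys())` would show whichever columns the scraper actually produced.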

Available Options

  • -i --input [filepath] (required)

    Takes the filepath to the PDF of the IMR to be scraped

  • -o --output [filepath] (required)

    Takes the filepath to the CSV file where the results are written

  • -qa

    Prints a QA/QC report of possibly missing paragraphs to stdout
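The options listed above map naturally onto an argparse parser. The sketch below is an illustration only and is not taken from the imrscrape source: `build_parser` and `find_missing_paragraphs` are hypothetical names, and the gap check is just one plausible way a report of missing paragraphs could be computed, assuming paragraphs are numbered consecutively between a known first and last number.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="imrscrape")
    parser.add_argument("-i", "--input", required=True,
                        help="filepath to the PDF of the IMR to be scraped")
    parser.add_argument("-o", "--output", required=True,
                        help="filepath to a CSV for the results")
    # argparse accepts a multi-letter single-dash option such as -qa;
    # the parsed value is stored as args.qa.
    parser.add_argument("-qa", action="store_true",
                        help="print a QA/QC report of possibly missing paragraphs")
    return parser


def find_missing_paragraphs(found, first, last):
    """Return paragraph numbers in [first, last] that are absent from found.

    Assumption: paragraph numbers are consecutive integers, so any number
    in the range that was not scraped counts as possibly missing.
    """
    seen = set(found)
    return [number for number in range(first, last + 1) if number not in seen]
```

With this sketch, `build_parser().parse_args(["-i", "in.pdf", "-o", "out.csv", "-qa"])` yields a namespace whose `input`, `output`, and `qa` attributes hold the supplied values.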

Development

This module is written using Python >3.7.0 syntax. Dependencies for development are managed with pipenv. Code is formatted with black.
