aldnav/doeextractor

doeextractor

DOE Reports Extractor

Requirements

Tabula

Poppler via pdf2image

https://github.com/Belval/pdf2image#how-to-install

Amazon Textract

AWS Subscription (Access Key and Secret Key)
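
Amazon Textract calls need AWS credentials. As a minimal sketch, assuming the keys are supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables that the AWS SDKs read (whether doeextractor itself picks them up this way is an assumption, and `missing_aws_credentials` is a hypothetical helper), a check before running the online extraction might look like this:

```python
import os

# Standard variable names read by the AWS SDKs and CLI.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Missing AWS credentials:", ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```

The Tabula commands run offline, so this check would only matter for the Textract path.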

Features

  • Extract tables from DOE PDF reports using Amazon Textract (online, more accurate, may incur charges).
  • Extract tables from DOE PDF reports using Tabula (offline, less accurate, free and open source).

Usage

Available commands

$ doeextractor --help
Usage: doeextractor [OPTIONS] COMMAND [ARGS]...

  Console script for doeextractor.

Options:
  --help  Show this message and exit.

Commands:
  extract          Extract tables from a PDF file using Amazon Textract
  parse            Parse extracted tables from Amazon Textract
  show-debug-info  Debug info for DOE Extractor
  tabula-extract   Extract tables from a PDF file using Tabula
  tabula-parse     Parse extracted tables from Tabula
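
The subcommands come in extract/parse pairs, one pair per backend. The sketch below prints the help text of both commands for a chosen backend. It assumes `doeextractor` is installed and on the `PATH`, and that each subcommand accepts `--help`, which is usual for this style of console script but not confirmed here. `BACKEND_COMMANDS`, `help_argv`, and `show_backend_help` are hypothetical names. The arguments each subcommand takes are not listed in this README, so none are guessed.

```python
import shutil
import subprocess

# Subcommand pairs taken from the help output; the dictionary keys are illustrative.
BACKEND_COMMANDS = {
    "textract": ("extract", "parse"),
    "tabula": ("tabula-extract", "tabula-parse"),
}


def help_argv(command):
    """Build the argument list that asks a subcommand for its help text."""
    return ["doeextractor", command, "--help"]


def show_backend_help(backend):
    """Print help for the extract and parse commands of one backend."""
    if shutil.which("doeextractor") is None:
        print("doeextractor is not installed or not on PATH")
        return
    for command in BACKEND_COMMANDS[backend]:
        # check=False: a non-zero exit code is reported by the CLI itself.
        subprocess.run(help_argv(command), check=False)


if __name__ == "__main__":
    show_backend_help("tabula")
```

Passing `"textract"` instead would show the help for `extract` and `parse`.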

Please check the documentation for more information.
