Python library to extract tabular data from images and scanned PDFs


Overview

ExtractTable - API to extract tabular data from images and scanned PDFs

The motivation is to make it easy for developers to extract tabular data from images or scanned PDF files without worrying about the table area, column coordinates, rotation, and so on.

Prerequisite

Before we talk/boast about the service: a developer MUST have an API key to use the ExtractTable service. FREE credits are available here; check the data privacy details in the FAQ.
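
Since the key is required for every call, one common way to keep it out of source code is to read it from an environment variable. This is a minimal sketch, not part of the ExtractTable library: the helper name `load_api_key` and the variable name `EXTRACTTABLE_API_KEY` are assumptions made for illustration.

```python
import os


def load_api_key(env_var: str = "EXTRACTTABLE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    ExtractTable itself does not define them.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your ExtractTable API key"
        )
    return key
```

The returned string can then be passed as the `api_key` argument when the session is created.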

Installation

pip install -U ExtractTable

Basic Usage

OK, enough selling. Let the ease of coding do the talking, and let the output encourage you to buy credits. Start a timer and count the lines of code.

from ExtractTable import ExtractTable

et_sess = ExtractTable(api_key=YOUR_API_KEY)        # Replace YOUR_API_KEY with your valid API key
print(et_sess.check_usage())        # Checks the API key validity and shows the usage of the associated plan

# Extract the tables from an image
table_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format="df")

# To process a PDF, pass the pages parameter ("1", "1,3-4", "all") to process_file
table_data = et_sess.process_file(filepath=Location_of_PDF_with_Tables, output_format="df", pages="all")

Detailed code here
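
The `pages` values shown above ("1", "1,3-4", "all") follow a familiar page-range syntax. The sketch below illustrates one plausible reading of that syntax, assuming 1-based page numbers and inclusive ranges. The function `expand_pages` is hypothetical and is not how the library or the service parses the parameter; it only shows which pages each string would select under those assumptions.

```python
from typing import List


def expand_pages(spec: str, total_pages: int) -> List[int]:
    """Expand a page spec such as "1", "1,3-4" or "all" into page numbers.

    Illustration only: assumes 1-based pages and inclusive ranges.
    """
    if spec.strip().lower() == "all":
        return list(range(1, total_pages + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "3-4" covers both endpoints.
            start, end = (int(x) for x in part.split("-", 1))
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(
                f"invalid page range {part!r} for a {total_pages}-page file"
            )
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Under these assumptions, "1,3-4" on a five-page PDF selects pages 1, 3 and 4, while "all" selects every page.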

Whoa, as simple as that?!

Certainly. Did you know that current ExtractTable users use it on:

  • Bank Statement
  • Medical Records
  • Invoice Details
  • Tax forms

It's up to you now to explore other uses.

Explore

What else is in store:

  • ExtractTable._OUTPUT - check the list of available output formats
  • et_sess.ServerResponse.json() - check the latest actual server response attached to the session
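
The two items above can be gathered in one place when debugging. This is a minimal sketch: `summarize_session` is a hypothetical helper, not part of the library. It uses only the attribute names listed above, and it assumes that `ServerResponse` may be absent before any request has been made.

```python
def summarize_session(session_cls, session):
    """Collect the available output formats and the last server response.

    Hypothetical helper: relies only on the `_OUTPUT` and `ServerResponse`
    names mentioned in this README.
    """
    # _OUTPUT is read from the class; its exact type is not assumed here.
    formats = getattr(session_cls, "_OUTPUT", None)
    # Assumption: ServerResponse may not exist before the first request.
    response = getattr(session, "ServerResponse", None)
    payload = response.json() if response is not None else None
    return {"output_formats": formats, "last_response": payload}
```

With the session from Basic Usage, this would be called as `summarize_session(ExtractTable, et_sess)`.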

Pull Requests & Rewards

Pull requests are most welcome and are rewarded with API credits.

License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

Social Media

Follow us on social media for library updates and free credits.
