RijunLiao/InvoiceNet

This code provides models that extract key information (company, address, date, total amount) from invoice documents using natural language processing. The framework consists of two main steps. First, text is extracted from the invoice and processed using a text detector algorithm. Then, the context of the extracted text is recognized using a recurrent neural network.

Pre-trained models

You can download the pre-trained models from Google Drive.

Installation

Ubuntu 18.04

To install InvoiceNet on Ubuntu 18.04, run the following commands:

git clone https://github.com/RijunLiao/invoice.git InvoiceNet
cd InvoiceNet/

# Run installation script
./install.sh

The install.sh script will install all the dependencies, create a virtual environment, and install InvoiceNet in the virtual environment.

To use InvoiceNet, activate the virtual environment in which the package was installed.

# Source virtual environment
source env/bin/activate
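
To confirm that the virtual environment is active before running the scripts, you can compare Python's `sys.prefix` with `sys.base_prefix`. The following is a minimal sketch; the helper name `in_virtualenv` is hypothetical and not part of InvoiceNet, and the check assumes the environment was created with `venv` or a recent `virtualenv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```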

Training

Prepare the data for training first by running the following command:

python prepare_data.py --data_dir train_data/

Train InvoiceNet using the following command:

python train.py --field enter-field-here --batch_size 8

# For example, for field 'total_amount'
python train.py --field total_amount --batch_size 8
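
Because `train.py` is invoked once per field in the commands above, training several fields means repeating the call. The sketch below only builds the command lines and does not run them. The names in `FIELDS` are assumptions (this README uses both `total_amount` and `total`), so check them against the fields that `train.py` actually accepts. `training_commands` is a hypothetical helper, not part of InvoiceNet.

```python
import shlex
import sys

# Assumed field names; adjust to whatever train.py accepts.
FIELDS = ["company", "address", "date", "total_amount"]


def training_commands(fields, batch_size=8):
    """Build one train.py command per field, mirroring the usage above."""
    return [
        [sys.executable, "train.py", "--field", field, "--batch_size", str(batch_size)]
        for field in fields
    ]


if __name__ == "__main__":
    for cmd in training_commands(FIELDS):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```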

Prediction


Single invoice

To extract a field from a single invoice file, run the following command:

python predict.py --field enter-field-here --invoice path-to-invoice-file

# For example, to extract fields from the invoice file invoices/1.pdf
python predict.py --field total --invoice invoices/1.pdf # predict only the total amount
python predict.py --field company address total date --invoice invoices/1.pdf # predict the company, address, total, and date at the same time

Multiple invoices

To extract information with a trained InvoiceNet model, place the PDF invoice documents in a single directory with the following layout:

predict_data/
    invoice1.pdf
    invoice2.pdf
    ...
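
Before running prediction on a directory, it can help to confirm that it contains PDF files in the flat layout shown above. The following is a minimal sketch using only the standard library; `list_invoices` is a hypothetical helper and not part of InvoiceNet.

```python
from pathlib import Path


def list_invoices(data_dir):
    """Return the PDF files directly inside data_dir, sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    # Match the extension case-insensitively; subdirectories are ignored.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_invoices("predict_data")` returns the paths that would be candidates for prediction; an empty list suggests the directory is not laid out as expected.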

Run InvoiceNet using the following command:

python predict.py --field enter-field-here --data_dir predict_data/

# For example, to extract fields from every invoice in predict_data/
python predict.py --field total --data_dir predict_data/  # predict only the total amount
python predict.py --field company address total date --data_dir predict_data/ # predict the company, address, total, and date at the same time
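
The single-invoice and multiple-invoice commands above differ only in whether `--invoice` or `--data_dir` is passed. The sketch below builds either form from Python. It assumes the two options are alternatives, which this README suggests but does not state, and `build_predict_command` is a hypothetical helper rather than part of InvoiceNet.

```python
import shlex
import sys


def build_predict_command(fields, invoice=None, data_dir=None):
    """Build a predict.py command for one invoice file or one directory."""
    if not fields:
        raise ValueError("at least one field is required")
    # Assumption: --invoice and --data_dir are used as alternatives.
    if (invoice is None) == (data_dir is None):
        raise ValueError("pass exactly one of invoice or data_dir")
    cmd = [sys.executable, "predict.py", "--field", *fields]
    if invoice is not None:
        cmd += ["--invoice", str(invoice)]
    else:
        cmd += ["--data_dir", str(data_dir)]
    return cmd


if __name__ == "__main__":
    # Print only; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(build_predict_command(["total"], data_dir="predict_data/")))
```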

Prediction Demo

Input invoice:

[Image: input invoice]

Result of total amount extraction:

[Image: extraction result]

Reference

This implementation is largely based on the work of InvoiceNet.
