This repository contains the official implementation of the paper "Multimodal weighted graph representation for information extraction from visually rich documents".
The paper introduces a novel system for information extraction from visually rich documents (VRDs) based on a weighted graph representation, with the goal of improving extraction performance by capturing the relationships between VRD components. Each document is modeled as a weighted graph: nodes encode the visual, textual, and spatial features of text regions, and edges represent relationships between neighboring text regions. Information extraction is then cast as a node classification task solved with graph convolutional networks (GCNs). The approach is evaluated on diverse documents, including invoices and receipts, and performs on par with or better than strong baselines.
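As a rough illustration of this representation (not the repository's actual code), a document graph could be assembled as in the sketch below. The use of DGL, the feature names, and the inverse-distance edge weighting are assumptions made for the example only.

```python
# Illustrative sketch: a weighted graph over text regions, where node features
# concatenate textual, visual, and spatial cues and edge weights reflect
# spatial proximity between neighboring regions.
import dgl
import torch

def build_document_graph(text_feats, visual_feats, boxes, neighbor_pairs):
    """text_feats, visual_feats: (N, d) tensors; boxes: (N, 4) [x1, y1, x2, y2];
    neighbor_pairs: list of (i, j) index pairs of neighboring text regions."""
    src = torch.tensor([i for i, _ in neighbor_pairs])
    dst = torch.tensor([j for _, j in neighbor_pairs])
    g = dgl.graph((src, dst), num_nodes=boxes.shape[0])

    # Node features: concatenation of textual, visual, and spatial (box) features.
    g.ndata["feat"] = torch.cat([text_feats, visual_feats, boxes], dim=1)

    # Edge weights: inverse distance between box centers (one possible weighting).
    centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
    dist = torch.norm(centers[src] - centers[dst], dim=1)
    g.edata["weight"] = 1.0 / (1.0 + dist)
    return g
```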
To build a graph-based dataset, use the following command:
$ python graph_builder.py -h
This command builds a graph-based dataset for node classification from the selected dataset, used to extract entities from visually rich documents.
Optional Arguments:
-d DATASET, --dataset DATASET : Choose the dataset to use. Options are FUNSD, SROIE, Wildreceipt, or CORD.
-t TRAIN, --train TRAIN : Boolean to choose between the train or test split.
Example:
$ python graph_builder.py -d FUNSD -t True
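To build both the train and test graphs for a dataset in one go, a small helper script can loop over the two splits. The snippet below is a convenience sketch: it only uses the flags documented above, and everything else (the loop, the chosen dataset) is an assumption.

```python
# Convenience sketch: build the train and test graphs for one dataset by
# invoking graph_builder.py with the documented flags.
import subprocess

dataset = "FUNSD"  # or "SROIE", "Wildreceipt", "CORD"
for train_flag in ("True", "False"):
    subprocess.run(
        ["python", "graph_builder.py", "-d", dataset, "-t", train_flag],
        check=True,
    )
```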
To train the model, use the following command:
$ python train.py -h
This command trains the model on a selected dataset for node classification.
Arguments:
-d DATANAME, --dataname DATANAME : Select the dataset for model training. Options are FUNSD, SROIE, Wildreceipt, or CORD.
-p PATH, --path PATH : Select the dataset path for model training.
-hs HIDDEN_SIZE, --hidden_size HIDDEN_SIZE : GCN hidden size.
-hl HIDDEN_LAYERS, --hidden_layers HIDDEN_LAYERS : Number of GCN hidden layers.
-lr LEARNING_RATE, --learning_rate LEARNING_RATE : The learning rate.
-e EPOCHS, --epochs EPOCHS : The number of training epochs.
Example:
$ python train.py -d FUNSD -p data/ -hs 16 -hl 10 -lr 0.01 -e 50
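For reference, training a GCN node classifier with these hyperparameters typically follows the pattern sketched below. This is not the repository's train.py: the use of DGL's GraphConv, the graph/feature/label names, and the model structure are assumptions chosen to mirror the CLI options above.

```python
# Illustrative sketch: a GCN node classifier trained with the hyperparameters
# exposed by the CLI above (hidden size, hidden layers, learning rate, epochs).
import torch
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(torch.nn.Module):
    def __init__(self, in_size, hidden_size, num_classes, num_layers):
        super().__init__()
        sizes = [in_size] + [hidden_size] * num_layers + [num_classes]
        self.layers = torch.nn.ModuleList(
            GraphConv(sizes[i], sizes[i + 1], allow_zero_in_degree=True)
            for i in range(len(sizes) - 1)
        )

    def forward(self, g, x, edge_weight=None):
        for i, layer in enumerate(self.layers):
            x = layer(g, x, edge_weight=edge_weight)
            if i < len(self.layers) - 1:
                x = F.relu(x)
        return x

def train(g, labels, hidden_size=16, num_layers=10, lr=0.01, epochs=50):
    feats = g.ndata["feat"]  # assumes the graph built earlier
    model = GCN(feats.shape[1], hidden_size, int(labels.max()) + 1, num_layers)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        logits = model(g, feats, edge_weight=g.edata["weight"])
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```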
We acknowledge the contributions of the authors of the paper and the developers of the libraries used in this project.