
PFL-DocVQA Competition

This repository provides a base framework and baseline method for the PFL-DocVQA Competition.

The objective of the Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition is to develop privacy-preserving solutions for fine-tuning multi-modal language models for document understanding on distributed data. We seek efficient federated learning solutions for fine-tuning a pre-trained generic Document Visual Question Answering (DocVQA) model on a new domain: invoice processing.

Automatically managing the information of document workflows is a core aspect of business intelligence and process automation. Reasoning over the information extracted from documents fuels subsequent decision-making processes that can directly affect humans, especially in sectors such as finance, legal or insurance. At the same time, documents tend to contain private information, restricting access to them during training. This common scenario requires training large-scale models over private and widely distributed data.

If you plan to participate in the Competition, please read the participation instructions carefully.

How to use

To set up and use the framework, please check the How to use instructions.

Dataset

The dataset is split into Blue and Red data, and the Blue training set is further divided into 10 clients. The full dataset comprises around 1M question-answer pairs over 109,727 document images from 6,574 unique providers. In this competition, only a subset of the full dataset is used: 251,810 question-answer pairs are available for training and validation, while 43,591 pairs are held out for testing. The rest of the dataset will be released after the competition period.

You can download the dataset from the ELSA Benchmarks Competition platform. For this framework, you will need to download the IMDBs (which contain the processed QAs and OCR) and the images. All downloads must be performed through the RRC portal.

| Dataset | Link |
| --- | --- |
| PFL-DocVQA | Link |
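
Once downloaded, the IMDB files can be inspected directly. Below is a minimal sketch, assuming the IMDBs are NumPy pickle files as in related DocVQA codebases; the file name is illustrative only, and the framework's data loaders handle the real layout:

```python
import numpy as np

# Inspect one of the downloaded IMDB files. The file name is a
# placeholder and the pickle layout is an assumption for illustration;
# use the framework's data loaders for actual training.
records = np.load("imdb_train_client_0.npy", allow_pickle=True)
print(f"{len(records)} entries in this client's IMDB")
```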

PFL-DocVQA model weights

We provide weights pre-trained on the SP-DocVQA dataset so that all participants can start from a common starting point.
| Model | Weights HF name | Parameters |
| --- | --- | --- |
| VT5 base | rubentito/vt5-base-spdocvqa | 316M |
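
For reference, here is a minimal sketch of fetching this checkpoint from the Hugging Face Hub with the `huggingface_hub` library; the framework's own setup may load the weights differently (see the How to use instructions):

```python
from huggingface_hub import snapshot_download

# Download the VT5 base checkpoint pre-trained on SP-DocVQA and
# print the local directory where the files were stored.
weights_dir = snapshot_download(repo_id="rubentito/vt5-base-spdocvqa")
print(weights_dir)
```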

Metrics

Average Normalized Levenshtein Similarity (ANLS)
The standard metric for text-based VQA tasks (ST-VQA and DocVQA). It evaluates the method's reasoning capabilities while smoothly penalizing OCR recognition errors.
Check Scene Text Visual Question Answering for more details.
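
For concreteness, here is a minimal self-contained sketch of ANLS in its standard formulation: for each question, the best score over all ground-truth answers is kept, and a prediction scores 1 - NL when the normalized Levenshtein distance NL is below the threshold τ = 0.5, and 0 otherwise. This is an illustrative implementation, not the competition's official scoring code:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def anls(predictions: list[str], ground_truths: list[list[str]],
         tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions."""
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for ans in answers:
            p, a = pred.strip().lower(), ans.strip().lower()
            nl = levenshtein(p, a) / max(len(p), len(a), 1)
            if nl < tau:                 # below threshold: smooth credit
                best = max(best, 1.0 - nl)
        total += best                    # keep best match per question
    return total / len(predictions)

# A perfect match scores 1.0; a one-character OCR slip is penalized
# smoothly instead of being counted as fully wrong.
print(anls(["invoice 42"], [["invoice 42"]]))   # 1.0
print(anls(["invoce 42"], [["invoice 42"]]))    # 0.9
```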