Multi Modal Page Stream Segmentation

Implementation of Multi-Modal Page Stream Segmentation with CNNs & Transformers as a Python Service

Additional Resources

model_training contains Jupyter notebooks for training the CNN models
document_stream_builder contains a script that builds document streams from PDF files

Requirements

Needed Installations

Poppler (PDF rendering library): brew install poppler
CMake: brew install cmake
Tesseract: brew install tesseract
Python
Poetry (dependency management)
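
Before installing the Python dependencies, it can help to confirm that the command-line tools listed above are on the PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the executable names are assumptions (pdftoppm is one of the binaries shipped with Poppler; the others use their usual command names).

```python
import shutil

# Assumed executable names: pdftoppm is installed by Poppler; the rest are
# the usual command names for each tool.
REQUIRED_TOOLS = {
    "Poppler": "pdftoppm",
    "CMake": "cmake",
    "Tesseract": "tesseract",
    "Poetry": "poetry",
}


def missing_tools(which=shutil.which):
    """Return the names of required tools whose executable is not on PATH."""
    return [name for name, exe in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is sufficient.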

Fasttext Word Vectors

Download the English fastText wiki word vectors and put wiki.en.bin in ./app/pss/models/.
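
A quick way to verify that the vectors ended up in the expected location is sketched below. The path comes from the instruction above; the helper name vectors_present is hypothetical and the check only tests for the file's existence, not its contents.

```python
from pathlib import Path

# Location taken from the instruction above, relative to the repository root.
MODEL_PATH = Path("app/pss/models/wiki.en.bin")


def vectors_present(repo_root="."):
    """Return True if wiki.en.bin exists as a regular file under repo_root."""
    return (Path(repo_root) / MODEL_PATH).is_file()


if __name__ == "__main__":
    print("wiki.en.bin found" if vectors_present() else "wiki.en.bin is missing")
```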

Install Python Dependencies

poetry install

Start Uvicorn Server

uvicorn app.main:app --reload

Routes

OpenAPI Documentation http://localhost:8000/openapi.json

SwaggerUI Instance http://localhost:8000/docs
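
Once the server is running, the OpenAPI document can also be read programmatically, for example to list the available routes. This is a minimal sketch using only the standard library; the function names list_routes and fetch_spec are hypothetical, and the only structural assumption is the standard OpenAPI layout, where the top-level paths object maps each path to its HTTP operations.

```python
import json
import urllib.request

OPENAPI_URL = "http://localhost:8000/openapi.json"

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(spec):
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI document."""
    routes = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)


def fetch_spec(url=OPENAPI_URL):
    """Download and parse the OpenAPI document (the server must be running)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for method, path in list_routes(fetch_spec()):
        print(method, path)
```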

For every route that takes a model_type, the value can be single_page or prev_page.

single_page selects the model that was trained with only the current page as input data.

prev_page selects the model that was trained with a pair of consecutive pages as input data.

For all routes, the PDF document must be sent in the request body as form-data under the key file.
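
The form-data upload can be sketched with the standard library alone. The helper names (build_multipart, build_request, process_document) are hypothetical, the base URL assumes the default Uvicorn address used in this README, and the response is returned as raw text because the response schema is not described here.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: default local Uvicorn address


def build_multipart(field_name, filename, data, content_type="application/pdf"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_request(route, pdf_bytes, filename="document.pdf"):
    """Create a POST request that carries the PDF under the form key 'file'."""
    body, header = build_multipart("file", filename, pdf_bytes)
    return urllib.request.Request(
        BASE_URL + route,
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )


def process_document(route, pdf_path):
    """Send the PDF at pdf_path to the given route and return the raw response text."""
    with open(pdf_path, "rb") as fh:
        request = build_request(route, fh.read())
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")
```

With the server running, a call such as process_document("/pss/textModel/single_page/processDocument/", "stream.pdf") would submit a local file named stream.pdf (a placeholder) to the text-only route.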

Text Only Processing

POST localhost:8000/pss/textModel/{model_type}/processDocument/

Process PDF Documents with models that only consider the text data.

Model Performance

Model	Accuracy	Kappa
single_page	0.826255	0.627790
prev_page	0.830116	0.641725
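
For readers unfamiliar with the two metrics reported in the performance tables, the sketch below shows how they can be computed from per-page labels. It assumes that Kappa refers to Cohen's kappa and that each page carries a binary label (1 for the first page of a new document, 0 for a continuation page); the README does not state either point explicitly, and the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of pages whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement: probability that both label sequences pick the same
    # label independently, given their individual label frequencies.
    p_expected = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    if p_expected == 1.0:
        # Degenerate case with a single label everywhere; kappa is undefined,
        # this sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Illustrative labels only, not taken from the Tobacco800 test subset.
    truth = [1, 0, 0, 1, 0, 1, 0, 0]
    predicted = [1, 0, 1, 1, 0, 0, 0, 0]
    print(accuracy(truth, predicted))
    print(cohen_kappa(truth, predicted))
```

Because most pages in a stream are continuation pages, accuracy alone can look high for a model that rarely predicts a boundary; kappa discounts the agreement expected by chance.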

Text Only Processing with Transformers

POST localhost:8000/pss/bertTextModel/processDocument/

Process PDF Documents with a transformer-based BERT model that only considers the text data.

Model	Accuracy	Kappa
bert-based-uncased	0.915058	0.824828

Image Only Processing

POST localhost:8000/pss/imageModel/{model_type}/processDocument/

Process PDF Documents with models that only consider the image data.

Model Performance

Model	Accuracy	Kappa
single_page	0.926641	0.847236
prev_page	0.934363	0.863316

Combined Multi-Modal Processing

POST localhost:8000/pss/combinedModels/{text_model_type}/{image_model_type}/processDocument/

Process PDF Documents with text and image models and combine the output for a multi-modal PSS prediction.

Text Model	Image Model	Accuracy	Kappa
single_page	single_page	0.926641	0.847236
single_page	prev_page	0.942085	0.879059
prev_page	single_page	0.918919	0.830682
prev_page	prev_page	0.938224	0.871176
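
This README does not specify how the text and image outputs are combined. As an illustration only, the sketch below shows one common late-fusion scheme: a weighted average of the per-page probabilities that a page starts a new document, followed by a threshold that splits the stream. The function names, the weight, the threshold, and the example probabilities are all assumptions and are not taken from the repository.

```python
def fuse_probabilities(text_probs, image_probs, text_weight=0.5):
    """Weighted average of per-page 'starts a new document' probabilities."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both models must score the same number of pages")
    return [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]


def split_stream(probs, threshold=0.5):
    """Group page indices into documents.

    The first page always opens a document; any later page opens a new one
    when its fused probability reaches the threshold.
    """
    documents = []
    for page, prob in enumerate(probs):
        if page == 0 or prob >= threshold:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


if __name__ == "__main__":
    # Made-up probabilities for a four-page stream.
    text_probs = [0.9, 0.2, 0.6, 0.1]
    image_probs = [0.8, 0.1, 0.7, 0.4]
    print(split_stream(fuse_probabilities(text_probs, image_probs)))
```

In this example the fused scores mark pages 0 and 2 as document starts, so the stream is split into two documents of two pages each.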

Combined Multi-Modal Processing with Transformers (Text)

POST localhost:8000/pss/combinedModelsBert/{image_model_type}/processDocument/

Process PDF Documents with the BERT text model and an image model, and combine the output for a multi-modal PSS prediction.

The BERT-based model is hosted on the Hugging Face model hub.

Text Model	Image Model	Accuracy	Kappa
bert-based-uncased	single_page	0.926641	0.850575
bert-based-uncased	prev_page	0.934363	0.866669
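
The five routes documented above share one path pattern, so a client can assemble them with a few small helpers. The sketch below is hypothetical client-side code; only the path templates and the two allowed model_type values come from this README.

```python
MODEL_TYPES = ("single_page", "prev_page")


def _check(model_type):
    """Return model_type unchanged if it is one of the allowed values."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    return model_type


def text_route(model_type):
    return f"/pss/textModel/{_check(model_type)}/processDocument/"


def bert_text_route():
    return "/pss/bertTextModel/processDocument/"


def image_route(model_type):
    return f"/pss/imageModel/{_check(model_type)}/processDocument/"


def combined_route(text_model_type, image_model_type):
    return (
        f"/pss/combinedModels/{_check(text_model_type)}/"
        f"{_check(image_model_type)}/processDocument/"
    )


def combined_bert_route(image_model_type):
    return f"/pss/combinedModelsBert/{_check(image_model_type)}/processDocument/"
```

Validating model_type on the client side gives an immediate error for a misspelled value instead of a failed request.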

Notice

This repository builds on the work of Wiedemann & Heyer (2019):

Wiedemann, G., Heyer, G. Multi-modal page stream segmentation with convolutional neural networks. Lang Resources & Evaluation (2019). https://doi.org/10.1007/s10579-019-09476-2

Model training was performed with the Tobacco800 dataset, and model performance was measured on a test subset of it:

David Doermann, Tobacco 800 Dataset (Tobacco800) http://tc11.cvc.uab.es/datasets/Tobacco800_1
