CompFigSep

Implementation of a complete pipeline for compound figure separation.
The code for the panel segmentation task is heavily inspired by the work of Zou et al. (paper and implementation).

Objective

Compound figures are common in scientific publications. They are figures that contain multiple (more or less related) sub figures. In medical scientific publications, compound figures account for a significant amount of the visual data. To exploit the information they carry, compound figures need to be segmented into sub figures that are as independent as possible.

The compound figure separation task is composed of several subtasks:

- Panel segmentation
  - Panel splitting
  - Label recognition
- Caption splitting
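
To make the relation between the subtasks concrete, here is a minimal sketch of how their outputs could be combined. It assumes that panel segmentation yields one bounding box and one recognized label per sub figure, and that caption splitting yields one piece of caption text per label. The `Panel` class, the `attach_captions` function and the sample values are hypothetical and are not part of the compfigsep code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Panel:
    """One sub figure (hypothetical type, not taken from compfigsep)."""
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    label: Optional[str] = None     # e.g. "A"; None if no label was recognized
    caption: Optional[str] = None   # filled in from the split caption


def attach_captions(panels: List[Panel], sub_captions: Dict[str, str]) -> List[Panel]:
    """Pair each panel with the caption part that carries the same label.

    Labels are compared in upper case, so "b" matches the key "B".
    Panels without a label, or without a matching key, keep caption=None.
    """
    for panel in panels:
        if panel.label is not None:
            panel.caption = sub_captions.get(panel.label.upper())
    return panels


# Made-up inputs standing in for the outputs of the two subtasks.
panels = [
    Panel((0, 0, 100, 80), "A"),
    Panel((100, 0, 200, 80), "b"),
    Panel((0, 80, 200, 160)),  # no label recognized
]
sub_captions = {"A": "CT scan before treatment.", "B": "CT scan after treatment."}

result = attach_captions(panels, sub_captions)
for panel in result:
    print(panel.label, panel.box, panel.caption)
```

Matching on the label is only one possible way to join the two outputs; it is used here because labels are the only piece of information that both subtasks share in this sketch.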

How to use

To make sure the software requirements are met, it is best to work within a Python virtual environment.

# Create the virtual environment.
python3 -m venv venv

# Activate it.
. venv/bin/activate

# Make sure pip is up to date.
pip install --upgrade pip

# Install pytorch first.
pip install torch

# Install the required packages.
pip install -r requirements.txt

# Download the requirements for nltk.
python -c "import nltk; nltk.download('punkt')"
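
Once the steps above have run, it can be useful to confirm that the main packages are visible from the active environment. The following is a small sketch that uses only the standard library; the `missing_packages` helper is a hypothetical name, and the list of packages to check (`torch` and `nltk`) is taken from the commands above rather than from requirements.txt.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # torch and nltk are the two packages named explicitly in the setup steps.
    missing = missing_packages(["torch", "nltk"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates a top-level package without importing it, so the check stays fast even for a large package such as torch.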

Training can be monitored with TensorBoard:

tensorboard --logdir=compfigsep/<TASK_NAME>/output/ [--bind_all]
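
The command above can also be assembled from Python, for example to launch TensorBoard from a script. This is a sketch under one assumption: that `<TASK_NAME>` is one of the four task modules listed under Modules. The `TASKS` tuple and the `tensorboard_command` helper are hypothetical names and are not provided by compfigsep.

```python
import shlex
from typing import List

# Assumed values for <TASK_NAME>: the task modules listed in this README.
TASKS = (
    "panel_splitting",
    "label_recognition",
    "panel_segmentation",
    "caption_splitting",
)


def tensorboard_command(task_name: str, bind_all: bool = False) -> List[str]:
    """Build the TensorBoard command line for one task's output directory."""
    if task_name not in TASKS:
        raise ValueError(f"Unknown task {task_name!r}; expected one of {TASKS}")
    command = ["tensorboard", f"--logdir=compfigsep/{task_name}/output/"]
    if bind_all:
        # --bind_all makes TensorBoard listen on all network interfaces.
        command.append("--bind_all")
    return command


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to start TensorBoard.
    print(shlex.join(tensorboard_command("panel_segmentation", bind_all=True)))
```

Returning the arguments as a list rather than a single string means they can be handed to `subprocess.run` without going through a shell.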

Implementation details

Pipeline

Modules

data

The data module contains functions that deal with the various data sources. Among other things, they can be used to preview, load and export the different data sets.

utils

The utils module gathers functions that handle miscellaneous tasks.

*   `utils.detectron_utils`
*   `utils.figure`

panel_splitting
label_recognition
panel_segmentation
caption_splitting

Data sets

Different data sets are involved in this project.

Learn more by reading this README.md.

Contact

I carried out this project from April to August 2020 within the Medgift team at HES-SO as my Master's project. I worked under the supervision of Henning Müller and Manfredo Atzori.

Niccolò Marini and Stefano Marchesin also made helpful contributions.

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825292. This project is better known as the ExaMode project. The objectives of the ExaMode project are:

- Weakly-supervised knowledge discovery for exascale medical data.
- Develop extreme scale analytic tools for heterogeneous exascale multimodal and multimedia data.
- Healthcare & industry decision-making adoption of extreme-scale analysis and prediction tools.

For more information on the ExaMode project, please visit www.examode.eu.

