Own implementation of a complete pipeline for compound figure separation.

GaetanLepage/compound-figure-separator

CompFigSep

Implementation of a complete pipeline for compound figure separation.
The code for the panel segmentation task is heavily inspired by the work of Zou et al. (paper and implementation).

Objective

Compound figures are common in scientific publications. They consist of multiple, more or less related, sub figures. In medical scientific publications, compound figures account for a significant share of the visual data. To exploit the information they contain, they need to be segmented into sub figures that are as independent as possible.

The compound figure separation task is composed of several subtasks:

  • Panel segmentation
    • Panel splitting
    • Label recognition
  • Caption splitting
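As an illustration of how the outputs of these subtasks fit together, here is a minimal sketch in Python. It is not the repository's implementation: the `SubFigure` class, the `assign_labels` and `attach_captions` functions, and the greedy nearest-centre matching rule are all assumptions made for this example. It takes panel boxes (panel splitting), label boxes with their recognized text (label recognition) and caption parts keyed by label (caption splitting), and combines them into one record per sub figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class SubFigure:
    """One separated sub figure: a panel, its label and its caption part."""
    panel_box: Box
    label: Optional[str] = None
    caption: Optional[str] = None


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) centre of a box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def squared_distance(a: Box, b: Box) -> float:
    """Squared Euclidean distance between the centres of two boxes."""
    (ax, ay), (bx, by) = box_center(a), box_center(b)
    return (ax - bx) ** 2 + (ay - by) ** 2


def assign_labels(panels: List[Box], labels: Dict[str, Box]) -> List[SubFigure]:
    """Greedily give each label to the closest panel that has no label yet."""
    sub_figures = [SubFigure(panel_box=panel) for panel in panels]
    for text, label_box in sorted(labels.items()):
        free = [s for s in sub_figures if s.label is None]
        if not free:
            break  # More labels than panels: ignore the remaining labels.
        closest = min(free, key=lambda s: squared_distance(s.panel_box, label_box))
        closest.label = text
    return sub_figures


def attach_captions(sub_figures: List[SubFigure],
                    sub_captions: Dict[str, str]) -> List[SubFigure]:
    """Attach to each labelled sub figure the caption part with the same label."""
    for sub_figure in sub_figures:
        if sub_figure.label is not None:
            sub_figure.caption = sub_captions.get(sub_figure.label)
    return sub_figures


# Toy input: two side-by-side panels, each with a label in its top-left corner.
panels = [(0, 0, 100, 100), (100, 0, 200, 100)]
labels = {"A": (5, 5, 15, 15), "B": (105, 5, 115, 15)}
sub_captions = {"A": "Axial view.", "B": "Coronal view."}

result = attach_captions(assign_labels(panels, labels), sub_captions)
for sub_figure in result:
    print(sub_figure)
```

A greedy nearest-centre rule is only one simple way to pair labels with panels; it was chosen here because it keeps the sketch short, not because the project uses it.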

How to use

To make sure the software requirements are met, it is best to work within a Python virtual environment.

# Create the virtual environment.
python3 -m venv venv

# Activate it.
. venv/bin/activate

# Make sure pip is up to date.
pip install --upgrade pip

# Install pytorch first.
pip install torch

# Install the required packages.
pip install -r requirements.txt

# Download the requirements for nltk.
python -c "import nltk; nltk.download('punkt')"
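
Once these commands have run, a quick check that the packages are visible from the active environment can save some debugging. The following is a small sketch using only the standard library; the helper name `missing_packages` is made up for this example, and the list of packages to check is limited to the two installed explicitly above.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # torch and nltk are the packages installed explicitly in the steps above.
    missing = missing_packages(["torch", "nltk"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the script reports a missing package, the virtual environment is probably not activated or the installation step failed.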

Training progress can be followed with TensorBoard:

tensorboard --logdir=compfigsep/<TASK_NAME>/output/ [--bind_all]

Implementation details

Pipeline

Modules

  • data

The data module contains functions dealing with the various data sources. Among other things, it can be used to preview, load and export the different data sets.

  • utils

The utils module gathers functions that handle miscellaneous tasks.

    • utils.detectron_utils
    • utils.figure
  • panel_splitting
  • label_recognition
  • panel_segmentation
  • caption_splitting
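
To give an idea of what caption splitting involves, here is a minimal regular-expression sketch in Python. It is not the method used by the caption_splitting module, which is not described here: the `split_caption` function and the assumption that labels appear as a single letter in parentheses, such as "(a)", are chosen purely for illustration. Text that precedes the first label is ignored.

```python
import re
from typing import Dict

# Assumed label format: a single letter in parentheses, e.g. "(a)" or "(B)".
LABEL_PATTERN = re.compile(r"\(([A-Za-z])\)")


def split_caption(caption: str) -> Dict[str, str]:
    """Map each upper-cased label to the text that follows it in the caption."""
    matches = list(LABEL_PATTERN.finditer(caption))
    parts: Dict[str, str] = {}
    for index, match in enumerate(matches):
        start = match.end()
        # A part ends where the next label starts, or at the end of the caption.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(caption)
        parts[match.group(1).upper()] = caption[start:end].strip()
    return parts


caption = "CT scans of the chest. (a) Axial view. (b) Coronal view."
print(split_caption(caption))
```

Real captions use many other conventions (label ranges such as "a-c", labels placed after the description, Roman numerals), so a practical splitter needs considerably more than this pattern.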

Data sets

Different data sets are involved in this project.

Learn more by reading this README.md.

Contact

I carried out this project from April to August 2020 within the Medgift team at HES-SO for my Master's project. I worked under the supervision of Henning Müller and Manfredo Atzori.

Niccolò Marini and Stefano Marchesin also made helpful contributions.

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825292. This project is better known as the ExaMode project. The objectives of the ExaMode project are:

  1. Weakly-supervised knowledge discovery for exascale medical data.
  2. Develop extreme scale analytic tools for heterogeneous exascale multimodal and multimedia data.
  3. Healthcare & industry decision-making adoption of extreme-scale analysis and prediction tools.

For more information on the ExaMode project, please visit www.examode.eu.
