plazi-multipanel-figures

Subdivide multipanel figures and complex captions into individual image/caption pairs

About

The Plazi project has accumulated a large repository of figures and figure captions from species description publications, available via Zenodo in the Biodiversity Literature Repository. However, the figures are typically multipanel images made up of several subfigures, and their captions consist of multiple subcaptions, each describing a different subfigure. In principle, these high-quality biodiversity specimen images, with their accompanying textual descriptions (of the species, and often of the occurrence location and/or notable traits shown in the (sub)figure), could be very valuable for training CLIP models for biology. For this to work, however, we need pairs made up of a single subfigure image and a text description that describes only that image.

Aim

To produce a dataset of individual images, each paired with its own description, suitable for CLIP model training.

Context

This project was part of the Image Datapalooza event held at Ohio State University, Columbus, 14-17 August 2023. The original project pitch is documented as an issue.

Data

Tasks

Subdivision of complex captions into subfigure captions, each with an identifier linking it to the corresponding image segment
Segmentation of multipanel images into one image per subfigure

Process

TBC
