Imageomics/plazi-multipanel-figures

plazi-multipanel-figures

Subdivide multipanel figures and complex captions to individual image / caption pairs

About

The Plazi project has accumulated a large repository of figures and figure captions from species description publications, which are available via Zenodo in the Biodiversity Literature Repository. However, a figure caption typically consists of multiple subcaptions, each describing a different subfigure, and the figure itself is a single image composed of multiple subfigures. In principle, these high-quality biodiversity specimen images, together with their text descriptions (of the species, and often the occurrence location and/or notable traits shown in the subfigure), could be very valuable for training CLIP models for biology. For this to be effective, each training pair must consist of one subfigure image and a text description that describes only that subfigure.

Aim

To produce a dataset of individual images, each with an associated description, suitable for CLIP model training.

Context

This project was part of the Image Datapalooza event held at Ohio State University, Columbus, 14-17 August 2023. The original project pitch is documented as an issue.

Data

Tasks

  • Subdivision of complex captions into subfigure captions, each with an identifier that links it to the corresponding image segment
  • Segmentation of multi-panel images into one image per subfigure
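
As an illustration only, the two tasks above could be sketched as follows. This is not the project's method (the process is still to be confirmed). The sketch assumes that panel labels appear in the caption as "(A)", "(B)", and so on, that the image is a grayscale grid of pixel values, and that panels are laid out left to right and separated by columns that are entirely background. The function names `split_caption`, `column_panels`, and `pair_panels` are hypothetical, and the example caption is made up.

```python
import re

# Assumption: panels are labelled "(A)", "(B)", ... inside the caption text.
LABEL = re.compile(r"\(([A-Za-z])\)")


def split_caption(caption):
    """Split a compound caption into a shared prefix and {label: subcaption}."""
    # With one capturing group, re.split returns
    # [prefix, label1, text1, label2, text2, ...].
    parts = LABEL.split(caption)
    prefix = parts[0].strip()
    subcaptions = {}
    for label, text in zip(parts[1::2], parts[2::2]):
        subcaptions[label.upper()] = text.strip(" .;,")
    return prefix, subcaptions


def column_panels(pixels, background=255):
    """Return (start, end) column ranges of panels, end exclusive.

    Assumption: `pixels` is a list of rows of grayscale values, and panels
    are separated by columns in which every pixel equals `background`.
    """
    width = len(pixels[0])
    occupied = [any(row[x] != background for row in pixels) for x in range(width)]
    panels, start = [], None
    # The trailing False closes a panel that touches the right edge.
    for x, filled in enumerate(occupied + [False]):
        if filled and start is None:
            start = x
        elif not filled and start is not None:
            panels.append((start, x))
            start = None
    return panels


def pair_panels(caption, pixels):
    """Pair each subcaption with a panel, using the label as the identifier."""
    prefix, subcaptions = split_caption(caption)
    panels = column_panels(pixels)
    if len(panels) != len(subcaptions):
        raise ValueError("number of panels and subcaptions differ")
    # Assumption: alphabetical label order matches left-to-right panel order.
    return [
        {"id": label, "columns": cols, "caption": f"{prefix} {subcaptions[label]}"}
        for label, cols in zip(sorted(subcaptions), panels)
    ]


# Made-up caption and a tiny 2 x 7 "image" with two panels.
caption = "Figure 3. Aus bus, holotype. (A) Dorsal view. (B) Lateral view."
pixels = [
    [255, 0, 0, 255, 255, 10, 255],
    [255, 0, 255, 255, 255, 10, 255],
]
pairs = pair_panels(caption, pixels)
```

Real figures will often break these assumptions (grid layouts, panels without clear gutters, labels such as "A," or "a-c"), so a mismatch between panel and subcaption counts is raised as an error rather than guessed at.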

Process

TBC

Contacts
