
DataCatalogue/datacat-object-detection-dataset


🔎 DataCatalogue Object Detection Dataset

As part of our pipeline, we are experimenting with Document Layout Segmentation (DLS) or Document Layout Analysis (DLA). We used the Roboflow web application to annotate our dataset manually and to train an object detection model based on YOLOv8. The annotations follow the SegmOnto controlled vocabulary, extended with the new classes that the COLaF project defined for its LADaS dataset.
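To give a rough idea of what such annotations look like once exported, here is a minimal sketch in Python. It assumes the dataset is exported in the plain-text YOLO detection label format (one `class x_center y_center width height` line per box, with coordinates normalized to the image size), which this README does not confirm. The class list is hypothetical: the names are SegmOnto zones, but the actual classes and their order in this dataset may differ. `Box` and `parse_yolo_line` are illustrative helpers, not part of any library.

```python
from dataclasses import dataclass

# Hypothetical class list: SegmOnto zone names chosen for illustration.
# The real classes and their order in this dataset are not stated here.
CLASS_NAMES = ["MainZone", "MarginTextZone", "NumberingZone", "GraphicZone"]


@dataclass
class Box:
    """A labelled bounding box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one normalized 'class cx cy w h' line into a pixel box."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized centre and size to pixels.
    cx_px, w_px = float(cx) * img_w, float(w) * img_w
    cy_px, h_px = float(cy) * img_h, float(h) * img_h
    return Box(
        label=CLASS_NAMES[int(class_id)],
        x_min=cx_px - w_px / 2,
        y_min=cy_px - h_px / 2,
        x_max=cx_px + w_px / 2,
        y_max=cy_px + h_px / 2,
    )


# Example: a box centred on a 1000 x 2000 pixel page scan.
example = parse_yolo_line("0 0.5 0.5 0.5 0.25", img_w=1000, img_h=2000)
print(example)
```

Converting to pixel corners like this can be handy for overlaying the annotated zones on page images when checking annotation quality.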

Useful links:

  • Our dataset and model available on Roboflow
  • More on YOLOv8
  • Gabay, S., Pinche, A., Christensen, K., Camps, J.-B., & Carboni, N. (2023). A Controlled Vocabulary to Describe the Layout of Pages (Version 0.9). Genève, Lyon, Paris. https://segmonto.github.io/.

📝 Bibliography

  • Clérice, T., Janès, J., Scheithauer, H., Bénière, S., Romary, L., & Sagot, B. (2024, August 6-9). Layout Analysis Dataset with SegmOnto. DH 2024 - Annual Conference of the Alliance of Digital Humanities Organizations, Washington, D.C., United States. https://inria.hal.science/hal-04513725.
