Moonshines

License

This dataset is published under the CC-BY 4.0 License.

To cite this dataset:

Chagué, A. (2023). moonshines (Version 2.0.0) [Data set]. https://github.com/alix-tz/moonshines

Description

This dataset is composed of pages of text written in 2023 by a single person, who copied poems by Guillaume Apollinaire published in Alcools.

The dataset is divided into two parts:

  • data/, which is intended for training transcription models,
  • test/, which is intended for testing these models.
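
To get a quick overview of the two parts after cloning the repository, a small script along these lines may help. It is only a sketch: it assumes it is run from the repository root and makes no assumption about which file formats each folder contains, so it simply counts files by extension. The helper name `count_by_extension` is illustrative and not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count the files found under `root` (recursively), grouped by lowercase extension."""
    root = Path(root)
    if not root.is_dir():
        # Missing folder: return an empty count instead of raising.
        return Counter()
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # "data" and "test" are the two parts described above;
    # the paths assume the current directory is the repository root.
    for split in ("data", "test"):
        print(split, dict(count_by_extension(split)))
```

Comparing the counts for data/ and test/ gives a rough idea of the relative size of the training and test material.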

Transcription guidelines

The transcription strictly follows what is written on the images, including accentuation or capitalization errors.

The segmentation follows the SegmOnto ontology and mostly relies on MainZone and DefaultLine.

Possible limitations

Since the text follows the structure of Alcools, there is almost no punctuation in this ground truth. In addition, most of the lines start with a capital letter.
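
These two properties can be quantified on any list of transcribed lines. The sketch below is an assumption-laden illustration, not part of the dataset tooling: it expects the lines to have already been extracted as plain strings (how to do so depends on the annotation format), and the function name `line_stats` is hypothetical. Punctuation is detected through Unicode general categories, so French marks such as « » are counted as well.

```python
import unicodedata


def line_stats(lines):
    """Return the share of lines starting with an uppercase letter
    and the share of characters that are punctuation."""
    # Ignore empty lines and surrounding whitespace.
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return {"capitalized_share": 0.0, "punctuation_share": 0.0}

    capitalized = sum(1 for ln in lines if ln[0].isupper())
    total_chars = sum(len(ln) for ln in lines)
    # Unicode categories starting with "P" cover all punctuation marks.
    punctuation = sum(
        1 for ln in lines for ch in ln if unicodedata.category(ch).startswith("P")
    )
    return {
        "capitalized_share": capitalized / len(lines),
        "punctuation_share": punctuation / total_chars,
    }
```

A high capitalized share combined with a punctuation share close to zero would reflect the limitations described above, and suggests that a model trained only on this data may handle punctuation and lowercase line starts poorly.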

About

A single-hand set of French ground truth (GT) based on texts copied from Alcools
