Ground truth and models for 17th c. French prints.

e-ditiones/OCR17

OCR17

Ground truth and models for 17th c. French prints.

- This repo is no longer updated. Please use the OCR17plus repo, which uses XML files instead of .png/.txt pairs.

For the OCR17plus repo, cf. here.

Organisation of the repo

|-Models
  |-Kraken
  |-Calamari
|-Testing_data
  |-XIX
  |-XVI
  |-XVIII
|-Training_data
  |-72dpi
    |-Print_1
      |-extracted
      |-training_data
      |-README.md
      |-transcription.txt
    |-Print_2
  |-400dpi
  |-400dpi_MUFI
  |-600dpi
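
As an illustration of how the layout above might be consumed, here is a minimal Python sketch that collects image/transcription pairs from a training folder. It assumes that each line image is a `.png` file with a `.txt` transcription of the same name next to it; the exact file naming inside each print folder is not documented here, so the function name `find_pairs` and the matching rule are hypothetical and may need adjusting.

```python
from pathlib import Path


def find_pairs(root):
    """Return (image, transcription) pairs found anywhere under root.

    Assumption: a transcription shares its image's name, with the
    .png suffix replaced by .txt. Images without a matching text
    file are skipped.
    """
    pairs = []
    for image in sorted(Path(root).rglob("*.png")):
        text = image.with_suffix(".txt")
        if text.is_file():
            pairs.append((image, text))
    return pairs
```

For example, `find_pairs("Training_data/400dpi")` would return the pairs for every print stored at that resolution, provided the naming assumption holds.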

Corpus

Most of the training data are taken from literary texts, and especially plays, printed throughout the 17th century. Each print is described in depth in its own folder.

Transcriptions

Transcripts are almost diplomatic. Long ſ is maintained (plaiſir and not plaisir). Ligatures which have disappeared (ſt, st, ct) are not kept, but those that are maintained in contemporary French (œ, æ) are.

For certain prints only, Unicode and MUFI ligatures are maintained (folder 400dpi_MUFI) for testing purposes. Ground truth is provided both with and without them.
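
The conventions above can be sketched in Python as follows. This is only an illustration, not the tooling used to produce the ground truth: the function names are hypothetical, and the mapping covers only the two ligatures that have standard Unicode code points (ﬅ and ﬆ). MUFI ligatures such as ct live in the Private Use Area and are left out.

```python
# Ligatures that the transcription convention resolves into letters.
# Only standard Unicode code points are listed; MUFI (Private Use
# Area) ligatures are deliberately omitted from this sketch.
LIGATURES = {
    "\ufb05": "ſt",  # LATIN SMALL LIGATURE LONG S T
    "\ufb06": "st",  # LATIN SMALL LIGATURE ST
}


def resolve_ligatures(text):
    """Expand obsolete ligatures; keep long ſ, œ and æ untouched."""
    for ligature, expansion in LIGATURES.items():
        text = text.replace(ligature, expansion)
    return text


def modernise_long_s(text):
    """Replace long ſ with s, e.g. to compare against modern spelling."""
    return text.replace("ſ", "s")
```

Note that `modernise_long_s` goes beyond the repo's convention, which keeps the long ſ; it is included only as a helper for comparing transcriptions with modernised text.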

Cite this repository

@dataset{simon_gabay_2020_3826894,
  author       = {Simon Gabay},
  title        = {OCR17: GT for 17th French prints},
  month        = may,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.3826894},
  url          = {https://doi.org/10.5281/zenodo.3826894}
}

Please keep me posted if you use this data! simon.gabay[at]unige.ch

Licence

This work is licensed under a Creative Commons Attribution 4.0 International Licence.

Thanks

Special thanks to Thibault Clérice for his magic XSLT stylesheets (and many other things)!
