Ground truth for the digitized historic collections of UB Mannheim

UB-Mannheim/digi-gt

digi-gt

Ground truth for the digitized historic collections of Universitätsbibliothek Mannheim.

The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).

After exporting the transcriptions as PAGE XML files, files without any transcription were deleted, and empty lines were stripped from the remaining files:

# Remove PAGE XML files without any transcription.
rm -v $(grep -L "<Unicode>..*</Unicode>" *.xml)
# Remove empty lines in PAGE XML files.
perl -i -ne "tr|\r||d; next if /^\s*$/;print" *.xml
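
Because the `rm` command deletes files irreversibly, it can help to preview its effect first. The following is a minimal sketch, not part of the original workflow: the function names `list_untranscribed` and `count_blank_lines` are hypothetical helpers introduced here for illustration. It reuses the same `grep` pattern as above to list the files that would be removed, and counts whitespace-only lines per file so the result of the `perl` step can be checked.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original repository) for a dry run.

# Print the names of PAGE XML files that have no non-empty <Unicode> element.
# These are the files the rm command above would delete.
list_untranscribed() {
    # grep -L prints the names of files that contain no matching line.
    grep -L "<Unicode>..*</Unicode>" "$@"
}

# Print "<file> <count>" for each file, where count is the number of
# whitespace-only lines (lines consisting only of a carriage return included).
count_blank_lines() {
    for f in "$@"; do
        printf '%s %s\n' "$f" "$(grep -c '^[[:space:]]*$' "$f")"
    done
}
```

Running `list_untranscribed *.xml` before the cleanup shows what would be deleted, and `count_blank_lines *.xml` afterwards should report a count of 0 for every remaining file.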

List of transcriptions

Links
