Ground truth for the digitized historic collections of UB Mannheim

UB-Mannheim/digi-gt

digi-gt

Ground truth for the digitized historic collections of Universitätsbibliothek Mannheim.

The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).

After the transcriptions were exported as PAGE XML files, files without any transcription were deleted, and carriage returns and empty lines were removed from the remaining files:

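```sh
# Remove PAGE XML files without any transcription
# (grep -L lists the files that contain no non-empty <Unicode> element).
rm -v $(grep -L '<Unicode>..*</Unicode>' *.xml)
# Remove carriage returns and empty (whitespace-only) lines in the remaining PAGE XML files.
perl -i -ne 'tr|\r||d; next if /^\s*$/; print' *.xml
```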

List of transcriptions

Links
