Ground truth and full text for selected prints of German libraries

Collection of useful commands

# Strip carriage returns and remove empty or whitespace-only lines from ALTO and PAGE XML files (edits the files in place).
perl -i -ne 'tr|\r||d; next if /^\s*$/; print' *.xml

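# Remove ALTO files without full text, i.e. files with no non-empty CONTENT attribute.
# Every *.xml file lacking such an attribute is deleted, so run this only in a directory of ALTO files.
rm -f $(grep -L 'CONTENT="[^"]' *.xml)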

# Remove PAGE files without full text, i.e. files with no non-empty <Unicode> element on a single line.
rm -f $(grep -L '<Unicode>..*</Unicode>' *.xml)
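
Because the two removal commands delete files immediately, it can help to preview the candidates first. The following is a minimal sketch of such a dry run; the helper name `list_without_fulltext` is hypothetical and not part of this repository, and the pattern is passed as an argument so either of the patterns above can be used.

```shell
# Hypothetical helper (not part of this repository): print every *.xml file
# in the current directory that contains no line matching the given pattern.
list_without_fulltext() {
    pattern=$1
    for f in *.xml; do
        [ -e "$f" ] || continue    # skip the literal "*.xml" if nothing matched
        # grep -q exits non-zero when the file has no matching line
        grep -q -- "$pattern" "$f" || printf '%s\n' "$f"
    done
}

# Dry run for PAGE files: review this list before running rm.
list_without_fulltext '<Unicode>..*</Unicode>'
```

The loop quotes each file name, so names containing spaces are listed correctly. The ALTO pattern can be passed the same way.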