v0.1.0
First public release of archival-structures: tools for analysing PageXML/ATR transcriptions and scan images of archival documents. It covers detecting and splitting two-page book openings, clustering text lines and page layouts, mining cross-page document-element sequences, detecting ink colour and missing transcriptions, and parsing EAD/METS archival finding-aid metadata.
Install
pip install archival-structures
Demo data
The demo notebooks in notebooks/demo/ need real PageXML/thumbnail data that isn't part of the package install. Download archival-structures-demo-data.zip from this release and extract it at the repository root:
unzip archival-structures-demo-data.zip -d .
See the README for the full list of notebooks and what each one demonstrates.
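On systems without the unzip command, the extraction step can be done with Python's standard library instead. This is a minimal sketch, not part of the package: the helper name extract_demo_data is hypothetical, and it assumes the archive has already been downloaded from this release.

```python
import zipfile
from pathlib import Path


def extract_demo_data(archive, repo_root="."):
    """Extract the demo-data zip into repo_root and return the archive's member names.

    Hypothetical helper for illustration; archival-structures does not ship it.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(Path(repo_root))
        return zf.namelist()


# Example call, assuming the zip sits in the repository root
# and the script is run from there:
# extract_demo_data("archival-structures-demo-data.zip", ".")
```

Like unzip with -d ., this writes the archive's contents relative to the target directory, so run it from the repository root or pass the root explicitly.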