cneud/alto-tools

ALTO Tools

Python tools for performing various operations on ALTO XML files


Installation

You can install from PyPI by running

pip install alto-tools

or clone the repository, change into its directory, and run

pip install .

Usage

alto-tools <INPUT> [OPTION] 

INPUT should be the path to an ALTO XML file or to a directory containing ALTO XML files.
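Because INPUT may be either a single file or a directory, a tool like this first has to turn it into a list of files. The following is a minimal sketch of that step, not alto-tools' own logic. It assumes that ALTO files are recognised by a .xml suffix (any case) and that only the top level of a directory is scanned. The helper name collect_alto_files is hypothetical.

```python
from pathlib import Path


def collect_alto_files(input_path):
    """Resolve an INPUT argument to a list of candidate ALTO files (hypothetical helper).

    Assumptions: ALTO files end in .xml (any case), and only the top level of a
    directory is scanned. alto-tools' own selection logic may differ.
    """
    path = Path(input_path)
    if path.is_file():
        # A single file is taken as-is, whatever its suffix.
        return [path]
    if path.is_dir():
        # Non-recursive scan, sorted for a stable processing order.
        return sorted(
            p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".xml"
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

Sorting the directory listing keeps the processing order reproducible, because the order returned by iterdir() depends on the filesystem.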

The following OPTIONS are currently supported:

OPTION                 Description
-t, --text             Extract UTF-8 encoded text content
-c, --confidence       Extract the mean OCR word confidence score
-i, --illustrations    Extract bounding box coordinates of <Illustration> elements
-g, --graphics         Extract bounding box coordinates of <GraphicalElement> elements
-s, --statistics       Extract statistical information (number of text lines, words, glyphs, etc.)
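These options read standard ALTO markup. Word text is stored in String/@CONTENT, and OCR word confidence is stored in String/@WC (0 = unsure, 1 = confident). Block positions are stored in HPOS, VPOS, WIDTH and HEIGHT. The following is a hedged sketch of comparable extractions using only Python's xml.etree.ElementTree. It is not the alto-tools implementation, and its output format is not the tool's. The function names (extract_text, mean_word_confidence, bounding_boxes, layout_statistics) and the sample page are hypothetical. The sketch also treats each String element as one word, which is an assumption. Elements are matched by local name, so the ALTO v2, v3 and v4 namespaces all work. Hyphenation (HYP) and SP details are ignored.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample page; real files can be loaded with ET.parse(path).getroot().
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="p1"><PrintSpace>
    <TextBlock ID="b1">
      <TextLine ID="l1">
        <String CONTENT="Hello" WC="1.0"/><SP/><String CONTENT="ALTO" WC="0.5"/>
      </TextLine>
    </TextBlock>
    <Illustration ID="i1" HPOS="10" VPOS="20" WIDTH="300" HEIGHT="150"/>
    <GraphicalElement ID="g1" HPOS="5" VPOS="400" WIDTH="500" HEIGHT="2"/>
  </PrintSpace></Page></Layout>
</alto>"""


def local(tag):
    """Drop the namespace: '{...alto/ns-v4#}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def iter_local(element, name):
    """Yield the element and its descendants whose local tag name equals name."""
    return (el for el in element.iter() if local(el.tag) == name)


def extract_text(root):
    """One string per TextLine: the CONTENT of its String elements joined by spaces."""
    return [
        " ".join(s.get("CONTENT", "") for s in iter_local(line, "String"))
        for line in iter_local(root, "TextLine")
    ]


def mean_word_confidence(root):
    """Mean of String/@WC (0 = unsure, 1 = confident); None when no WC is present."""
    scores = [float(s.get("WC")) for s in iter_local(root, "String") if s.get("WC") is not None]
    return mean(scores) if scores else None


def bounding_boxes(root, name):
    """(HPOS, VPOS, WIDTH, HEIGHT) for each element with the given local name.

    Assumes all four attributes are present; float(None) raises TypeError otherwise.
    """
    keys = ("HPOS", "VPOS", "WIDTH", "HEIGHT")
    return [tuple(float(el.get(k)) for k in keys) for el in iter_local(root, name)]


def layout_statistics(root):
    """Counts of TextLine, String (taken here as 'words') and Glyph elements."""
    return {
        "textlines": sum(1 for _ in iter_local(root, "TextLine")),
        "words": sum(1 for _ in iter_local(root, "String")),
        "glyphs": sum(1 for _ in iter_local(root, "Glyph")),
    }


root = ET.fromstring(SAMPLE)
print(extract_text(root))                        # ['Hello ALTO']
print(mean_word_confidence(root))                # 0.75
print(bounding_boxes(root, "Illustration"))      # [(10.0, 20.0, 300.0, 150.0)]
print(bounding_boxes(root, "GraphicalElement"))  # [(5.0, 400.0, 500.0, 2.0)]
print(layout_statistics(root))                   # {'textlines': 1, 'words': 2, 'glyphs': 0}
```

Matching on local names trades strict namespace checking for the ability to read files from any ALTO version. A stricter tool might instead require one specific namespace URI.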

All output is sent to stdout.