Tools creating ALTO
- ABBYY FineReader Engine is an OCR SDK that gives developers, integrators and BPOs the tools they require to integrate optical text recognition technologies into their applications.
- docWorks is a software solution to digitize and convert library holdings and archives for easy access, searchability, and long-term preservation. By default docWorks generates METS/ALTO output; it can also transform that output into other formats such as ePUB, PDF, plain text, or RTF.
- kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of issues while preserving (mostly) functional equivalence.
Tools presenting ALTO content
- OCLC CONTENTdm makes everything in your digital collections available to everyone, everywhere. No matter the format — local history archives, newspapers, books, maps, slide libraries or audio/video — CONTENTdm can handle the storage, management and delivery of your collections to users across the Web.
- Veridian is presentation software that makes it easy to search, view, and interact with digital collections on the Internet. Veridian supports almost any type of content such as books, magazines, journals, newspapers, photographs, maps, and audio/video files and makes them easily accessible to anyone online.
- Islandora is a popular open source digital repository system based on Fedora Commons, Drupal and a host of additional applications. Islandora is used for many different types of content, including newspapers.
An enhanced ALTO viewer for quality-assurance-oriented display of collections of scans, typically from books or newspapers.
Browser web service for displaying digital representations from decentralized library repositories
Tools working with ALTO
- Aletheia (an advanced document analysis system) as well as other commercial and/or open source PRImA tools such as OCR text and layout performance evaluation, viewers, and converters support ALTO as input format.
Browser-based post-correction tool for ALTO XML files, version 1
Browser-based post-correction tool for ALTO XML files, version 2
Python script for various operations on ALTO files
Named Entity Recognition based on Stanford Named Entity Recognizer with support for ALTO
Evaluation of OCR output against a reference text (multiple formats supported, including ALTO)
Tools for transforming ALTO or other formats into ALTO
Convert between Tesseract hOCR and ALTO XML 2.0/2.1 using XSL stylesheets
A simple converter written in PHP5 that converts ABBYY FineReader XML into the ALTO XML document format.
A simple Java-based tool to convert ABBYY FineReader XML to ALTO XML.
OCR/text format conversion tool; supports ALTO as an input format to create TEI or FoLiA
ALTO to HTML batch converter dealing with the ALTO tags feature (tags were introduced in ALTO v2). Based on XSLT and DOS scripts.
XSL stylesheets to convert from Tesseract hOCR output to ALTO 2.0/2.1 format
This XSLT converts an ALTO xml document to an annotation list for use with a IIIF manifest.
Program that uses Tesseract API to produce ALTO XML with Glyph variants.
A simplistic demonstration of how to calculate the ratio of dictionary words to all words in a METS/ALTO OCR XML file.
Experiments with cleanup of dirty ALTO OCR files using anagram hashing.
METS/ALTO data mining tool: extraction of quantitative metadata from METS/ALTO newspaper documents. Based on XSLT or Perl scripts. See also http://altomator.github.io/EN-data_mining
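
To illustrate the dictionary-word ratio mentioned in the list above, here is a minimal Python sketch. It is not the code of any listed tool. It assumes that recognized words are stored in the `CONTENT` attribute of ALTO `String` elements, and it uses a heavily simplified sample document (not schema-valid) and a tiny hypothetical word list. The function names `alto_words` and `dictionary_ratio` are made up for this example.

```python
import xml.etree.ElementTree as ET


def alto_words(alto_xml: str) -> list[str]:
    """Return the CONTENT of every String element, ignoring the namespace."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        # Tags look like "{namespace}String"; compare the local name only,
        # so the sketch works across ALTO schema versions.
        if elem.tag.rsplit("}", 1)[-1] == "String":
            content = elem.get("CONTENT")
            if content:
                words.append(content)
    return words


def dictionary_ratio(words, dictionary) -> float:
    """Share of words found in the dictionary after simple normalization."""
    # Assumption: stripping common punctuation and lowercasing is enough.
    normalized = [w.strip(".,;:!?()\"'").lower() for w in words]
    normalized = [w for w in normalized if w]
    if not normalized:
        return 0.0
    hits = sum(1 for w in normalized if w in dictionary)
    return hits / len(normalized)


# Simplified, hypothetical ALTO fragment with one OCR error ("qvick").
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="The"/><SP/>
    <String CONTENT="qvick"/><SP/>
    <String CONTENT="brown"/><SP/>
    <String CONTENT="fox."/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    dictionary = {"the", "quick", "brown", "fox"}  # hypothetical word list
    print(dictionary_ratio(alto_words(SAMPLE), dictionary))  # prints 0.75
```

A low ratio can hint at poor OCR quality, but the value depends strongly on the dictionary, the language, and how tokens are normalized, so treat it as a rough indicator only.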