Conversion of ALTO files (including tags) to HTML

altomator/ALTO-HTML

ALTO2HTML

Conversion of XML ALTO files to HTML

Synopsis

This batch script converts OCR output in ALTO format to HTML. It also renders the tags (logical, layout, semantic, etc.) introduced in version 2.1 of the ALTO format. See https://www.loc.gov/standards/alto/
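To make the idea of "rendering the tags" concrete, here is a minimal Python sketch of the kind of mapping involved. It is not the XSLT shipped in this repository: the sample ALTO fragment, the function name `alto_to_html`, and the `block` / `tag-…` CSS class names are all assumptions made for illustration. It relies only on the ALTO v2 namespace, the `Tags` section, and the `TAGREFS` attribute defined by the ALTO schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# ALTO v2.x namespace, in ElementTree's "{uri}" notation.
ALTO_NS = "{http://www.loc.gov/standards/alto/ns-v2#}"

# Hypothetical sample: one tagged block and one untagged block.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Tags>
    <StructureTag ID="TAG1" LABEL="title"/>
  </Tags>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1" TAGREFS="TAG1">
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="ALTO"/>
          </TextLine>
        </TextBlock>
        <TextBlock ID="B2">
          <TextLine>
            <String CONTENT="Plain"/>
            <SP/>
            <String CONTENT="text"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def alto_to_html(alto_xml: str) -> str:
    """Turn each ALTO TextBlock into a <p>, exposing its tags as CSS classes."""
    root = ET.fromstring(alto_xml)

    # Map tag IDs to their labels (the Tags section is optional).
    labels = {}
    tags_el = root.find(f"{ALTO_NS}Tags")
    if tags_el is not None:
        for tag in tags_el:
            labels[tag.get("ID")] = tag.get("LABEL", "")

    paragraphs = []
    for block in root.iter(f"{ALTO_NS}TextBlock"):
        # TAGREFS is a space-separated list of tag IDs.
        refs = block.get("TAGREFS", "").split()
        classes = ["block"] + [f"tag-{escape(labels.get(r, r))}" for r in refs]

        # Join the words of each line, then the lines of the block.
        lines = []
        for line in block.iter(f"{ALTO_NS}TextLine"):
            words = [s.get("CONTENT", "") for s in line.iter(f"{ALTO_NS}String")]
            lines.append(" ".join(words))
        text = escape(" ".join(lines))

        paragraphs.append(f'<p class="{" ".join(classes)}">{text}</p>')
    return "\n".join(paragraphs)


print(alto_to_html(SAMPLE_ALTO))
```

Exposing tags as CSS classes is only one possible design; it keeps the HTML simple and leaves the visual treatment of each tag to the stylesheet.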

Installation

The script requires Xalan-Java.

A sample document is stored in the "DOCS" folder.

XSLT

Two DOS shell scripts:

  • ALTO2HTML.bat
  • xslt.cmd

One XSLT stylesheet:

  • ALTO2HTML.xsl

The XSLT is run with Xalan-Java. The path to the Java binary must be set in xslt.cmd. For each document stored in the DOCS folder, all the ALTO files found in the X folder are processed. The HTML output is written to an "HTML" folder and rendered with a CSS stylesheet.

Test
  1. Open a DOS terminal.
  2. Change directory to the folder containing the DOCS folder.
  3. Run: ALTO2HTML.bat DOCS
