v4.0 Changes

Summary of changes for version 4.0

  • Changed schema version to 4.0
  • Changed namespace and targetNamespace to
  • Clarified and defined the licensing of this ALTO standard as the common standard "CC BY-SA 4.0" (with the agreement of the authors)
  • Added character-based text description with the new Glyph element and its subelement Variant (GlyphType, VariantType)
  • Extended the annotation to clarify the difference between the existing element ALTERNATIVE and Glyph/Variant
  • Introduced generic "Processing" and deprecated "OcrProcessing"
  • Introduced generic "processingStep" with "ProcessingStepType" and the required attribute "ID"; deprecated "preProcessingStep", "ocrProcessingStep" and "postProcessingStep"
  • Added a common vocabulary for "processingStep" comprising "ContentGeneration", "ContentModification", "PreOperation", "PostOperation" and "Other"
  • Fixed the Shape element: it can now be used only once within a PageSpace or a TextLine, as originally intended.
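The new processing vocabulary can be illustrated with a small validator. This is a minimal Python sketch over a simplified, hypothetical in-memory model (`ProcessingStep`, `validate_steps` and `migrate_legacy` are illustrative names, not part of ALTO). The category spellings are copied from the list above and should be checked against the official XSD, and the mapping from the deprecated 3.x step elements to categories is an assumption, not something this changelog defines.

```python
from dataclasses import dataclass

# The five category names listed in the 4.0 changelog. Their exact spelling
# and casing in the XSD should be verified against the official schema.
PROCESSING_CATEGORIES = frozenset(
    {"ContentGeneration", "ContentModification", "PreOperation", "PostOperation", "Other"}
)

# Hypothetical migration table from the deprecated 3.x step elements to a
# 4.0 category. This mapping is an assumption, not part of the changelog.
LEGACY_STEP_CATEGORY = {
    "preProcessingStep": "PreOperation",
    "ocrProcessingStep": "ContentGeneration",
    "postProcessingStep": "PostOperation",
}


@dataclass(frozen=True)
class ProcessingStep:
    """Simplified, hypothetical model of a 4.0 processing step."""
    id: str        # required in 4.0
    category: str  # expected to be one of PROCESSING_CATEGORIES


def validate_steps(steps):
    """Return a list of problems; an empty list means the steps look valid."""
    problems = []
    seen = set()
    for step in steps:
        if not step.id:
            problems.append("processing step without required ID")
        elif step.id in seen:
            problems.append(f"duplicate ID {step.id!r}")
        seen.add(step.id)
        if step.category not in PROCESSING_CATEGORIES:
            problems.append(f"{step.id or '?'}: unknown category {step.category!r}")
    return problems


def migrate_legacy(element_name, step_id):
    """Turn a deprecated 3.x step element name into a 4.0-style step."""
    return ProcessingStep(id=step_id, category=LEGACY_STEP_CATEGORY[element_name])


steps = [migrate_legacy("ocrProcessingStep", "OCR_1"), ProcessingStep("", "Other")]
print(validate_steps(steps))  # ['processing step without required ID']
```

Requiring an ID on every step is what lets other parts of a document point at the specific step that produced them; the duplicate check mirrors the uniqueness that XML Schema imposes on `xsd:ID` values.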

Version 4.0 was released in January 2018.

You can find the official version 4.0 schema here.

See also the use cases for glyphs.
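To make the Glyph/Variant idea concrete, the following sketch reads per-character alternatives with Python's standard `xml.etree.ElementTree`. It assumes that Glyph elements appear as children of String, that Glyph and Variant carry their character in `CONTENT`, and that their confidences are stored in `GC` and `VC` as numbers between 0 and 1; these names are assumptions to verify against the 4.0 schema. The fragment omits the ALTO namespace for brevity, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment for brevity; real ALTO 4.0 files declare the
# schema namespace. Attribute names CONTENT, GC and VC are assumptions.
FRAGMENT = """
<String CONTENT="cat">
  <Glyph CONTENT="c" GC="0.95"/>
  <Glyph CONTENT="a" GC="0.60">
    <Variant CONTENT="o" VC="0.35"/>
  </Glyph>
  <Glyph CONTENT="t" GC="0.90"/>
</String>
"""


def _confidence(elem, attr):
    # A missing confidence is treated as 0.0 (an assumption for this sketch).
    return float(elem.get(attr, "0"))


def glyph_candidates(string_elem):
    """For each Glyph, list (character, confidence) pairs: the primary
    reading first, then its Variant alternatives in document order."""
    result = []
    for glyph in string_elem.findall("Glyph"):
        options = [(glyph.get("CONTENT"), _confidence(glyph, "GC"))]
        for variant in glyph.findall("Variant"):
            options.append((variant.get("CONTENT"), _confidence(variant, "VC")))
        result.append(options)
    return result


def best_reading(string_elem):
    """Rebuild the word from each glyph's highest-confidence option."""
    return "".join(
        max(options, key=lambda o: o[1])[0] for options in glyph_candidates(string_elem)
    )


word = ET.fromstring(FRAGMENT.strip())
print(best_reading(word))  # cat
```

Because `max` returns the first of several equal maxima, the primary Glyph reading wins ties. Unlike ALTERNATIVE, which offers alternatives for a whole string, this per-glyph structure lets a consumer swap a single character.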

Comments about the schema and its documentation, as well as additional use cases for the new schema features, are encouraged (GitHub account required).

ALTO schema versions will be incremented by a whole number for changes that break backward compatibility (e.g. version 1 to version 2) and by a decimal for changes that do not (e.g. 3.0 to 3.1). The namespace itself will change only with major versions (e.g. ns-v2 to ns-v3).
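The versioning rule can be expressed in a few lines of Python. This is a minimal sketch: it derives only the `ns-vN` label used in the rule, not the full namespace URI, and the function names are hypothetical.

```python
def parse_version(version):
    """Split a version string such as '3.1' or '4' into (major, minor)."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def namespace_suffix(version):
    """Namespace label implied by the rule, e.g. '4.0' -> 'ns-v4'."""
    return f"ns-v{parse_version(version)[0]}"


def classify_upgrade(old, new):
    """Return (breaking, old_namespace, new_namespace) for an upgrade."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    if (new_major, new_minor) <= (old_major, old_minor):
        raise ValueError("new version must be later than old version")
    # Only a change of the whole-number part signals a breaking change,
    # and only then does the namespace label change.
    return new_major > old_major, namespace_suffix(old), namespace_suffix(new)
```

For example, `classify_upgrade("3.1", "4.0")` returns `(True, "ns-v3", "ns-v4")`, while `classify_upgrade("3.0", "3.1")` returns `(False, "ns-v3", "ns-v3")`.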

