Skip to content


Folders and files

Last commit message
Last commit date

Latest commit



75 Commits

Repository files navigation


PAGE XML format collection for document image page content and more

For an introduction, please see the following publication:

The most actively used XML formats are:

  • PAGE XML for page content (regions, text lines, words, glyphs, reading order, text content, ...)
  • PAGE XML for layout analysis evaluation (evaluation profiles, evaluation results, ...)
  • PAGE XML for document image dewarping (dewarping grids)

All formats are defined by an XML schema, hosted officially on

Please see the wiki for more information.

Note: The master branch contains the proposed changes for the next release.

Page Content

Proposed media type for page content: "application/"