Reading Order (IMPACT) #18

Open
markusenders opened this Issue Jun 5, 2014 · 2 comments

4 participants
markusenders (Contributor) commented Jun 5, 2014

use case
Modern OCR software is able to recognize sections within a page. The logical text flow between some of these sections may be continuous. However, the OCR software may not always recognize this text flow correctly and may store such sections in non-contiguous parts of the ALTO file.

Additional processing software or manual intervention may correct this problem. ALTO therefore has to store the reading order explicitly; the reading order should not rely on the order of the XML elements in the ALTO file.

implementation
A single <ReadingOrder> element defines the information flow for every section in the document. Each such section is called a region. Each region is specified by a <RegionRef> element, which points to a single block element (see chapter 4) through its IDREF attribute. Each region is part of a group. A group can contain regions that are

• unordered (the information flow has no particular order), or
• ordered (the information flow has a particular order).
The corresponding elements are <UnorderedGroup> and <OrderedGroup>. Every region must be part of exactly one group. Every region in an ordered group must state its position within the group. This position is stored in the ORDER attribute, whose value must be an integer that is unique within the group.
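As an illustration of the ORDER rule, the fragment below lists the references of an ordered group out of sequence. The region IDs (TB_HEAD, TB_COL1, TB_COL2) are hypothetical. Assuming that the ORDER attribute alone determines each region's position, the reading sequence is TB_HEAD, TB_COL1, TB_COL2, regardless of where the <RegionRef> elements appear in the file.

```xml
<!-- Hypothetical region IDs. Reading sequence: TB_HEAD (1), TB_COL1 (2), TB_COL2 (3),
     taken from the ORDER values, not from the element order below. -->
<OrderedGroup ID="G1">
  <RegionRef IDREF="TB_COL2" ORDER="3"/>
  <RegionRef IDREF="TB_HEAD" ORDER="1"/>
  <RegionRef IDREF="TB_COL1" ORDER="2"/>
</OrderedGroup>
```

Adding another reference with ORDER="2" to G1, or referencing TB_COL1 again from a second group, would break the uniqueness and exactly-one-group rules respectively.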

To represent complex information flows within a page, a group may contain any number of sub-groups. Sub-groups are themselves ordered or unordered groups, and either type of group may contain sub-groups of either type.
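A sketch of such nesting for a hypothetical page with two independent articles follows. All region IDs are invented for illustration and would have to match block IDs in the <Layout> section. The proposal does not say how the position of a sub-group inside an ordered group is recorded; this sketch assumes that the sub-group carries an ORDER attribute of its own.

```xml
<ReadingOrder>
  <!-- Two independent articles: they may be read in either order -->
  <UnorderedGroup ID="UG1">
    <!-- Article A: headline, then column 1, continued in column 2 -->
    <OrderedGroup ID="G1">
      <RegionRef IDREF="TB_A_HEAD" ORDER="1"/>
      <RegionRef IDREF="TB_A_COL1" ORDER="2"/>
      <RegionRef IDREF="TB_A_COL2" ORDER="3"/>
      <!-- Two boxes read after the body of article A, in no particular order.
           ORDER on this sub-group is an assumption of this sketch. -->
      <UnorderedGroup ID="UG2" ORDER="4">
        <RegionRef IDREF="TB_A_BOX1"/>
        <RegionRef IDREF="TB_A_BOX2"/>
      </UnorderedGroup>
    </OrderedGroup>
    <!-- Article B: headline, then body -->
    <OrderedGroup ID="G2">
      <RegionRef IDREF="TB_B_HEAD" ORDER="1"/>
      <RegionRef IDREF="TB_B_BODY" ORDER="2"/>
    </OrderedGroup>
  </UnorderedGroup>
</ReadingOrder>
```

The fragment nests in both directions: an unordered group contains ordered groups, and an ordered group contains an unordered one. ORDER values only need to be unique within their own group, so G1 and G2 can both start at 1.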

example

```xml
<ReadingOrder>
  <OrderedGroup ID="G1">
    <RegionRef IDREF="xxxx001" ORDER="1"/>
    <RegionRef IDREF="…" ORDER="2"/>
    <RegionRef IDREF="……" ORDER="3"/>
  </OrderedGroup>
  <UnorderedGroup ID="UG1">
    <RegionRef IDREF="…"/>
    <RegionRef IDREF="……"/>
  </UnorderedGroup>
</ReadingOrder>
<Layout>
  <Page>
    <PrintSpace>
      <TextBlock ID="xxxx001">
        <TextLine>
          <String CONTENT="Advertisement"/>
        </TextLine>
      </TextBlock>
    </PrintSpace>
  </Page>
  <!-- ..... the complete layout description -->
</Layout>
```
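The structural rules of this proposal can also be written down as an XML Schema. The following is a hypothetical sketch, not the official ALTO schema: the element and attribute names come from the example above, while the names RegionRefType, OrderedRegionRefType and GroupAttributes, and the optional ORDER attribute on groups, are assumptions of this sketch. Two rules map onto identity constraints: ORDER must be unique within each ordered group, and no region may be referenced more than once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical XML Schema sketch of the proposed reading-order elements.
     Not the official ALTO schema; element names follow the example above. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Container for all top-level groups -->
  <xs:element name="ReadingOrder">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="OrderedGroup"/>
        <xs:element ref="UnorderedGroup"/>
      </xs:choice>
    </xs:complexType>
    <!-- Each region may be referenced at most once in the whole reading order.
         "Exactly once" would also require every block to be referenced,
         which XML Schema identity constraints cannot express. -->
    <xs:unique name="regionInOneGroup">
      <xs:selector xpath=".//RegionRef"/>
      <xs:field xpath="@IDREF"/>
    </xs:unique>
  </xs:element>

  <!-- Ordered group: region references must carry ORDER -->
  <xs:element name="OrderedGroup">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="RegionRef" type="OrderedRegionRefType"/>
        <xs:element ref="OrderedGroup"/>
        <xs:element ref="UnorderedGroup"/>
      </xs:choice>
      <xs:attributeGroup ref="GroupAttributes"/>
    </xs:complexType>
    <!-- ORDER must be unique among the direct children of this group;
         children without ORDER (possible only for sub-groups here) are ignored -->
    <xs:unique name="orderUniqueInGroup">
      <xs:selector xpath="RegionRef | OrderedGroup | UnorderedGroup"/>
      <xs:field xpath="@ORDER"/>
    </xs:unique>
  </xs:element>

  <!-- Unordered group: region references carry no ORDER -->
  <xs:element name="UnorderedGroup">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="RegionRef" type="RegionRefType"/>
        <xs:element ref="OrderedGroup"/>
        <xs:element ref="UnorderedGroup"/>
      </xs:choice>
      <xs:attributeGroup ref="GroupAttributes"/>
    </xs:complexType>
  </xs:element>

  <!-- Group attributes. ORDER on a group is an ASSUMPTION of this sketch:
       the proposal does not say how a sub-group's position is recorded. -->
  <xs:attributeGroup name="GroupAttributes">
    <xs:attribute name="ID" type="xs:ID"/>
    <xs:attribute name="ORDER" type="xs:integer"/>
  </xs:attributeGroup>

  <!-- A region reference points to one block through its ID -->
  <xs:complexType name="RegionRefType">
    <xs:attribute name="IDREF" type="xs:IDREF" use="required"/>
  </xs:complexType>

  <!-- Inside an ordered group, a region reference must also state its position -->
  <xs:complexType name="OrderedRegionRefType">
    <xs:complexContent>
      <xs:extension base="RegionRefType">
        <xs:attribute name="ORDER" type="xs:integer" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Because IDREF values must resolve to ID values in the same document, a reading-order fragment only validates against such a schema when the referenced blocks are present as well.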

Jo-CCS added the "1 submitted" label on Sep 10, 2014

jukervin changed the title from "Reading Order" to "Reading Order (IMPACT)" on Sep 10, 2014

cowboyMontana (Member) commented Feb 2, 2015

Changed label from 'submitted' to 'discussion'.

cowboyMontana (Member) commented Feb 2, 2015

Assigned Markus Enders as change request champion.
