Semantics of textLineOrder and readingDirection #26

Open
bertsky opened this issue Jan 6, 2021 · 3 comments

Comments

bertsky commented Jan 6, 2021

The schema documentation only says this:

  • readingDirection:

    The direction in which text within lines should be read (order of words and characters), in addition to “textLineOrder”.

  • textLineOrder:

    The order of text lines within the block, in addition to “readingDirection”.

Now, the values for both of these are stated in absolute terms (top-to-bottom, bottom-to-top, left-to-right, right-to-left), not relative to XML ordering (straight vs inverse).

So how exactly should they be interpreted?

  1. W.r.t. @orientation: Before or after rotation?
  2. W.r.t. XML ordering: Should elements always be "in order" already, or must they follow some absolute top-down left-right default?
  3. W.r.t. each other: Is it an error if they are not orthogonal?
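To make question 3 concrete, here is a minimal sketch of what an orthogonality check could look like. The schema does not say whether a non-orthogonal pair is an error, so the function name `are_orthogonal` and the rule itself are assumptions for illustration only.

```python
# Axis along which each of the four documented values runs.
AXIS = {
    "left-to-right": "horizontal",
    "right-to-left": "horizontal",
    "top-to-bottom": "vertical",
    "bottom-to-top": "vertical",
}


def are_orthogonal(reading_direction: str, text_line_order: str) -> bool:
    """Return True if the two attribute values run along different axes.

    Hypothetical rule: the schema documentation does not state this.
    """
    return AXIS[reading_direction] != AXIS[text_line_order]


# Lines read left-to-right and stacked top-to-bottom: orthogonal.
print(are_orthogonal("left-to-right", "top-to-bottom"))
# Both values horizontal: the combination question 3 asks about.
print(are_orthogonal("left-to-right", "right-to-left"))
```

A validator built on this would still need an answer to question 1, since rotation by @orientation could swap the axes before or after the comparison.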

I have not found a single example anywhere in the repo. I found but 2 examples of @readingDirection="bottom-to-top" in the PRImA Layout Analysis Dataset, namely r13 in 00000408 and r3 in 00000394 – both of which are cases of @orientation=-90°. Is this correct?

bertsky commented Feb 13, 2021

> I have not found a single example anywhere in the repo. I found but 2 examples of @readingDirection="bottom-to-top" in the PRImA Layout Analysis Dataset, namely r13 in 00000408 and r3 in 00000394 – both of which are cases of @orientation=-90°. Is this correct?

Interestingly, there are also 3 examples of top-to-bottom, namely r19 in 00000404 (with @orientation=-90), r2 in 00000395 (with @orientation=-90) and r21 in 00000407 (with @orientation=90).

Looking at the images, to me it seems that:

  • 394 is inconsistent (with @readingDirection being wrong)
  • 395 is consistent and correct
  • 407 is inconsistent (with @orientation being wrong)
  • 408 is consistent but both wrong

Then there are 107 TextRegions with @readingDirection="left-to-right", of which about half have @orientation=90 and the other -90.

And there are four more, 089, 90, 712 and 713, which all have an additional @readingOrientation=90. That is clearly wrong (given the documentation that this applies on top of @orientation), and all four also have @orientation with a wrong sign.
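A tally like the one above could be produced roughly as follows. This is a sketch, not the script actually used: it matches elements by local name so it does not depend on a particular PAGE namespace version, and the `SAMPLE` document with its namespace URI is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def tally_regions(xml_text: str) -> Counter:
    """Count TextRegions per (readingDirection, orientation) pair.

    Regions without @readingDirection are skipped; a missing
    @orientation is recorded as None.
    """
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        direction = elem.get("readingDirection")
        if direction is None:
            continue
        counts[(direction, elem.get("orientation"))] += 1
    return counts


# Hypothetical input; the namespace URI is a placeholder, not the real one.
SAMPLE = """<PcGts xmlns="http://example.org/hypothetical-page-namespace">
  <Page>
    <TextRegion id="a" readingDirection="bottom-to-top" orientation="-90"/>
    <TextRegion id="b" readingDirection="left-to-right" orientation="90"/>
    <TextRegion id="c" readingDirection="left-to-right" orientation="90"/>
    <TextRegion id="d"/>
  </Page>
</PcGts>"""

for key, count in sorted(tally_regions(SAMPLE).items()):
    print(key, count)
```

Running this over each file of a dataset and summing the counters would give the per-combination totals, which can then be checked against the images by hand.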

Is this some sort of game?

(There's also the aspect of what your point of reference for absolute terms like top and bottom, left and right is when you have non-orthogonal @orientation. Does the interpretation of "left" snap from one side to the other as the angle crosses 45°?)
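The snapping question can be illustrated with a small sketch. It assumes, purely for illustration, that positive angles rotate clockwise in image coordinates (y pointing down) and that an absolute direction is picked by the dominant axis of the rotated vector. Neither the sign convention nor the snapping rule is taken from the schema; `snapped_direction` is a hypothetical helper.

```python
import math


def snapped_direction(angle_deg: float) -> str:
    """Snap a left-to-right vector, rotated by angle_deg, to an absolute term.

    Assumptions (not from the schema): positive angles are clockwise,
    y grows downward, and the dominant axis decides the label.
    """
    rad = math.radians(angle_deg)
    dx, dy = math.cos(rad), math.sin(rad)
    if abs(dx) >= abs(dy):
        return "left-to-right" if dx > 0 else "right-to-left"
    return "top-to-bottom" if dy > 0 else "bottom-to-top"


# The label flips abruptly between 44 and 46 degrees.
for angle in (0, 44, 46, 90, -90, 180):
    print(angle, snapped_direction(angle))
```

Under these assumptions a two-degree change in skew around 45° changes the absolute term, which is exactly the discontinuity the question points at.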

bertsky commented Sep 3, 2021

The larger issue on how XML ordering relates to explicit @index / @readingDirection / @textLineOrder semantically also applies on the TextEquiv level, BTW.

bertsky commented Jan 7, 2022

In a discussion about related representation within ALTO, IIUC @mittagessen argued that the notions top-to-bottom, bottom-to-top, left-to-right and right-to-left should not be seen as absolute (w.r.t. the page image) but relative to the textline. IMO there are two possibilities to define relativity here:

  • w.r.t. Baseline – but this element is only optional; and it would require defining the first point as "top left" and the last point as "bottom right" (which seems like a stretch)
  • w.r.t. the glyphs of the text content (after page/region derotation, so for bottom-to-top we would expect that the textline bbox image can be digitally rendered from its textequiv codepoint sequence by gluing glyph strokes on top of each other, whereas right-to-left by gluing left of each other) – but what if the text contains BiDi marks?
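The first option (relative to Baseline) could be sketched as below. It relies on exactly the stretch mentioned above: that the first baseline point is where reading starts and the last is where it ends. It also assumes a points string of the form "x,y x,y ..." in image coordinates with y growing downward; `baseline_direction` is a hypothetical helper, not part of any PAGE tooling.

```python
def parse_points(points: str) -> list:
    """Parse 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def baseline_direction(points: str) -> str:
    """Derive a direction term from the first and last baseline point.

    Assumption: points are listed in reading order, y grows downward.
    """
    pts = parse_points(points)
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "left-to-right" if dx >= 0 else "right-to-left"
    return "top-to-bottom" if dy > 0 else "bottom-to-top"


print(baseline_direction("10,100 200,105"))  # nearly horizontal, x increasing
print(baseline_direction("50,400 52,20"))    # nearly vertical, y decreasing
```

This only works for lines that carry a Baseline at all, and it says nothing about text with mixed directionality, so it does not settle the BiDi concern raised for the second option.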

Regardless of what might be a good interpretation, they all seem to defy the actual examples described above.

@chris1010010 please clarify what were the intended semantics of these attributes (and where to find documentation or correct examples)!
