Fix glyph positioning in some rotation scenarios #403

Merged: yob merged 1 commit into main from fix-glyph-displacement on Dec 12, 2021

@yob yob commented Dec 12, 2021

This fixes glyph positioning for some rotated pages, and for some rotated text on non-rotated pages.

When processing glyph displacement after rendering a glyph, the spec is pretty clear that the calculation should be:

      [ 1  0  0 ]
Tm =  [ 0  1  0 ]  x Tm
      [ tx ty 1 ]

However, for years pdf-reader has had it backwards:

           [ 1  0  0 ]
Tm =  Tm x [ 0  1  0 ]
           [ tx ty 1 ]
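
To illustrate why the order matters, here is a minimal sketch in plain Ruby. It does not use pdf-reader's own matrix class; `multiply`, `displacement` and the sample values are hypothetical names and numbers chosen for the example. With a text matrix rotated by 90 degrees, the two orderings move the text origin in different directions:

```ruby
# Multiply two 3x3 matrices given as arrays of rows.
def multiply(a, b)
  a.map do |row|
    b.transpose.map do |col|
      row.zip(col).sum { |x, y| x * y }
    end
  end
end

# Translation matrix for a glyph displacement (tx, ty) in text space.
def displacement(tx, ty)
  [[1, 0, 0],
   [0, 1, 0],
   [tx, ty, 1]]
end

# Example text matrix: rotated 90 degrees counter-clockwise,
# with its origin at (100, 200). These values are made up.
tm = [[0, 1, 0],
      [-1, 0, 0],
      [100, 200, 1]]

d = displacement(10, 0)

correct   = multiply(d, tm)  # displacement x Tm, as the spec describes
backwards = multiply(tm, d)  # Tm x displacement, the old ordering

p correct[2]    # => [100, 210, 1]  (advances along the rotated baseline)
p backwards[2]  # => [110, 200, 1]  (advances along the page x axis)
```

When Tm has no rotation or skew and a scale of 1, both orderings give the same result, which may be part of why the reversed order went unnoticed on many PDFs.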

We'd also built up some compensating bugs to cover for that in some PDFs, like using a calculated font size instead of the raw font size from the page state. There was also a divide by ctm.a that made no sense, and even a comment saying so.

Fixing the order of the matrix multiplication means those compensating bugs can also go away.

There are some minor changes to the text output of the columns spec, which I'm willing to wear. Mostly whitespace changes - nothing significant - so I've updated the spec to match. I suspect these actually indicate some additional bugs in glyph displacement - particularly the way we process numeric arguments to the TJ (show_text_with_positioning) operator. I think this commit is an overall nett positive as it fixes some significant glyph positioning issues. We can iterate on the TJ operator handling separately.
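
For reference, the PDF spec treats a numeric element in a TJ array as thousandths of a unit of text space, subtracted from the current horizontal position (for horizontal writing mode). A rough sketch of that calculation follows; `tj_adjustment` and its parameters are hypothetical names for this example, not pdf-reader's actual code:

```ruby
# Horizontal displacement for a numeric TJ element, assuming horizontal
# writing mode. tj is the number from the TJ array, font_size is the raw
# font size (Tfs) and horizontal_scaling is Th expressed as a fraction
# (1.0 means 100%).
def tj_adjustment(tj, font_size, horizontal_scaling = 1.0)
  -(tj / 1000.0) * font_size * horizontal_scaling
end

p tj_adjustment(-250, 12)  # => 3.0   (negative numbers move the next glyph right)
p tj_adjustment(500, 10)   # => -5.0  (positive numbers move it left)
```

The resulting tx would then feed into the same displacement matrix used after rendering a glyph.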

Finally, there are a couple of tweaks to apply_rotation in PageTextReceiver. This method is still buggy, and I've left a comment with some details. The current version will shift the characters around so they're positioned correctly relative to each other, but the final x and y values are incorrect relative to the overall page boxes. I'll fix that up separately.

These changes were driven by a failing spec with a PDF based on the failure reported at #397. It's a page that's rotated by 270 degrees, and the rotation is undone in the BT block rather than via the CTM.

Fixes #376
Fixes #316
Fixes #271
Fixes #110

@yob yob merged commit 2dc17eb into main Dec 12, 2021
@yob yob deleted the fix-glyph-displacement branch December 12, 2021 11:00
@yob yob mentioned this pull request Jan 29, 2022