Text overlap and Overshoot in html #108

Closed
ghost opened this Issue Mar 21, 2013 · 7 comments

ghost commented Mar 21, 2013

Hi,
I've been trying to convert PDFs to HTML to speed up their rendering in the browser, but I want the output to look exactly the same as the PDF. I converted many PDFs and it worked well until my friend tried the PDF posted below.
When this PDF is converted, the text is garbled in some places.

Please advise on how to fix it.

Thanks

https://www.dropbox.com/s/b6gy9n1l4de261y/f.pdf

Contributor

jahewson commented Mar 21, 2013

Confirmed. Here's the HTML from my machine using the latest master.

The problems are:

  • p4 - large "Be" text too low
  • p5 - large "abg" text has wrong spacing
  • p8 - missing/incorrectly placed hyperlink text
  • p12 - vertical text has wrong spacing

Also: p6, p7, p12, p13 - overlapping text due to the clipping path (this is a known issue, see #39)

@coolwanglu, any thoughts on what's going wrong?

Contributor

jahewson commented Mar 21, 2013

@coolwanglu the problems I found are all regressions: I tried it with 3ed576b and it works fine except for the vertical text on p12. Maybe it's a problem with the newer state tracking code?

Owner

coolwanglu commented Mar 21, 2013

I've disabled a recently added function.
I can confirm the missing links (p8) and the wrong letter spacing for vertical text (p12).
I also observe a number of "boxes" that are supposed to be invisible.

Owner

coolwanglu commented Mar 23, 2013

@PunkBuster Is it possible for you to tell me the password, or provide me with a decrypted version? I need to inspect the links, which are not typical annotation links.

Owner

coolwanglu commented Mar 24, 2013

I think I've fixed the formatting problem; it was in the line merging procedure, which has been very wrong...
Please try the latest master branch.

I'm going to deal with the blank boxes.

Owner

coolwanglu commented Mar 24, 2013

@jahewson The missing links seem to be Widget Annotations. Maybe I'll support them in the future.

Owner

coolwanglu commented Mar 25, 2013

@PunkBuster Actually, I didn't solve the overlapping text. There are two parts:

  • Text with a clipping path, as John mentioned, a duplicate of #39
  • Text covered by images, similar to #39 but harder.

But the other problems should have been fixed. Please reopen the issue if any of them still exist.

coolwanglu closed this Mar 25, 2013
