I've been trying to convert PDFs to HTML to speed up their rendering in the browser, but I want the output to look exactly the same as the PDF. I tried converting many PDFs and it worked well until my friend tried the PDF posted below.
When this PDF is converted, the text is garbled in some places.
Please advise on how to fix it.
Confirmed. Here's the HTML from my machine using the latest master.
The problems are:
Also: 6, 7, 12, 13 - overlapping text due to clipping path (this is a known issue, see #39)
@coolwanglu, any thoughts on what's going wrong?
@coolwanglu the problems I found are all regressions. I tried it with 3ed576b and it works fine except for the vertical text on p12. Maybe it's a problem with the newer state tracking code?
I've disabled a recently added function.
I can confirm the missing links (p8) and the wrong letter spacing for vertical text (p12).
I also observe a number of "boxes" that are supposed to be invisible.
@PunkBuster Is it possible for you to tell me the password, or provide me with a decrypted version? I need to inspect the links, which are not typical annotation links.
I think I've fixed the formatting problem. It was in the line merging procedure, which has been very wrong...
Please try the latest master branch.
I'm going to deal with the blank boxes.
@jahewson The missing links seem to be Widget Annotations. Maybe they can be supported in the future.
@PunkBuster Actually, I didn't solve the overlapping text problem. There are two parts
But the other problems should be fixed. Please reopen the issue if any of them still exist.