PDFs don't render correctly. #82

bigfatbird · 2017-05-20T08:04:05Z

Text is aligned oddly, code indentation doesn't look right, and I guess some characters are not encoded correctly.

babluboy · 2017-05-20T12:33:11Z

@bigfatbird Yes, I am aware of this issue. I'm using poppler utils at the moment to convert PDF to HTML to render the content, and the conversion is not great... I will use this issue to track this and see if I can extract the text and images programmatically to have greater control over rendering the content... I can also see if some CSS can be applied to render the text a little better...

For now, you can use the line width and line height reading preferences to adjust the content a little...

babluboy · 2017-05-20T12:34:03Z

@bigfatbird Can you post a screenshot here to show how the text currently looks for a PDF, and say whether the PDF is image-rich or just text?

bigfatbird · 2017-05-20T12:48:12Z

Sure. Here are two screenshots of the same book for example.
http://imgur.com/a/WaZpn

babluboy · 2017-05-20T12:54:20Z

@bigfatbird Thanks. It looks like the rendering will look better if I can center the content... that should not be hard to achieve... will update here when I get to this issue.

See if line width helps a little until I get the fix in.

bigfatbird · 2017-05-20T13:02:12Z

Just curious: Why do you want to style it yourself, if there is an existing PDF standard?
A pdf should look exactly like it was released, I assume.

babluboy · 2017-05-20T13:12:09Z

Not sure. At the moment I'm using the poppler util pdftohtml and I don't see any option to render the HTML the way it looks in the Evince viewer... perhaps I should check the Evince code to see how the rendering is done.
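
For reference, pdftohtml does have a `-c` ("complex output") flag that positions text absolutely and tends to stay closer to the original page layout. Whether it gets close enough to Evince's rendering, or whether Bookworm already passes it, is not stated in this thread. Below is a minimal Python sketch of how such a call might be assembled; the function names and paths are hypothetical and not taken from Bookworm.

```python
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, out_prefix, keep_layout=True):
    """Build an argument list for poppler's pdftohtml (hypothetical helper)."""
    # -c requests "complex" output with absolutely positioned text.
    # -noframes requests plain output without a frameset wrapper.
    # The two are alternatives here, since -noframes is documented as
    # unsupported in complex mode.
    mode_flag = "-c" if keep_layout else "-noframes"
    return ["pdftohtml", mode_flag, str(pdf_path), str(out_prefix)]


def convert(pdf_path, out_dir):
    """Run pdftohtml and return the output prefix that was passed to it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_prefix = out_dir / Path(pdf_path).stem
    # check=True raises CalledProcessError if pdftohtml exits with an error.
    subprocess.run(build_pdftohtml_command(pdf_path, out_prefix), check=True)
    return out_prefix
```

The command is built separately from the call so that the flags can be inspected or logged without poppler being installed.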

bigfatbird · 2017-05-20T14:33:17Z

Maybe you can integrate Mozilla's pdf.js.

babluboy · 2017-05-20T14:45:25Z

That sounds great... thanks for the suggestion... looks workable at a quick glance.
https://mozilla.github.io/pdf.js/examples/

unhammer · 2017-07-29T09:17:53Z

Would using on-the-fly PDF rendering instead of cached pdftohtml output give a smaller ~/.config/bookworm DB too? Mine is already up to 1.9 GB.

babluboy · 2017-07-29T17:38:18Z

@unhammer How many books are there in the library, and how many are PDFs? The actual book content is cached on the file system (if the cache preference is set), but the metadata, including the table of contents, is stored in the DB. I have seen that pdftohtml generates a lot of pages, as I separate the HTML by page break tag... Will look at better PDF handling in the future...

If you turn off caching, the book content will be cached in /tmp and automatically removed on restart... If you open the book again, it will be parsed again and the HTML content regenerated in /tmp... so it takes slightly longer to resume reading...
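
The "separate the HTML by page break tag" step could look roughly like the Python sketch below. It assumes pages are delimited by an `<hr>` tag; the actual tag Bookworm splits on is not named in this thread, and `split_pages` is a hypothetical helper.

```python
import re

# Assumption: the converter marks page boundaries with an <hr> tag.
# Adjust the pattern if the generated HTML uses a different marker.
PAGE_BREAK = re.compile(r"<hr\s*/?>", re.IGNORECASE)


def split_pages(html):
    """Return the non-empty chunks of html found between page-break tags."""
    return [chunk.strip() for chunk in PAGE_BREAK.split(html) if chunk.strip()]
```

With one chunk per PDF page, a 300-page book produces about 300 HTML fragments, each of which needs its own file and its own metadata entry.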

unhammer · 2017-07-30T07:42:27Z

In my case, 297 PDFs and 34 html/txt/epub.

babluboy · 2017-07-30T08:48:40Z

Hmm... while 300 PDFs seems a large-ish library (I have not tested more than 100 PDFs), 1.9 GB does feel high for just the content data... Will look into this to see if I can replicate...

babluboy · 2017-09-08T12:56:48Z

@bigfatbird It doesn't look like it will be possible to convert PDF to HTML using pdf.js, based on this:
mozilla/pdf.js#8732
Bookworm relies on HTML files to apply all the text/color modifications, highlighting, search, navigation, etc.
I will need to either render the pdftohtml output in a better way or find some other way to create HTML pages out of a PDF...

babluboy · 2017-11-23T19:47:44Z

Looks like poppler can be used to get the chapters from the book using this example:
https://stackoverflow.com/questions/7131906/how-to-extract-pdf-index-table-of-contents-with-poppler

At least it will reduce the data in the metadata database by storing just the chapters and their corresponding HTML files. Currently I'm storing the location of all the HTML files, one per page of the PDF, thereby bloating the DB size as mentioned here by @unhammer.
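
A rough Python sketch of that idea, assuming the outline has already been extracted as (title, page) pairs and the per-page HTML files already exist. The `chapters` table, its columns, and `store_chapters` are hypothetical and do not reflect Bookworm's real schema.

```python
import sqlite3


def store_chapters(conn, book_id, outline, page_files):
    """Store one row per outline entry instead of one row per page.

    outline: list of (title, page_number) pairs with 1-based page numbers.
    page_files: list of per-page HTML paths, where index 0 is page 1.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chapters ("
        "book_id INTEGER, position INTEGER, title TEXT, html_file TEXT)"
    )
    rows = [
        (book_id, position, title, page_files[page - 1])
        for position, (title, page) in enumerate(outline)
        # Skip outline entries that point outside the converted pages.
        if 1 <= page <= len(page_files)
    ]
    conn.executemany("INSERT INTO chapters VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For a 300-page PDF with a dozen chapters, this keeps a dozen rows in the metadata database rather than 300.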

babluboy · 2018-06-12T06:05:02Z

@Preconf unfortunately I have not spent further time on this. I tried the following library but the extraction was too slow although the rendering was better:
https://github.com/coolwanglu/pdf2htmlEX

Will check evince to see if it is usable.

prog-amateur · 2019-10-13T00:34:59Z

> @Preconf unfortunately I have not spent further time on this. I tried the following library but the extraction was too slow although the rendering was better:
> https://github.com/coolwanglu/pdf2htmlEX
>
> Will check evince to see if it is usable.

Hello, I came here after reading a review of your app on a website. Everything is perfect except the PDF support: a PDF should look like the original, not be re-arranged. This is clearly a no-go, and people can have many PDFs in their library.

So do you have any solution, please? This issue was opened in 2017.

Thank you, and please remember: I am complaining here, but I really like this app.

0xBRM · 2019-10-24T21:38:25Z

Completely unusable for pdfs. Might as well just stick to koreader for the epub support and evince for everything else.

prog-amateur · 2019-10-30T05:28:56Z

> Completely unusable for pdfs. Might as well just stick to koreader for the epub support and evince for everything else.

Be careful, you can get a thumbs down (like my request did) for saying that this app cannot read PDFs correctly. There are some people here who maybe want to stay with a PDF code reader (@bigfatbird?)

bigfatbird · 2019-11-02T11:20:18Z

@prog-amateur I downvoted your reply and the other one because they are completely useless for fixing the bug and just create unnecessary work. An "it doesn't work for me either" under a bug report that just describes something not working as expected is not helpful at all and means more work for the developers. It was not personal.

babluboy self-assigned this May 20, 2017

babluboy added the Bug label May 20, 2017

babluboy added this to the 0.8 milestone May 20, 2017

babluboy removed this from the 0.8 milestone Jul 13, 2017

babluboy mentioned this issue Aug 4, 2017

Add annotation capabilities #109

Closed

babluboy mentioned this issue Nov 6, 2017

Library issues for auto addition of books #143

Closed

babluboy added the In Progress label Nov 23, 2017

babluboy added this to the 1.0 milestone Nov 23, 2017

babluboy removed this from the 0.9.5 milestone Dec 10, 2017
