PDFs don't render correctly. #82
Comments
@bigfatbird Yes, I am aware of this issue. I'm currently using poppler-utils to convert PDF to HTML for rendering, and the conversion is not great. I will use this issue to track the problem and see whether I can extract the text and images programmatically to get greater control over rendering. I can also check whether some CSS can be applied to render the text a little better. For now, you can use the reading preferences for line width and line height to adjust the content a bit.
@bigfatbird Can you post a screenshot here showing how the text currently looks for a PDF, and say whether the PDF is image-rich or just text?
Sure. Here are two screenshots of the same book, for example.
@bigfatbird Thanks. It looks like the rendering will look better if I can center the content, which should not be hard to achieve. I will update here when I get to this issue. See if adjusting the line width helps a little until I get the fix in.
Just curious: why do you want to style it yourself if there is an existing PDF standard?
Not sure. At the moment I'm using the poppler utility pdftohtml, and I don't see any option to render the HTML the way it looks in the Evince viewer. Perhaps I should check the Evince code to see how the rendering is done.
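For readers following along, the conversion step discussed here can be sketched roughly as below. This is only an illustration, not Bookworm's actual code: the helper names and the output prefix are hypothetical, and the sketch assumes nothing about `pdftohtml` beyond its `-c` (complex, layout-preserving output) and `-s` (single HTML document for all pages) options.

```python
import shutil
import subprocess


def build_pdftohtml_cmd(pdf_path, out_prefix, single_file=True):
    """Build a pdftohtml command line (hypothetical helper, not Bookworm code)."""
    cmd = ["pdftohtml", "-c"]  # -c: "complex" output that tries to keep the page layout
    if single_file:
        cmd.append("-s")  # -s: one HTML document containing all pages
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def convert(pdf_path, out_prefix):
    """Run the conversion if pdftohtml is installed; return True on success."""
    if shutil.which("pdftohtml") is None:
        return False  # poppler-utils is not available on this machine
    result = subprocess.run(build_pdftohtml_cmd(pdf_path, out_prefix), check=False)
    return result.returncode == 0
```

Even with `-c`, the result is HTML positioned to approximate the page, which is why it can still differ from what a native viewer such as Evince shows.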
Maybe you can integrate Mozilla's pdf.js.
On 20.05.2017 at 15:12, Siddhartha Das <***@***.***> wrote:
Not sure. At the moment I'm using the poppler utility pdftohtml, and I don't see any option to render the HTML the way it looks in the Evince viewer. Perhaps I should check the Evince code to see how the rendering is done.
That sounds great, thanks for the suggestion. It looks workable at a quick glance.
Would using on-the-fly PDF rendering instead of cached pdftohtml output give a smaller ~/.config/bookworm db too? Mine is already up to 1.9GB.
@unhammer How many books are there in the library, and how many are PDFs? The actual book content is cached on the file system (if the cache preference is set), but the metadata, including the table of contents, is stored in the db. I have seen that pdftohtml generates a lot of pages, as I separate the HTML by page break tag. I will look at better PDF handling in the future. If you turn off caching, the book content will be cached in /tmp and automatically removed on restart. If you open the book again, it will be parsed again and the HTML content regenerated in /tmp, so it takes slightly longer to resume reading.
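The caching behaviour described in this comment amounts to choosing between a persistent directory and a temporary one. A minimal sketch of that choice follows; it is an assumption-laden illustration, not the app's code: `content_cache_dir`, `book_id`, `persistent_root` and the `bookworm` subdirectory name are all hypothetical.

```python
import tempfile
from pathlib import Path


def content_cache_dir(book_id, cache_enabled, persistent_root):
    """Pick where a book's generated HTML should live (hypothetical helper)."""
    if cache_enabled:
        # Persistent location: survives restarts, so reopening a book is fast.
        base = Path(persistent_root)
    else:
        # System temp directory: content has to be regenerated once it is cleared.
        base = Path(tempfile.gettempdir()) / "bookworm"
    return base / str(book_id)
```

Either way only the generated HTML lives in this directory; the metadata discussed above would still go to the database.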
In my case, 297 PDFs and 34 html/txt/epub.
Hmm. While 300 PDFs seems a large-ish library (I have not tested more than 100 PDFs), 1.9 GB still feels high for just the content data. I will look into this to see if I can replicate it.
@bigfatbird It doesn't look like it will be possible to extract PDF to HTML using pdf.js based on this:
It looks like poppler can be used to get the chapters from the book using this example: At least it will reduce the data in the metadata database by storing just the chapters and their corresponding HTML files. Currently I'm storing the location of all the HTML files, one per page of the PDF, which bloats the DB size as mentioned here by @unhammer.
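To make the size argument concrete, here is a small sketch of the idea of indexing chapters instead of pages. It assumes the outline is already available as (title, first page) pairs and that per-page files follow a `page{n}.html` naming pattern; both the pattern and the function names are hypothetical, and no poppler call is shown.

```python
def chapter_index(outline, page_file_pattern="page{n}.html"):
    """Map each chapter title to the HTML file of its first page.

    outline: list of (title, first_page) tuples, with pages numbered from 1.
    """
    return [(title, page_file_pattern.format(n=page)) for title, page in outline]


def rows_saved(total_pages, outline):
    """Rows no longer stored when indexing chapters instead of every page."""
    return total_pages - len(outline)
```

For a 300-page PDF with 12 chapters, this stores 12 rows rather than 300, so `rows_saved(300, outline)` is 288.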
@Preconf Unfortunately I have not spent further time on this. I tried the following library, but the extraction was too slow, although the rendering was better: I will check Evince to see if it is usable.
Hello, I came here after reading a review of your app on a website. Everything is perfect except the PDF support: a PDF should be displayed like the original, not rearranged. This aspect is clearly a no-go, and people can have many PDFs in their library. Do you have any solution, please? This issue was opened in 2017. Thank you, and please remember: I'm complaining here, but I really like this app.
Completely unusable for PDFs. Might as well just stick to KOReader for the epub support and Evince for everything else.
Be careful, you can get a thumbs down (as my request did) for saying that this app cannot read PDFs correctly. There are some people here who maybe want to stay with a PDF code reader (@bigfatbird?).
@prog-amateur I downvoted your reply and the other one because they are completely useless for fixing the bug and just create unnecessary work. An "it doesn't work for me either" comment under a bug report that just describes something not working as expected is not helpful at all and means more work for the developers. It was not personal.
Text is aligned oddly, code indentation doesn't look right, and I guess some characters are not encoded correctly.