Add PDF support to Bookworm #34

babluboy · 2017-04-18T21:40:18Z

Initial digging suggests using the Poppler utility pdftohtml and then rendering the output HTML files in WebKit...
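A rough illustration of that idea in Python, not Bookworm's actual code: the helper names `build_pdftohtml_command` and `convert_pdf` are hypothetical, and the sketch assumes `pdftohtml` from poppler-utils is on the PATH. It uses `-noframes` to get a single HTML document, `-q` to suppress messages, and optionally `-i` to ignore images.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pdftohtml_command(pdf_path: Path, output_base: Path,
                            skip_images: bool = False) -> List[str]:
    """Assemble the pdftohtml argument list (hypothetical helper)."""
    command = ["pdftohtml", "-noframes", "-q"]
    if skip_images:
        # -i tells pdftohtml to ignore images in the PDF.
        command.append("-i")
    command += [str(pdf_path), str(output_base)]
    return command


def convert_pdf(pdf_path: Path, output_dir: Path,
                skip_images: bool = False) -> Path:
    """Run pdftohtml and return the path where the HTML is expected."""
    output_dir.mkdir(parents=True, exist_ok=True)
    output_base = output_dir / pdf_path.stem
    subprocess.run(
        build_pdftohtml_command(pdf_path, output_base, skip_images),
        check=True,  # raise CalledProcessError if the conversion fails
    )
    # Assumption: with -noframes, pdftohtml writes "<output_base>.html".
    return output_dir / (pdf_path.stem + ".html")
```

The returned file could then be handed to a WebKit view as a `file://` URI.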

babluboy · 2017-04-29T18:50:48Z

pdftohtml does not give complete control over the PDF and uses its own images for borders, etc. Decided to put in the extra effort of parsing the PDF document with the Poppler library, which will provide API-level control over the PDF...

babluboy · 2017-04-30T21:55:20Z

Just pushed a version of Bookworm with PDF support based on Poppler and "pdftohtml". However, the following still needs to be completed:

- Avoid re-parsing a book once it has been parsed and added to the library. This can be done by keeping the parsed contents in the ~/.config/bookworm folder until the book is removed from the library. This duplicates the contents of the book and hence uses more storage, so add a toggle button to choose between performance and storage.
- PDF parsing and conversion to HTML takes considerable time, especially for PDFs with images. Will need to see if pdftohtml can be optimized.
- Poppler-generated HTML files are not optimized in terms of the number of words per line. Some work is needed to improve this.
- Add Contractor support for the PDF MIME type so that Bookworm is offered as an option for PDFs in Files.
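The caching item above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not what Bookworm does: `cache_key`, `get_parsed_html`, `evict`, the `keep_cache` flag (standing in for the performance/storage toggle), and the `convert` callback are all hypothetical names, and the cache directory is passed in rather than hard-coded.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(pdf_path: Path) -> str:
    """Hash the file contents so a renamed or moved book still hits the cache."""
    digest = hashlib.sha256()
    with pdf_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_parsed_html(pdf_path: Path, cache_dir: Path,
                    convert: Callable[[Path], str],
                    keep_cache: bool = True) -> str:
    """Return parsed HTML, reusing a cached copy when the toggle allows it."""
    if not keep_cache:
        # Storage-saving mode: always re-parse, never write to disk.
        return convert(pdf_path)
    cached = cache_dir / f"{cache_key(pdf_path)}.html"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    html = convert(pdf_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(html, encoding="utf-8")
    return html


def evict(pdf_path: Path, cache_dir: Path) -> None:
    """Delete the cached copy when the book is removed from the library."""
    (cache_dir / f"{cache_key(pdf_path)}.html").unlink(missing_ok=True)
```

Keying on a content hash costs one read of the file per lookup. A cheaper but less robust alternative would be to key on the path plus modification time.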

babluboy self-assigned this Apr 28, 2017

babluboy added the Enhancement label Apr 28, 2017

babluboy added this to the 0.6 milestone Apr 28, 2017

babluboy added the In Progress label Apr 28, 2017

babluboy added Completed Released Enhancement and removed In Progress Enhancement Completed labels May 12, 2017

babluboy closed this as completed May 12, 2017
