pdftohtml does not give complete control over the PDF and uses its own images for borders, etc. Decided to put in the extra effort of parsing the PDF document with the Poppler library, which will provide API-level control over the PDF.
Just pushed a version of Bookworm with PDF support based on Poppler and "pdftohtml". However, the following still needs to be completed:
Avoid re-parsing a book once it has been parsed and added to the library. This can be done by keeping the parsed contents in the ~/.config/bookworm folder until the book is removed from the library. This duplicates the contents of the book and hence uses more storage, so add a toggle button to choose between performance and storage
PDF parsing and conversion to HTML take considerable time, especially for PDFs with images. Need to see whether pdftohtml can be optimized
Poppler-generated HTML files are not optimized in terms of the number of words per line. Some work is needed to improve this
Add Contractor support for the PDF MIME type so that Bookworm is offered as an option in Files
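For the caching item in the list above, a rough sketch of the idea might look like the following. It is written in Python for illustration only and is not Bookworm's code. The names `cache_dir_for`, `load_or_parse`, `drop_cache`, `parse_fn` and `keep_cache` are all hypothetical, and keying the cache by a hash of the book's path is an assumption, not something decided in this issue.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def cache_dir_for(book_path: Path, cache_root: Path) -> Path:
    """Return a per-book cache folder keyed by a hash of the book's absolute path."""
    key = hashlib.sha256(str(book_path.resolve()).encode("utf-8")).hexdigest()[:16]
    return cache_root / key


def load_or_parse(
    book_path: Path,
    cache_root: Path,
    parse_fn: Callable[[Path, Path], None],
    keep_cache: bool = True,
) -> Path:
    """Return the folder holding the parsed HTML, parsing only when needed.

    keep_cache=True  favours performance: an existing complete cache is reused.
    keep_cache=False favours storage: the book is re-parsed on every call, and
    the caller is expected to call drop_cache() once the book is closed.
    """
    out_dir = cache_dir_for(book_path, cache_root)
    marker = out_dir / ".complete"  # written only after parsing finishes
    if keep_cache and marker.exists():
        return out_dir
    if out_dir.exists():
        shutil.rmtree(out_dir)  # discard stale or partial output
    out_dir.mkdir(parents=True)
    parse_fn(book_path, out_dir)  # hypothetical parser: writes HTML into out_dir
    marker.touch()
    return out_dir


def drop_cache(book_path: Path, cache_root: Path) -> None:
    """Delete the parsed contents, e.g. when the book is removed from the library."""
    shutil.rmtree(cache_dir_for(book_path, cache_root), ignore_errors=True)
```

In this sketch the toggle button would simply set `keep_cache`, and `cache_root` would point somewhere under ~/.config/bookworm. The `.complete` marker is there so that an interrupted parse is not mistaken for a usable cache.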
Initial digging suggests using the Poppler utility pdftohtml and then rendering the output HTML files in WebKit...
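As a minimal sketch of that approach, the conversion step could be driven along these lines. This is Python for illustration, not Bookworm's implementation, and the helper names `build_pdftohtml_cmd` and `convert` are hypothetical. It assumes `pdftohtml` from poppler-utils is on the PATH and uses only its `-s` (single HTML document for all pages), `-q` (quiet) and `-i` (ignore images) options. Whether skipping images helps with slow conversions is an untested assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pdftohtml_cmd(pdf_path: Path, out_prefix: Path, skip_images: bool = False) -> List[str]:
    """Build the pdftohtml argument list without running anything."""
    cmd = ["pdftohtml", "-s", "-q"]  # -s: one HTML document, -q: no messages
    if skip_images:
        cmd.append("-i")  # -i: ignore images in the PDF
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def convert(pdf_path: Path, out_dir: Path, skip_images: bool = False) -> Path:
    """Run pdftohtml and return the folder containing its output.

    The exact output file names depend on the options and the Poppler
    version, so the caller should look for *.html files in out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / pdf_path.stem
    # check=True raises CalledProcessError if pdftohtml exits with an error
    subprocess.run(build_pdftohtml_cmd(pdf_path, prefix, skip_images), check=True)
    return out_dir
```

The HTML files found in the returned folder would then be handed to the WebKit view for display.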