Description
Currently, the PDF generation seems to produce PDF files that contain an image/screenshot of the source web page rather than its text.
This behavior prevents:
file text indexing
running the import-document connector to automate the generation of entities/relationships linked to the pdf.
We need to modify the PDF generation so that the resulting file contains text rather than an image.
Proposed Solution
We need to modify the PDF generation so that the resulting file contains text rather than an image.
PDF generation happens on line 89 of the file "import-external-reference.py", using the "pdfkit" library. This library calls the "wkhtmltopdf" utility to render the PDF.
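For reference, a minimal sketch of how pdfkit hands a URL to wkhtmltopdf. This is not the connector's actual code: the helper name `render_url_to_pdf` and its `wkhtmltopdf_path` parameter are hypothetical, and the real call on line 89 may pass different arguments.

```python
from typing import Optional


def render_url_to_pdf(url: str, wkhtmltopdf_path: Optional[str] = None) -> bytes:
    """Render a web page to PDF bytes via pdfkit (hypothetical helper)."""
    # pdfkit is a third-party package; it is imported lazily here so that
    # this module can be loaded even where pdfkit is not installed.
    import pdfkit

    # Optionally point pdfkit at a specific wkhtmltopdf binary,
    # for example a 0.12.6 build installed outside the Debian repo.
    config = (
        pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
        if wkhtmltopdf_path
        else None
    )

    # Passing False as the output path makes pdfkit return the PDF as bytes
    # instead of writing it to disk.
    return pdfkit.from_url(url, False, configuration=config)
```

Since pdfkit only wraps the binary, whether the PDF contains text or an image depends on the wkhtmltopdf version that ends up being executed.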
Additional Information
Testing shows that the version of the "wkhtmltopdf" utility installed by the Dockerfile from the Debian repo is 0.12.5.
The newer version 0.12.6 solves the problem, so we need to force the installation of this version.
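To guard against a regression, the connector could check the installed version at startup. The sketch below is only an illustration: the function names and the `MIN_VERSION` constant are hypothetical, and it assumes that `wkhtmltopdf --version` prints a line such as `wkhtmltopdf 0.12.6 (with patched qt)`.

```python
import re
import subprocess
from typing import Optional, Tuple

# Version reported in this issue as producing text-based PDFs (assumption
# taken from the issue, not from wkhtmltopdf release notes).
MIN_VERSION = (0, 12, 6)


def parse_wkhtmltopdf_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the first x.y.z version number from `wkhtmltopdf --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def is_supported(output: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True when the reported version is at least `minimum`."""
    version = parse_wkhtmltopdf_version(output)
    return version is not None and version >= minimum


def installed_version_output(binary: str = "wkhtmltopdf") -> str:
    """Run the binary and return its raw version string."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A startup check could then call `is_supported(installed_version_output())` and log a warning when it returns False.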
Would you be willing to submit a PR?
Yes
SamuelHassine added the feature and solved labels and removed the needs triage label on Jan 16, 2024.