Problem: Images rendered from PDF pages are hard to read and sometimes have issues #1792

mcantelon · 2024-04-02T19:55:41Z

Current Behavior

Steps to reproduce the behavior

The following steps enable the setting, upload a PDF, and navigate to the image rendered for one of the PDF's pages:

Navigate to settings/uploads.
Change the "Upload multi-page files as multiple descriptions" setting to "Yes".
Click the "Save" button.
Navigate to informationobject/add and add an information object.
Once you've created the information object, select "Link Digital Object" from the "More" select menu.
Click "Choose file", select a multi-page PDF file, then click the "Create" button.
Click the preview image for one of the PDF's pages to navigate to that page's corresponding information object.
Click on the PDF page's representation image.
The full PDF page image will be displayed.

Example PDFs, provided by Dan, that don't render optimally:

https://api.printnode.com/static/test/pdf/multipage.pdf: pages aren't very legible due to lack of detail
https://www.delta-intkey.com/www/printtest.pdf: pages are illegible due to text being rendered as black squares

Expected Behavior

Rendered pages should be legible.

Possible Solution

Passing additional arguments to the invocation of the "convert" tool can fix this.

Context and Notes

AtoM has a setting that allows PDFs, uploaded as digital objects to information objects, to be "exploded" into child information objects, for each of the PDF's pages, with each information object having a digital object attached that's an image rendered from the PDF page.

Version used

2.8.2 - 193

Operating System and version

Ubuntu 20.04

Default installation culture

en

PHP version

PHP 7.4

Contact details

mike@artefactual.com


Added new CLI options to command used to extract images of PDF pages. Added "-density 300" to increase image detail and "-alpha remove" to fix issue where the alpha channel is rendered as black and causes images to be illegible.
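
As a rough illustration of the two options named in the commit message above, the following Python sketch builds an ImageMagick "convert" command line that uses them. It is not the AtoM code (AtoM is written in PHP): the function name build_convert_command, the output file naming, the JPEG output format, and the position of each option relative to the input file are all assumptions made for the example.

```python
import shlex
from pathlib import Path


def build_convert_command(pdf_path, output_dir, density=300, image_ext="jpg"):
    """Build an argv list that renders each page of a PDF with ImageMagick.

    Hypothetical helper for illustration only; not taken from AtoM.
    """
    pdf = Path(pdf_path)
    # A %d in the output name asks ImageMagick to write one numbered
    # file per page (assumed naming scheme).
    output_pattern = Path(output_dir) / f"{pdf.stem}_%d.{image_ext}"
    return [
        "convert",
        # Placed before the input so it applies when the PDF is rasterized:
        # a higher density yields more detail per page.
        "-density", str(density),
        str(pdf),
        # Placed after the input so it operates on the rendered pages:
        # flattens the alpha channel onto the background colour so
        # transparent areas are not rendered as black.
        "-alpha", "remove",
        str(output_pattern),
    ]


if __name__ == "__main__":
    command = build_convert_command("multipage.pdf", "pages")
    # Print the shell-quoted command; pass `command` to subprocess.run
    # to actually execute it (requires ImageMagick to be installed).
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.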

Fixed issue with the digital objects regeneration task (digitalobject:regen-derivatives) deleting, but not regenerating, digital objects representing PDF pages. Removed unneeded and unused digital object class method.

mcantelon · 2024-04-04T17:18:16Z

Merged PR to fix this.

mcantelon added the "Type: bug" label (a flaw in the code that causes the software to produce an incorrect or unexpected result) Apr 2, 2024

mcantelon added a commit that referenced this issue Apr 2, 2024

Fix PDF page image rendering issues (#1792)

4e659b6

anvit linked a pull request May 16, 2024 that will close this issue

Fix PDF page image rendering issues (#1792) #1793

Merged

anvit added this to the 2.8.2 milestone May 16, 2024

anvit closed this as completed May 16, 2024
