Skip to content

Minor improvements to ImageMagick thumbnail filters #8850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 31, 2023

Conversation

alanorth
Copy link
Contributor

@alanorth alanorth commented May 18, 2023

References

Description

Minor documentation and logging fixes to the ImageMagick thumbnail filters—these are not documented very well and have some logical errors. Also includes a minor optimization for PDF thumbnail generation, which changes the intermediate temp file format to MIFF instead of JPEG so we don't lose quality.

In the long term we will have to re-factor these filters, but for now I want to address some minor issues.

Instructions for Reviewers

Test running the filter-media script on an item with PDF bitstreams. Use verbose and force to make sure the process works as expected:

$ dspace filter-media -i 10568/130406 -v -f

List of changes in this PR:

  • Add comments to ImageMagick Thumbnail filters because they are a bit confusing
  • Improve verbose logging by fixing typos, style, and using the actual thumbnail name instead of the parent bitstream's name where appropriate
  • Simplify logic in getImageFile() by removing the page parameter (we always use page 0 and this is not configurable so there is no need to use this parameter)

See the commit messages for more information.

Checklist

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR includes new libraries/dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR modifies REST API endpoints, I've opened a separate REST Contract PR related to this change.
  • If my PR includes new configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@alanorth alanorth added the tools: media-filters Related to filter-media, full text extraction or thumbnail creation label May 18, 2023
@alanorth alanorth requested review from tdonohue and terrywbrady May 18, 2023 07:37
@alanorth alanorth force-pushed the rework-im-thumbnail-filter branch from b07fdd7 to cf0a3f2 Compare May 18, 2023 08:09
@tdonohue tdonohue added this to the 7.6 milestone May 18, 2023
@tdonohue tdonohue added the 1 APPROVAL pull request only requires a single approval to merge. label May 18, 2023
@tdonohue tdonohue requested review from hardyoyo and removed request for terrywbrady May 18, 2023 14:10
alanorth added a commit to alanorth/DSpace that referenced this pull request May 19, 2023
Minor improvements to logging and generation of PDF thumbnails by
using a MIFF intermediate instead of a JPEG.

Backported from DSpace 7.6.

See: DSpace#8850
@alanorth alanorth force-pushed the rework-im-thumbnail-filter branch from 5cb1f85 to 5c63296 Compare May 19, 2023 07:27
alanorth added 6 commits May 22, 2023 18:36
Instead of logging the name of the source bitstream, we should be
logging the name of the actual thumbnail bitstream that is being
considered for replacement. For example, instead of this:

  IM Thumbnail manual.pdf matches pattern and is replaceable.

... the message should read:

  IM Thumbnail manual.pdf.jpg matches pattern and is replaceable.

This message is already confusing enough, but this will help.
Minor standardization to logging (unneccessary capitalization and
excessive spaces).
There is no point passing a page parameter here, with a default of
0 no less, because we will *always* use the first page of the PDF
to generate the thumbnail. No other filters use this function and
the page parameter is not configurable so we should just hard code
it.
Add some comments to document the functionality of the ImageMagick
thumbnail filters. This will help others understand it later when
we need to re-factor them.
When filtering PDF bitstreams, the ImageMagickThumbnailFilter first
creates an intermediate JPEG and then a "thumbnail" JPEG. These two
operations are both lossy. The ImageMagick usage guide warns against
doing that:

> JPEG losses (sic) information, degrading images when saved.
> Use some other format for intermediate images during processing.
> Only use JPEG format, for the final image, not for further processing.

As our current filter architecture requires writing a temporary file
we must choose one of the following lossless formats to use for the
intermediate: PNG, TIFF, or MIFF. MIFF is ImageMagick's own internal
format and is much faster to write than PNG.

By eliminating the first lossy conversion we gain 1.1% points on the
ssimulacra2 (v2.1) scoring scale of visual quality.

See: https://imagemagick.org/Usage/formats/#jpg
@alanorth alanorth force-pushed the rework-im-thumbnail-filter branch from 5c63296 to 5357923 Compare May 22, 2023 15:36
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @alanorth ! Tested these minor changes today with ImageMagick for image and pdf thumbnails. No issues found... thumbnails still are created successfully.

@tdonohue tdonohue merged commit cac0859 into DSpace:main May 31, 2023
@alanorth alanorth deleted the rework-im-thumbnail-filter branch May 31, 2023 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 APPROVAL pull request only requires a single approval to merge. tools: media-filters Related to filter-media, full text extraction or thumbnail creation
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants