Re-work the ImageMagick PDF thumbnail filter #8849
Comments
The various ImageMagick thumbnail filter classes are tightly intertwined, which makes it hard to change the image format for PDFs alone. I suppose we could adopt the WebP format for image thumbnails too; after all, in the case that an It could maybe even be configurable. Perhaps @terrywbrady or @tdonohue have some comments or suggestions. Note: I might even suggest starting by commenting the existing code, because it's not well documented.
I'm not against switching to WebP for all thumbnails. It sounds like others (YouTube and Facebook) have made the switch and found better performance: https://web.dev/serve-images-webp/ That said, I think this sort of change would obviously need to be made in a major release (8.0 at the earliest), as it'd require sites to recreate all their thumbnails (luckily we already have a script for that).
Thanks @tdonohue. There is no need for sites to regenerate their thumbnails. Old JPEG thumbnails generated by ImageMagick will still work. Also, I appreciate the thinking behind the ImageMagick filter setting the bitstream description to I don't want to bite off more than I can chew, but it might be good to re-think this so that the format is configurable. Maybe some site wants to stick with JPEG. Maybe AVIF or JPEG-XL become viable. Etc...
Ah, I just tested and it seems I misunderstood. The If we change the default to WebP then sites will automatically be missing the appropriate Even if we add So yes, this needs more thought. You're right! I will experiment on our own site for implementation ideas and more corner cases.
Rather than depending on magical description strings, shouldn't we key off of the MIME type? If there's a Bitstream in a THUMBNAIL bundle where the ORIGINAL bundle's file name is a prefix of its name, and its Or, if we want to support "manually" deposited thumbnails, then perhaps DSpace-derived Bitstreams need a "derived from" metadata field. There's probably a standardized name for this relationship in some well-known namespace. Then use this field as the "replace it" criterion. Use the above scheme in the case of missing metadata. Perhaps there should be an option to bypass the check, obliterate all matching Bitstreams, and derive a new one? (Or do we already have that?)
Ugh, this is complex. For full generality, we need the "derived" relationship to be "derived(from, by)" so that "manual" deposits can be marked "derived(from, 'depositor')" and DSpace-derived bitstreams "derived(from, 'repository')".
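To make the "derived(from, by)" idea a bit more concrete, here is a rough sketch in Python rather than actual DSpace code. The names `Derivation`, `Bitstream`, and `may_replace` are hypothetical, and the fallback only covers the name-prefix part of the heuristic described above; treat it as one possible reading of the proposal, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    """Hypothetical 'derived(from, by)' relationship stored as metadata."""
    source_name: str  # name of the ORIGINAL bitstream this was derived from
    by: str           # "repository" (generated by a filter) or "depositor" (manual upload)


@dataclass(frozen=True)
class Bitstream:
    """Minimal stand-in for a bitstream: just a name and optional derivation metadata."""
    name: str
    derivation: Optional[Derivation] = None


def may_replace(thumbnail: Bitstream, original: Bitstream) -> bool:
    """Decide whether a regenerating filter may overwrite `thumbnail`."""
    d = thumbnail.derivation
    if d is not None:
        # Explicit metadata wins: only repository-derived thumbnails
        # of this particular original are candidates for replacement.
        return d.source_name == original.name and d.by == "repository"
    # Missing metadata: fall back to the name-prefix scheme
    # (the original's file name is a prefix of the thumbnail's name).
    return thumbnail.name.startswith(original.name)


original = Bitstream("report.pdf")
generated = Bitstream("report.pdf.jpg", Derivation("report.pdf", "repository"))
manual = Bitstream("report.pdf.jpg", Derivation("report.pdf", "depositor"))
print(may_replace(generated, original), may_replace(manual, original))
```

With this shape, a manually deposited thumbnail is protected even when its file name happens to match the generated naming pattern, which is the case the description-string check is trying to handle today.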
Yeah it's complex. I think the vast majority of sites don't customize their thumbnail setup at all, so we need to try to do the most sane thing by default and leave crazy stuff to over-involved admins like us 😝. The current system might be the sanest in that regard, but the comments in I think it is reasonable that a bitstream There's an "obliterate all matching Bitstreams" option (aka force), but the keyword there is "matching". In the case of the ImageMagick Thumbnail Filter, a thumbnail will only be replaced in "force" mode if:
Otherwise, the filter-media script assumes this is a manually uploaded thumbnail. And this does nothing for the case where we want to switch the default thumbnail format to WebP, because we blindly create
Is your feature request related to a problem? Please describe.
The ImageMagick PDF thumbnail filter in DSpace versions up to 7.6 has several problems:
ImageMagickPdfThumbnailFilter.java
Describe the solution you'd like
First, doing a double lossy conversion is a waste of resources and obviously bad practice when working with lossy codecs. I recently estimated an average drop of 1.2 points in the ssimulacra2 score due to generation loss. This is like making a photocopy of a photocopy.
Second, we should be using a more modern image format that allows similar visual quality with drastically reduced file sizes. I propose WebP, which requires an average of 33% fewer bits than JPEG to achieve the same visual quality and has broad support in web browsers and beyond. (Yes, I know that WebP is not perfect and is already over ten years old, but the results speak for themselves.)
On this second point, I have done an extensive evaluation of JPEG versus WebP and AVIF using a large sample of PDFs. The results can be summarized in this one plot of perceptual quality versus bits per pixel (BPP):
AVIF and WebP need fewer bits than JPEG to achieve the same visual quality
My full comparison, with methodology and source code, is here: Evaluating JPEG, WebP, and AVIF
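For anyone unfamiliar with the BPP axis: bits per pixel is simply the encoded file size in bits divided by the number of pixels. A minimal Python sketch follows; the helper names and the example thumbnail size are made up for illustration and are not measurements from the evaluation. The 33% figure is the average saving quoted above.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Encoded size in bits divided by the pixel count."""
    return file_size_bytes * 8 / (width * height)


def size_after_saving(file_size_bytes: int, saving: float = 0.33) -> int:
    """Estimated size in bytes if the same quality needs `saving` fewer bits."""
    return round(file_size_bytes * (1 - saving))


# Illustrative numbers only: a 600x800 JPEG thumbnail of 60,000 bytes.
jpeg_bytes = 60_000
webp_bytes = size_after_saving(jpeg_bytes)

print(bits_per_pixel(jpeg_bytes, 600, 800))  # BPP of the hypothetical JPEG
print(bits_per_pixel(webp_bytes, 600, 800))  # BPP of the estimated WebP
```

Because BPP normalizes by pixel count, thumbnails of different dimensions can be compared on the same axis.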
Additional context
There has been some past discussion and work on the ImageMagick PDF thumbnail filter: