Make image compression idempotent #6506

Open
Bossett opened this issue Nov 18, 2024 · 1 comment
Labels
feature-request A request for a new feature

Comments

@Bossett
Contributor

Bossett commented Nov 18, 2024

Describe the Feature

Images that already meet Bluesky's specifications are always recompressed, so an image degrades a little more every time it is re-uploaded. It also means that each upload of the same image is stored under a different content hash and served by the CDN as a different file. To match duplicate images, a consumer has to download them and compare perceptual hashes.
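For context, a perceptual hash is computed from decoded pixels rather than from file bytes. That lets it survive recompression, but it costs a full download and decode per image. Below is a minimal sketch of one well-known variant, the average hash (aHash). It assumes the image has already been decoded and downscaled to an 8x8 grayscale grid. It is only an illustration, not what any particular consumer actually uses.

```typescript
// Average hash (aHash) over an 8x8 grayscale thumbnail: one bit per pixel,
// set when the pixel is brighter than the mean of all 64 pixels.
// Assumption: decoding and downscaling to 8x8 happen elsewhere, and
// `gray` holds 64 values in the range 0..255 in row-major order.
function averageHash(gray: number[]): bigint {
  if (gray.length !== 64) throw new Error("expected 64 grayscale values (8x8)");
  const mean = gray.reduce((sum, v) => sum + v, 0) / 64;
  let hash = 0n;
  for (const v of gray) {
    // Shift in one bit per pixel; the first pixel ends up as the most significant bit.
    hash = (hash << 1n) | (v > mean ? 1n : 0n);
  }
  return hash;
}

// Number of differing bits; a small distance suggests visually similar images.
function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

const gradient = Array.from({ length: 64 }, (_, i) => i * 4);
const brighter = gradient.map((v) => v + 3);
console.log(hammingDistance(averageHash(gradient), averageHash(brighter))); // 0
```

A uniform brightness shift moves the mean along with every pixel, so the hash is unchanged. A byte-level hash of the two re-encoded files would almost certainly differ.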

It would be useful to switch to an idempotent alternative: processing an image that has already been processed should produce the same output, and the original should be preserved as much as possible.
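To make "idempotent" concrete: running the processor on its own output should give byte-identical results, i.e. process(process(x)) equals process(x). The following minimal TypeScript sketch assumes a synchronous processor function. The `Processor` type and both toy processors are hypothetical stand-ins for a real encoder such as sharp, whose API is asynchronous.

```typescript
// A processor takes encoded image bytes and returns new encoded bytes.
// Hypothetical type; a real sharp-based processor would return a Promise.
type Processor = (input: Uint8Array) => Uint8Array;

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// True if a second pass over the processed output changes nothing.
function isIdempotentOn(run: Processor, input: Uint8Array): boolean {
  const once = run(input);
  const twice = run(once);
  return bytesEqual(once, twice);
}

// Toy processors (hypothetical, for illustration only).
// Truncating to a size cap is idempotent: a second pass has nothing left to cut.
const capAt4: Processor = (b) => b.slice(0, 4);
// Appending a marker byte is not: every pass changes the output again,
// much like a lossy re-encode that drifts on each re-upload.
const appendMarker: Processor = (b) => {
  const out = new Uint8Array(b.length + 1);
  out.set(b);
  out[b.length] = 0xff;
  return out;
};

const sample = new Uint8Array([1, 2, 3, 4, 5, 6]);
console.log(isIdempotentOn(capAt4, sample)); // true
console.log(isIdempotentOn(appendMarker, sample)); // false
```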

This would enable:

  • Better caching
  • File hashes and CIDs could then be used to identify re-uploads, which is useful for labelling and for identifying scraped content
  • Preservation of the highest-quality images, including compression profiles in tools like Photoshop that 'play nice' with the predictable settings to maximise perceived quality
  • Other cool things - search by image to identify a source, automatically proposed alt-text, etc.
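With deterministic output, identical uploads produce identical stored bytes, so an exact hash lookup is enough to spot a re-upload. Here is a minimal sketch using Node's `crypto` module. A plain SHA-256 hex digest stands in for a CID, because a real CID also wraps the hash with version, codec and hash-function identifiers. `ReuploadIndex` is a hypothetical in-memory helper, not part of the atproto codebase.

```typescript
import { createHash } from "node:crypto";

// Hex SHA-256 digest of the stored (already processed) bytes.
// Simplified stand-in for a CID.
function contentDigest(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory registry: digest -> id of the first upload seen.
class ReuploadIndex {
  private firstSeen = new Map<string, string>();

  // Returns the id of an earlier upload with identical bytes, or undefined if new.
  register(uploadId: string, storedBytes: Uint8Array): string | undefined {
    const digest = contentDigest(storedBytes);
    const earlier = this.firstSeen.get(digest);
    if (earlier === undefined) {
      this.firstSeen.set(digest, uploadId);
    }
    return earlier;
  }
}

const index = new ReuploadIndex();
const stored = new TextEncoder().encode("same processed image bytes");
console.log(index.register("post-1", stored)); // undefined (first upload)
console.log(index.register("post-2", stored)); // post-1
```

This only works if re-processing an already-processed image yields the same bytes. With the current always-recompress behaviour, each upload gets a new digest and the lookup never matches.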

Attachments

No response

Describe Alternatives

No response

Additional Context

I have used this in bash before with ImageMagick's convert, and it does the job. I imagine similar settings can be found in whatever conversion library is used here?

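reencode_jpeg() {
  local input_file="$1"
  local output_file="$2"

  if [ -z "$input_file" ] || [ -z "$output_file" ]; then
    echo "Usage: reencode_jpeg input.jpg output.jpg" >&2
    return 1
  fi

  # ImageMagick 6 syntax; with ImageMagick 7 use `magick` instead of `convert`.
  #   -resize '2000x2000>'        shrink only if larger than 2000x2000, keeping aspect ratio
  #   -strip                      remove profiles and comments (EXIF, ICC, etc.)
  #   -sampling-factor 4:2:0      fixed chroma subsampling
  #   -interlace none             baseline (non-progressive) JPEG
  #   -colorspace sRGB            normalise to sRGB
  #   jpeg:optimize-coding=true   optimised Huffman tables
  #   jpeg:dct-method=integer     fixed integer DCT method
  #   jpeg:extent=2MB             cap file size; ImageMagick picks the highest quality that fits
  convert "$input_file" \
    -resize '2000x2000>' \
    -strip \
    -sampling-factor 4:2:0 \
    -interlace none \
    -colorspace sRGB \
    -define jpeg:optimize-coding=true \
    -define jpeg:dct-method=integer \
    -define jpeg:extent=2MB \
    "$output_file"
}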
@Bossett Bossett added the feature-request A request for a new feature label Nov 18, 2024
@Bossett
Contributor Author

Bossett commented Nov 18, 2024

OK, it looks like this is done with sharp here: https://github.com/bluesky-social/atproto/blob/main/packages/bsky/src/image/sharp.ts#L40, and a quick look suggests it could be fixed by:

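processor = processor.jpeg({
  quality: quality ?? 100, // fall back to maximum quality
  progressive: false, // baseline rather than progressive JPEG
  chromaSubsampling: '4:2:0', // fixed chroma subsampling
  optimizeCoding: true, // optimised Huffman tables
  trellisQuantisation: false, // mozjpeg-related options pinned explicitly
  overshootDeringing: false,
  optimizeScans: false,
  quantisationTable: 2, // pinned quantisation table
})
// sharp strips metadata by default and withMetadata() is the opt-in to keep it,
// so check what withMetadata(false) actually does in the installed sharp version.
.withMetadata(false)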

To do this myself with a PR, I'd need to set up a test harness (so pretty please, someone else, etc. etc.). I'd also have to work out the PNG handling and make sure the min/max resizing further up is 'non-destructive' for images that have already been processed.
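One way a resize step can be made non-destructive for repeat images is to leave in-bounds images untouched. The sketch below shows that with a hypothetical "shrink to fit, never enlarge" rule, similar in spirit to ImageMagick's '2000x2000>' geometry. The `fitInside` name and the 2000px limit are assumptions, and the real pipeline's min/max logic may differ.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical shrink-only resize rule (not the actual atproto code).
function fitInside(size: Size, maxSide: number): Size {
  const { width, height } = size;
  // Already within bounds: return the size unchanged, so a second pass is a no-op.
  if (width <= maxSide && height <= maxSide) return { width, height };
  // Scale so the longer side lands on maxSide; the shorter side scales proportionally.
  const scale = maxSide / Math.max(width, height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

const first = fitInside({ width: 4000, height: 3001 }, 2000);
const second = fitInside(first, 2000);
console.log(first, second); // both { width: 2000, height: 1501 }
```

Because the first pass always produces a size within `maxSide`, the second pass takes the early-return branch. A rule that rescales every image, including ones already in bounds, would not have this property.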
