feat(image): provide fast resize method #34

Merged
merged 6 commits into from
Jan 14, 2023

Conversation

Brooooooklyn
Owner

@Brooooooklyn Brooooooklyn commented Jan 13, 2023

x86_64 (AVX2)

OS:  Windows 11 x86_64
Kernel: 10.0.22621
CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz
Memory: 2535MiB / 32055MiB
sharp resize: 415.966ms
@napi-rs/image resize: 529.884ms
fast resize: 316.731ms

ARM64 (NEON)

OS: macOS 13.1 22C65 arm64
Host: MacBookPro18,2
Kernel: 22.2.0
CPU: Apple M1 Max
Memory: 8915MiB / 65536MiB
sharp resize: 616.549ms
@napi-rs/image resize: 525.776ms
fast resize: 431.185ms
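
To make the relative gains easier to read, the timings above can be turned into speedup ratios. This is a small arithmetic sketch: the numbers are copied from the results above, while the `timings` object, its key names, and the `speedup` helper are illustrative names that are not part of the PR.

```javascript
// Timings in milliseconds, copied from the benchmark results above.
// The object shape and key names are illustrative only.
const timings = {
  'x86_64 (AVX2)': { sharp: 415.966, napiImage: 529.884, fastResize: 316.731 },
  'ARM64 (NEON)': { sharp: 616.549, napiImage: 525.776, fastResize: 431.185 },
};

// Ratio of baseline time to candidate time; values above 1 mean the candidate is faster.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs;
}

for (const [platform, t] of Object.entries(timings)) {
  const vsSharp = speedup(t.sharp, t.fastResize).toFixed(2);
  const vsNapi = speedup(t.napiImage, t.fastResize).toFixed(2);
  console.log(`${platform}: fast resize is ${vsSharp}x vs sharp, ${vsNapi}x vs @napi-rs/image resize`);
}
```

These are single-run figures from two machines, so the ratios are indicative rather than precise.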

@Brooooooklyn
Owner Author

@lovell Resizing this image with sharp (0.31.3) is extremely slow in my Linux environments, both WSL2 and Docker. Is this expected?

await sharp(NASA)
  .resize(1024, 768, {
    kernel: sharp.kernel.lanczos3,
  })
  .png()
  .toBuffer()

sharp resize: 3.718s
@napi-rs/image resize: 525.389ms
fast resize: 345.407ms

@lovell

lovell commented Jan 13, 2023

Hi, that doesn't look right, thanks for alerting me to this.

The use of sequential read (rather than random access read) might help with such a large image, which you can try via:

- await sharp(NASA)
+ await sharp(NASA, { sequentialRead: true })

(I've been considering making this the default behaviour.)
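
Put together with the earlier snippet, the suggested change might look like the sketch below. The `resizeSequential` name and the choice to pass the `sharp` module in as a parameter are assumptions made for illustration (so the function can be exercised without installing sharp); the `sequentialRead` option and the `resize`/`png`/`toBuffer` calls are the ones quoted in this thread. Per the later comments here, recent sharp releases manage sequential reading on their own, so this applies to the 0.31.x behaviour being discussed.

```javascript
// Sketch only: `sharpModule` is expected to be the result of require('sharp').
// It is injected as a parameter so this file has no hard dependency on sharp.
async function resizeSequential(sharpModule, input, width, height) {
  return sharpModule(input, { sequentialRead: true }) // read the input top to bottom
    .resize(width, height, { kernel: sharpModule.kernel.lanczos3 })
    .png()
    .toBuffer();
}

// Usage, assuming sharp is installed and NASA is the benchmark image path or buffer:
//   const sharp = require('sharp');
//   const output = await resizeSequential(sharp, NASA, 1024, 768);
```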

If this doesn't help, I should have some time tomorrow to be able to check/profile this image and see what's consuming all the CPU time. PNG encoding (rather than anything to do with resizing) would be my best guess right now.

As an aside, we're considering the use of the Highway library for SIMD in libvips (and therefore sharp), which should improve resize speed on e.g. AVX2 CPUs.

@Brooooooklyn
Owner Author

With { sequentialRead: true }, the results look normal again:

sharp resize: 584.666ms
@napi-rs/image resize: 493.363ms
fast resize: 311.735ms
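
For readers who want to reproduce numbers in this shape, a minimal timing helper could look like the following. It is a sketch under assumptions: `timeAsync` is a hypothetical helper, and the benchmark script actually used in this PR is not shown in the thread. It relies only on the global `performance` object available in current Node.js versions.

```javascript
// Hypothetical helper: runs an async task once and prints "label: 123.456ms".
async function timeAsync(label, task) {
  const start = performance.now();
  const result = await task();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${elapsedMs.toFixed(3)}ms`);
  return { result, elapsedMs };
}

// Usage, assuming sharp is installed and NASA is the input image:
//   await timeAsync('sharp resize', () =>
//     sharp(NASA, { sequentialRead: true }).resize(1024, 768).png().toBuffer());
```

A single run includes warm-up costs such as module loading, so repeating the task a few times and comparing medians gives steadier figures.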

kodiakhq bot pushed a commit to vercel/next.js that referenced this pull request Jan 14, 2023
This can reduce memory usage and might improve performance on some systems.

Related: Brooooooklyn/Image#34 (comment)
@Brooooooklyn Brooooooklyn changed the title from "Fast resize" to "feat(image): provide fast resize method" Jan 14, 2023
@Brooooooklyn Brooooooklyn merged commit f52fd45 into main Jan 14, 2023
@Brooooooklyn Brooooooklyn deleted the fast-resize branch January 14, 2023 10:39
@lovell

lovell commented Feb 6, 2023

I took a look at the "avif" and "webp" benchmarks in this repo that take a JPEG input, auto-orient, resize, then encode as either WebP or AVIF. Both this library and sharp depend on libwebp and libaom for encoding so I'd expect the results to be closer than those published in the readme.

For the JPEG-to-WebP test, which uses with-exif.jpg as the input, calling @napi-rs/image via callgrind I see:

62,330,051 (12.95%)  ???:0x00000000001ca150 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]
22,265,184 ( 4.63%)  ???:0x00000000001ca860 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]
11,495,533 ( 2.39%)  ???:0x00000000002b1e50 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]

and with sharp I see:

55,435,179 ( 8.15%)  ???:cmsReverseToneCurveEx [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
19,912,548 ( 2.93%)  libvips/resample/templates.h:vips_reduceh_gen(_VipsRegion*, void*, void*, void*, int*)
18,259,794 ( 2.68%)  ???:0x000000000003e650 [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
12,697,334 ( 1.87%)  ???:0x000000000001fba0 [/usr/lib/x86_64-linux-gnu/libwebp.so.7.1.3]
 9,651,522 ( 1.42%)  libvips/conversion/rot.c:vips_rot270_gen [/usr/local/lib/x86_64-linux-gnu/libvips.so.42.17.0]
 9,261,053 ( 1.36%)  ???:0x0000000000026f30 [/usr/lib/x86_64-linux-gnu/libjpeg.so.8.2.2]

So most of the CPU time is spent transforming linear RGB input pixel values to non-linear sRGB using the embedded ICC colour profile.

ColorSync color profile 2.2, type appl, RGB/XYZ-mntr device by appl, 1960 bytes, 25-2-2009 11:26:11 "Generic RGB Profile"

Manually removing the colour profile shows the calls to lcms2 are gone:

19,912,548 ( 3.44%)  libvips/resample/templates.h:vips_reduceh_gen(_VipsRegion*, void*, void*, void*, int*)
12,401,734 ( 2.14%)  ???:0x000000000001fba0 [/usr/lib/x86_64-linux-gnu/libwebp.so.7.1.3]
 9,651,522 ( 1.67%)  libvips/conversion/rot.c:vips_rot270_gen [/usr/local/lib/x86_64-linux-gnu/libvips.so.42.17.0]
 9,261,053 ( 1.60%)  ???:0x0000000000026f30 [/usr/lib/x86_64-linux-gnu/libjpeg.so.8.2.2]

Does @napi-rs/image support colour profiles? If not, it might be better to remove the ICC profile from with-exif.jpg and regenerate the results.
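
One way to check whether a benchmark input still carries an embedded ICC profile, without relying on either library, is to walk the JPEG header segments and look for an APP2 segment whose payload starts with the `ICC_PROFILE` identifier. The sketch below assumes a well-formed JPEG passed as a Node.js `Buffer`; `hasIccProfile` is a hypothetical helper, not part of sharp or @napi-rs/image.

```javascript
// Hypothetical helper: returns true if a JPEG buffer contains an APP2 ICC profile segment.
function hasIccProfile(jpeg) {
  // A JPEG stream must start with the SOI marker (0xFFD8).
  if (jpeg.length < 4 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) return false;
  const iccId = Buffer.from('ICC_PROFILE\0', 'latin1');
  let offset = 2;
  while (offset + 4 <= jpeg.length) {
    if (jpeg[offset] !== 0xff) return false; // not at a marker: treat as malformed
    const marker = jpeg[offset + 1];
    if (marker === 0xff) { offset += 1; continue; } // skip fill byte before a marker
    // Start of scan or end of image: no further metadata segments follow.
    if (marker === 0xda || marker === 0xd9) return false;
    const length = jpeg.readUInt16BE(offset + 2); // includes the two length bytes
    if (length < 2) return false; // malformed segment length
    const payloadStart = offset + 4;
    const id = jpeg.subarray(payloadStart, payloadStart + iccId.length);
    if (marker === 0xe2 && id.equals(iccId)) return true;
    offset += 2 + length; // marker (2 bytes) plus segment length
  }
  return false;
}

// Usage, assuming the benchmark input is on disk:
//   const fs = require('node:fs');
//   console.log(hasIccProfile(fs.readFileSync('with-exif.jpg')));
```

This only detects the profile; stripping it would still require re-saving the image with a tool that drops the APP2 segment.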

@lovell

lovell commented Feb 6, 2023

I also meant to add, as a result of these tests, I found a possible performance regression in sharp when resizing RGBA images, where pixel values were being cast from integer to float, which then meant a slower float-based shrink path was taken.

This is fixed via lovell/sharp@9e2207f and will be in sharp v0.32.0 so thank you for bringing it to my attention! It's great to see what you're doing here bringing the world of Rust to Node.js 👍

@styfle

styfle commented May 15, 2024

@lovell I found this discussion because I was looking through the history of Next.js and found this related PR:

In the latest version of sharp (0.33.3), is it still recommended to set { sequentialRead: true } unconditionally for all image inputs/outputs like this?

@lovell

lovell commented May 16, 2024

@styfle There's no need to set sequentialRead any more, the latest sharp now manages all that for you, although the option is left there for backwards compatibility for those who might want the non-sequential ("random access") behaviour. For Next.js I'd recommend removing it.

If sharp discovers use of an operation that does not support sequential read, it will insert a cache into the pipeline to store decoded pixel values and allow sequential read of the input to continue. Decoding the input once and once only is generally faster, but at the potential cost of slightly increased memory usage for some operations (e.g. rotation, horizontal flip, Gaussian blur).

https://github.com/search?q=repo%3Alovell%2Fsharp%20StaySequential&type=code

@styfle
Copy link

styfle commented May 16, 2024

Thanks!

> Decoding the input once and once only is generally faster

This is the typical workload for Next.js because it has a filesystem cache for optimized images, thus an image is only ever optimized once.

However, that same input image could be resized to different widths, but those would be separate requests, so I'm not sure whether the sharp cache would be relevant here.
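
The "optimized once per variant" behaviour described above can be pictured with a small cache keyed by source and width. This is a purely illustrative in-memory sketch, not Next.js code: `createVariantCache`, the key format, and the `optimize` callback are all hypothetical, and a real implementation would persist results to disk.

```javascript
// Hypothetical stand-in for a cache of optimized images, keyed by source and width.
function createVariantCache(optimize) {
  const cache = new Map();
  return async function getVariant(src, width) {
    const key = `${src}|w=${width}`;
    if (!cache.has(key)) {
      // Store the promise so concurrent requests for the same variant share one run.
      cache.set(key, optimize(src, width));
    }
    return cache.get(key);
  };
}

// Usage with a placeholder optimizer that only records what it was asked to do:
//   const getVariant = createVariantCache(async (src, width) => `${src}@${width}`);
//   await getVariant('photo.jpg', 640);  // runs the optimizer
//   await getVariant('photo.jpg', 640);  // served from the cache
//   await getVariant('photo.jpg', 1080); // different width, runs the optimizer again
```

Each width is a separate cache entry, so the same input is decoded once per requested width rather than once overall, which is the situation the question above is about.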
