feat(image): provide fast resize method #34

Brooooooklyn · 2023-01-13T16:10:52Z

x86_64 (AVX2)

OS:  Windows 11 x86_64
Kernel: 10.0.22621
CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz
Memory: 2535MiB / 32055MiB

sharp resize: 415.966ms
@napi-rs/image resize: 529.884ms
fast resize: 316.731ms

ARM64 (NEON)

OS: macOS 13.1 22C65 arm64
Host: MacBookPro18,2
Kernel: 22.2.0
CPU: Apple M1 Max
Memory: 8915MiB / 65536MiB

sharp resize: 616.549ms
@napi-rs/image resize: 525.776ms
fast resize: 431.185ms

Brooooooklyn · 2023-01-13T16:34:21Z

@lovell Resizing this image with sharp (0.31.3) is extremely slow in my Linux environments, both WSL2 and Docker. Is this expected?

await sharp(NASA)
  .resize(1024, 768, {
    kernel: sharp.kernel.lanczos3,
  })
  .png()
  .toBuffer()

sharp resize: 3.718s
@napi-rs/image resize: 525.389ms
fast resize: 345.407ms
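
The timings in this thread follow a `label: 123.456ms` shape. The benchmark script itself is not shown here, so the following is only a minimal sketch of how such a measurement could be taken. `timeIt` and `fakeResize` are hypothetical names; in the real comparison the task would be the sharp pipeline quoted above rather than a timer.

```javascript
// Hypothetical helper: times one async task and logs "label: N.NNNms".
// `performance` is a global in modern Node.js (16+) and in browsers.
async function timeIt(label, task) {
  const start = performance.now();
  const result = await task();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(3)}ms`);
  return { label, ms, result };
}

// Stand-in workload (an assumption for illustration only).
// It resolves after roughly 20 ms instead of resizing an image.
function fakeResize() {
  return new Promise((resolve) => setTimeout(() => resolve("done"), 20));
}

// Kick off one measurement; the promise resolves to { label, ms, result }.
const measurement = timeIt("fake resize", fakeResize);
```

A single run like this is noisy. Repeating the task several times and comparing medians would give a steadier picture than one sample.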

lovell · 2023-01-13T17:41:58Z

Hi, that doesn't look right, thanks for alerting me to this.

The use of sequential read (rather than random access read) might help with such a large image, which you can try via:

- await sharp(NASA)
+ await sharp(NASA, { sequentialRead: true })

(I've been considering making this the default behaviour.)

If this doesn't help, I should have some time tomorrow to check/profile this image and see what's consuming all the CPU time. PNG encoding (rather than anything to do with resizing) would be my best guess right now.

As an aside, we're considering the use of the Highway library for SIMD in libvips (and therefore sharp), which should improve resize speed on e.g. AVX2 CPUs.

Brooooooklyn · 2023-01-14T03:19:25Z

With { sequentialRead: true }, the result looks normal:

sharp resize: 584.666ms
@napi-rs/image resize: 493.363ms
fast resize: 311.735ms

This can reduce memory usage and might improve performance on some systems. Related: Brooooooklyn/Image#34 (comment)

lovell · 2023-02-06T15:27:34Z

I took a look at the "avif" and "webp" benchmarks in this repo that take a JPEG input, auto-orient, resize, then encode as either WebP or AVIF. Both this library and sharp depend on libwebp and libaom for encoding so I'd expect the results to be closer than those published in the readme.

For the JPEG-to-WebP test, which uses with-exif.jpg as the input, running @napi-rs/image under callgrind, I see:

62,330,051 (12.95%)  ???:0x00000000001ca150 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]
22,265,184 ( 4.63%)  ???:0x00000000001ca860 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]
11,495,533 ( 2.39%)  ???:0x00000000002b1e50 [node_modules/@napi-rs/image-linux-x64-gnu/image.linux-x64-gnu.node]

and with sharp I see:

55,435,179 ( 8.15%)  ???:cmsReverseToneCurveEx [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
19,912,548 ( 2.93%)  libvips/resample/templates.h:vips_reduceh_gen(_VipsRegion*, void*, void*, void*, int*)
18,259,794 ( 2.68%)  ???:0x000000000003e650 [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
12,697,334 ( 1.87%)  ???:0x000000000001fba0 [/usr/lib/x86_64-linux-gnu/libwebp.so.7.1.3]
 9,651,522 ( 1.42%)  libvips/conversion/rot.c:vips_rot270_gen [/usr/local/lib/x86_64-linux-gnu/libvips.so.42.17.0]
 9,261,053 ( 1.36%)  ???:0x0000000000026f30 [/usr/lib/x86_64-linux-gnu/libjpeg.so.8.2.2]

So most of the CPU time is spent transforming linear RGB input pixel values to non-linear sRGB using the embedded ICC colour profile.
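
To make the comparison concrete, the listed entries can be grouped by library. This is a rough sketch over only the six lines quoted above, which together cover roughly 18% of all instructions, so it shows which library dominates the top entries rather than the full profile. The `summarize` helper and its grouping rule (shared-object name when present, otherwise the first component of the source path) are assumptions made for illustration.

```javascript
// Callgrind lines copied from the sharp profile quoted above.
const profile = `
55,435,179 ( 8.15%)  ???:cmsReverseToneCurveEx [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
19,912,548 ( 2.93%)  libvips/resample/templates.h:vips_reduceh_gen(_VipsRegion*, void*, void*, void*, int*)
18,259,794 ( 2.68%)  ???:0x000000000003e650 [/usr/lib/x86_64-linux-gnu/liblcms2.so.2.0.12]
12,697,334 ( 1.87%)  ???:0x000000000001fba0 [/usr/lib/x86_64-linux-gnu/libwebp.so.7.1.3]
 9,651,522 ( 1.42%)  libvips/conversion/rot.c:vips_rot270_gen [/usr/local/lib/x86_64-linux-gnu/libvips.so.42.17.0]
 9,261,053 ( 1.36%)  ???:0x0000000000026f30 [/usr/lib/x86_64-linux-gnu/libjpeg.so.8.2.2]
`;

// Hypothetical helper: sums instruction counts per library.
function summarize(text) {
  const totals = new Map();
  for (const line of text.split("\n")) {
    // "<count> (<percent>%)  <location> [<object>]"
    const m = line.match(/^\s*([\d,]+)\s+\([^)]*\)\s+(.*)$/);
    if (!m) continue; // skip blank or unrelated lines
    const count = Number(m[1].replace(/,/g, ""));
    const rest = m[2];
    // Prefer the shared-object name, e.g. "liblcms2" from "liblcms2.so.2.0.12".
    const object = rest.match(/\[[^\]]*\/([^/\]]+?)\.so[^\]]*\]\s*$/);
    // Otherwise fall back to the first path component, e.g. "libvips".
    const key = object ? object[1] : rest.split(/[/:]/)[0];
    totals.set(key, (totals.get(key) || 0) + count);
  }
  return totals;
}

const totals = summarize(profile);
const ranked = [...totals.entries()].sort((a, b) => b[1] - a[1]);
for (const [library, count] of ranked) {
  console.log(library, count);
}
// liblcms2 73694973
// libvips 29564070
// libwebp 12697334
// libjpeg 9261053
```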

ColorSync color profile 2.2, type appl, RGB/XYZ-mntr device by appl, 1960 bytes, 25-2-2009 11:26:11 "Generic RGB Profile"

Manually removing the colour profile shows the calls to lcms2 are gone:

19,912,548 ( 3.44%)  libvips/resample/templates.h:vips_reduceh_gen(_VipsRegion*, void*, void*, void*, int*)
12,401,734 ( 2.14%)  ???:0x000000000001fba0 [/usr/lib/x86_64-linux-gnu/libwebp.so.7.1.3]
 9,651,522 ( 1.67%)  libvips/conversion/rot.c:vips_rot270_gen [/usr/local/lib/x86_64-linux-gnu/libvips.so.42.17.0]
 9,261,053 ( 1.60%)  ???:0x0000000000026f30 [/usr/lib/x86_64-linux-gnu/libjpeg.so.8.2.2]

Does @napi-rs/image support colour profiles? If not, it might be better to remove the ICC profile from with-exif.jpg and regenerate the results.

lovell · 2023-02-06T15:31:33Z

I also meant to add that, as a result of these tests, I found a possible performance regression in sharp when resizing RGBA images: pixel values were being cast from integer to float, which meant a slower float-based shrink path was taken.

This is fixed via lovell/sharp@9e2207f and will be in sharp v0.32.0 so thank you for bringing it to my attention! It's great to see what you're doing here bringing the world of Rust to Node.js 👍

styfle · 2024-05-15T22:09:37Z

@lovell I found this discussion because I was looking through the history of Next.js and found this related PR:

Use sequentialRead while encoding images with sharp vercel/next.js#44881

In the latest version of sharp (0.33.3), is it still recommended to set { sequentialRead: true } unconditionally for all image inputs/outputs like this?

lovell · 2024-05-16T07:17:20Z

@styfle There's no need to set sequentialRead any more. The latest sharp now manages all that for you, although the option is left there for backwards compatibility for those who might want the non-sequential ("random access") behaviour. For Next.js I'd recommend removing it.

If sharp discovers use of an operation that does not support sequential read, it will insert a cache into the pipeline to store decoded pixel values and allow sequential read of the input to continue. Decoding the input once and once only is generally faster, but at the potential cost of slightly increased memory usage for some operations (e.g. rotation, horizontal flip, Gaussian blur).
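
The idea of reading the input once and caching decoded pixels can be illustrated with a toy model. This is not sharp's or libvips's implementation: `SequentialSource`, `RowCache`, and `rotate180` are hypothetical names, and rows of numbers stand in for decoded scanlines. A 180-degree rotation is used because it needs the last row first, which a forward-only reader cannot provide without a cache.

```javascript
// Forward-only reader: each row can be decoded once, in order.
class SequentialSource {
  constructor(rows) {
    this.rows = rows;
    this.position = 0;
    this.decodeCount = 0; // how many rows have been decoded so far
  }
  get height() {
    return this.rows.length;
  }
  readNextRow() {
    if (this.position >= this.rows.length) {
      throw new Error("end of input reached");
    }
    this.decodeCount += 1;
    return this.rows[this.position++];
  }
}

// Cache placed between the reader and an operation that needs random access.
// It keeps every decoded row so later stages can revisit earlier rows.
class RowCache {
  constructor(source) {
    this.source = source;
    this.cached = [];
  }
  get height() {
    return this.source.height;
  }
  row(y) {
    // Decode forward until row y is available, then serve it from memory.
    while (this.cached.length <= y) {
      this.cached.push(this.source.readNextRow());
    }
    return this.cached[y];
  }
}

// Rotating by 180 degrees visits rows bottom-up and reverses each row.
function rotate180(image) {
  const out = [];
  for (let y = image.height - 1; y >= 0; y--) {
    out.push([...image.row(y)].reverse());
  }
  return out;
}

const source = new SequentialSource([[1, 2], [3, 4], [5, 6]]);
const rotated = rotate180(new RowCache(source));
console.log(rotated);            // [ [ 6, 5 ], [ 4, 3 ], [ 2, 1 ] ]
console.log(source.decodeCount); // 3
```

Each row is decoded exactly once even though the rotation asks for rows out of order. The price is that the cache holds every row it has seen, which mirrors the memory trade-off described above.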

https://github.com/search?q=repo%3Alovell%2Fsharp%20StaySequential&type=code

styfle · 2024-05-16T14:23:57Z

Thanks!

> Decoding the input once and once only is generally faster

This is the typical workload for Next.js because it has a filesystem cache for optimized images, thus an image is only ever optimized once.

However, that same input image could be resized to different widths, but those would be different requests, so I'm not sure whether the sharp cache would be relevant here.

vercel bot had a problem deploying to Preview January 13, 2023 16:31 Failure

vercel bot had a problem deploying to Preview January 13, 2023 16:53 Failure

Brooooooklyn force-pushed the fast-resize branch from f9b561c to b6578d9 Compare January 14, 2023 10:00

vercel bot had a problem deploying to Preview January 14, 2023 10:02 Failure

Brooooooklyn mentioned this pull request Jan 14, 2023

Use sequentialRead while encoding images with sharp vercel/next.js#44881

Merged

Brooooooklyn force-pushed the fast-resize branch from b6578d9 to f9b561c Compare January 14, 2023 10:11

Brooooooklyn added 6 commits January 14, 2023 18:12

feat: provide a way to fast resize image with SIMD

5189064

Update

05dd179

Update x86_64-unknown-linux-gnu build

858efc5

Optimize

e07d778

Add fast resize to Transformer

0dfbd4c

Resolve conflict

9a0d792

Brooooooklyn force-pushed the fast-resize branch from f9b561c to 9a0d792 Compare January 14, 2023 10:19

vercel bot had a problem deploying to Preview January 14, 2023 10:20 Failure

kodiakhq bot pushed a commit to vercel/next.js that referenced this pull request Jan 14, 2023

Use sequentialRead while encoding images with sharp (#44881)

d61b076

This can reduce memory usage and might improve performance on some systems. Related: Brooooooklyn/Image#34 (comment)

Brooooooklyn changed the title ~~Fast resize~~ feat(image): provide fast resize method Jan 14, 2023

Brooooooklyn merged commit f52fd45 into main Jan 14, 2023

Brooooooklyn deleted the fast-resize branch January 14, 2023 10:39
