
Strip buffers after writing to disk to reduce memory usage (issue #302) #315

Closed
jornmineur wants to merge 1 commit into 11ty:main from jornmineur:disable-memory-cache

Conversation

@jornmineur

@jornmineur jornmineur commented Jan 16, 2026

EDIT: the approach in this PR didn't turn out to work well and is superseded by PR 319.

  • This PR tries to reduce memory usage by stripping buffers (which turned out to be a dead end).
  • The new PR does solve the problem by introducing a separate manifest cache.

This PR addresses issue #302 by removing image buffers from memory after files have been written to disk.

Problem
When processing many images, the memory cache retains image buffers even after they've been written to disk. This causes memory usage to grow linearly with the number of images processed, leading to builds failing on large sites.

Solution
After processing completes, strip the buffer property from results when:

  • The file was written to disk (outputPath exists)
  • Not in dryRun mode (which needs buffers)

The metadata (dimensions, URLs, paths) remains available for generating HTML.
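
In code, the idea looks roughly like this (a minimal sketch: the shape mirrors the per-format metadata object eleventy-img returns, but the helper itself is illustrative, not the actual patch):

```js
// Sketch only: "stats" stands in for the per-format metadata object.
function stripBuffers(stats, options) {
  if (options.dryRun) return stats; // dryRun consumers still need the buffers

  for (let format of Object.keys(stats)) {
    for (let entry of stats[format]) {
      if (entry.outputPath) {
        // The file is safely on disk; keep width/height/url, drop the bytes.
        delete entry.buffer;
      }
    }
  }
  return stats;
}
```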

Testing

  • All existing tests pass
  • Added test to verify buffers are stripped after writing to disk
  • Tested on a site with 1000+ images - build completes successfully

@Aankhen

Aankhen commented Jan 18, 2026

Your approach makes perfect sense to me, but, bizarrely, I couldn’t build my blog using your branch because it kept running out of memory. I realized this was the first time I’d upgraded from v5 of the plugin, so I tried it with the upstream v6 and had the same issue. Increasing the heap size allowed me to complete that build successfully, though much more slowly (~120s vs ~40s) and using much more memory than with v5.

I tried the branch again. The runtime was about the same. I don’t have a scientific way to measure memory usage without creating a VM or container, but just eyeballing what the system tells me as it runs, I don’t see any tangible difference. I don’t think Node is releasing the memory from the buffers, at least in my case. But I may be doing something unusual. I’m sorry to say all I can offer is this anecdotal evidence, heh. I hope someone else can try it with a normal image-heavy site.

@jornmineur
Author

Great that you invested the time to test this, @Aankhen, much appreciated.

I'll take another look!

@jornmineur jornmineur marked this pull request as draft January 18, 2026 19:12
@jornmineur
Author

jornmineur commented Jan 21, 2026

It looks like the problem runs deeper than I thought, which explains why the patch isn't working (not sure yet why it worked for me – perhaps I mixed something up on my end).

This is my updated understanding of how eleventy-image works:

  1. Image.create(src, options) is called

    • Creates new Image instance
    • Computes key via getInMemoryCacheKey() → calls getHash() → reads file into #contents
    • Checks memCache.get(key) — if found, returns cached Image instance
    • If not found, adds Image instance to memCache
  2. img.queue() is called

    • Adds processing job to p-queue
    • Calls getInput() → returns #contents (already loaded in step 1)
    • Calls resize(input) → Sharp processes, writes to disk
    • Returns stats

So the hash is computed before checking the cache, and the file is read into memory as a side effect of computing the hash.
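
Paraphrasing that flow as a standalone sketch (simplified; the sha256 call and the plain Map stand in for the real internals, this is not the actual source):

```js
import fs from "node:fs";
import crypto from "node:crypto";

const memCache = new Map(); // stands in for eleventy-img's in-memory cache

function create(src, options) {
  // The key depends on file contents + options, so the file must be read
  // into memory before the cache can even be consulted…
  let contents = fs.readFileSync(src);
  let key = crypto.createHash("sha256")
    .update(contents)
    .update(JSON.stringify(options))
    .digest("hex");

  if (memCache.has(key)) {
    return memCache.get(key); // …so even a cache hit has paid the full read
  }

  let img = { contents, options }; // stands in for the Image instance holding #contents
  memCache.set(key, img);          // and the cached instance keeps its buffer alive too
  return img;
}
```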

The cache lookup is:

  • Key: hash of (file contents + options)
  • Value: Image instance (which holds the buffer in #contents)

A hit means: "we've seen this exact image with these exact options before, here's the Image instance that's already processing or has processed it."

The problem: even on a cache hit, the new Image instance has already read the file into #contents just to compute the key. And the cached Image instance also has the buffer.

Hence the memory explosion.

Comparing hash keys seems like a really safe approach, but it is also quite expensive.

Possibly it was done in a different way in v5 – haven't looked into that yet.

It could be done much faster, and with >99% cache hit accuracy, by keeping a cache file that tracks image paths, mtime and file size. But reworking eleventy-img to achieve that doesn't look trivial to me.
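
Roughly what I have in mind (hypothetical sketch; the manifest shape and field names are mine, not anything that exists in eleventy-img today):

```js
import fs from "node:fs";

// manifest maps source path → { mtimeMs, size, outputs: [paths] }
function isFresh(manifest, src) {
  let entry = manifest[src];
  if (!entry) return false;

  let stat = fs.statSync(src);
  return stat.mtimeMs === entry.mtimeMs
    && stat.size === entry.size
    && entry.outputs.every((p) => fs.existsSync(p)); // outputs still on disk?
}
```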

I'll keep looking! :-)

@Aankhen

Aankhen commented Jan 22, 2026

That’s interesting. Thank you for the detailed analysis. I guess the cache is really just to prevent processing images more than once in this model, not to prevent reading them. I wonder if there could be an option for an alternative hash function that uses only the filename and file stats without reading the contents.

@jornmineur
Author

@Aankhen
I created a new version that appears to solve the problem.

The patched version is now running in production on my end without issues. Build times are very fast, and memory usage remains negligible even with a large image set.

If you'd be interested in giving it a try, I would be curious to hear how the patched version works for you.

The PR introduces two changes:

1. Persistent manifest cache

A new file .cache/eleventy-img-manifest.json stores metadata about processed images. On subsequent builds, eleventy-img checks this manifest first using the file’s path, modification time, and size. If nothing changed and the output files exist, it returns the cached metadata immediately without reading the image at all.

The manifest is conservative: any change in path, mtime, or size forces a full reprocess.
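
For illustration, a manifest entry might look like this (the field names are a plausible guess at the schema, not necessarily what actually gets written to .cache/eleventy-img-manifest.json):

```js
const manifestEntry = {
  "./src/images/hero.jpg": {  // illustrative source path
    mtimeMs: 1737600000000,   // illustrative values
    size: 482113,
    metadata: {
      // dimensions, URLs, and output paths for each generated format,
      // i.e. everything needed to build the HTML without reading the image
    },
  },
};
```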

2. Avoids buffering images in JS memory for production builds

For local images in production mode, the source file path is passed directly to Sharp instead of reading the file into a buffer first. Sharp supports this natively, so the data flows from disk → Sharp → disk without ever sitting in JavaScript memory.
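
The difference in Sharp terms (a sketch with illustrative paths; sharp() does accept a file path as input):

```js
import sharp from "sharp";

// Before: read the file into a Buffer, then hand the Buffer to Sharp.
// const buffer = await fs.readFile("src/images/hero.jpg");
// await sharp(buffer).resize({ width: 800 }).toFile("_site/img/hero-800.webp");

// After: pass the path; libvips streams the data from disk itself.
await sharp("src/images/hero.jpg")
  .resize({ width: 800 })
  .toFile("_site/img/hero-800.webp");
```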

What This Means
• First build: Same behavior as before
• Subsequent builds: Effectively near-instant for unchanged images, with constant memory usage regardless of image count
• Changed images: Automatically detected via mtime/size and reprocessed
• Deleted manifest: Gracefully rebuilds
• Deleted output files: Detected and regenerated

When Does This Apply?

The optimization kicks in when processing local binary images (not SVG), in production mode (not dryRun, statsOnly, or transformOnRequest).

In all other cases, existing behavior remains unchanged.

No Changes Required

This works as a drop-in replacement. No configuration or code changes are needed; the manifest is managed automatically.

@zeroby0
Contributor

zeroby0 commented Jan 23, 2026

Comparing hash keys seems like a really safe approach, but it is also quite expensive.
It could be done much faster, and with >99% cache hit accuracy, by keeping a cache file that tracks image paths, mtime and file size.

Calculating the hash of the images is really inexpensive. You can make it even cheaper if you have multiple GB of images by using xxhash instead of SHA.
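
For scale, hashing can stream, so memory stays flat regardless of file size (sketch; SHA ships with Node, while xxhash would need a third-party package such as xxhash-wasm):

```js
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Stream the file through the hash chunk by chunk; nothing is buffered whole.
async function hashFile(path) {
  const hash = createHash("sha256");
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}
```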

I tried using mtime to skip hashing, but mtime is not so reliable. For example, consider building on Netlify. They restore the cache contents, but may not restore the mtime and other file attributes. If your CI/CD uses a build system that mounts files over network, mtime is unreliable. One might check mtime to detect if a file has possibly changed, but it shouldn't be relied on to detect if a file hasn't changed.

Hashing is a nice way to store the image and the options in a stateless way, it's really fast, and the filenames have a hash so the browser could "cache forever" in deployment.

Passing the filepaths directly to sharp is very nice, I wonder why this wasn't already done.

@jornmineur
Author

jornmineur commented Jan 24, 2026

Hashing is a nice way to store the image and the options in a stateless way, it's really fast, and the filenames have a hash so the browser could "cache forever" in deployment.

That's a really good point @zeroby0!

I updated the code to use the content hash for cache validation instead of mtime+size, tested locally (not on Netlify), and pushed to the new PR.

What do you think?

@zeroby0
Contributor

zeroby0 commented Jan 26, 2026

The premise and the methodology look good to me, but the code has changed so much since I last looked at it that I can't provide good feedback on whether this causes any unforeseen bugs.

But the worst that might happen is that some images might get processed again and again, so I'd say it's alright :D

@jornmineur
Author

Thanks for the feedback @zeroby0 !

Let me close PR 315 for good housekeeping, since it's been superseded by PR 319, which includes your feedback.

This conversation is still available for reference as the conversation in PR 319 links back to this one.

@jornmineur jornmineur closed this Jan 26, 2026
