feat(core): persist asset fingerprinting cache by rix0rrr · Pull Request #37822 · aws/aws-cdk

rix0rrr · 2026-05-11T09:05:49Z

Asset fingerprinting is now at the highest possible speed (dominated by single-threaded reading of all files), and yet it can still take a lot of time to fingerprint a large directory (~13s to do ~37k files on my machine).

We already used to have an in-memory fingerprinting cache to speed up multiple fingerprints of the same files.

This PR now persists that cache across executions, to bring the same speed to re-synths, bringing the time down to ~3s (a ~75% reduction).

The cache file itself has a maximum number of entries, and so does the CDK cache subdirectory (`~/.cdk/cache/fingerprints`) that holds the set of all possible fingerprints.
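As a rough illustration of what "a maximum number of entries" can look like, here is a minimal sketch of a size-bounded cache. The class name `BoundedCache`, the limit, and the "drop the oldest entry first" policy are all assumptions made for the example; the PR text does not say which eviction policy the real code uses.

```typescript
// Hypothetical sketch: a cache that never holds more than `maxEntries` items.
// The eviction policy (oldest insertion first) is an assumption, not the PR's code.
class BoundedCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: string): void {
    // Delete first so that re-inserting moves the key to the end of the
    // Map's insertion order (the most recently written position).
    this.entries.delete(key);
    this.entries.set(key, value);

    // Evict the oldest entries until we are back at the limit.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const demoCache = new BoundedCache(2);
demoCache.set('a', 'hash-a');
demoCache.set('b', 'hash-b');
demoCache.set('c', 'hash-c'); // evicts 'a', the oldest entry
```

A `Map` iterates in insertion order, which is what makes the "oldest first" eviction above a few lines long.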

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

During some recent investigations package fingerprinting was found to do a bunch of duplicate work (mostly `stat`ting the same files over and over again). That and a number of minor tweaks bring the fingerprinting time of one specific directory down from ~19s to ~13s (about ~30% improvement), while maintaining byte-for-byte compatibility with the previous implementation. In a future change, we will investigate changes that are allowed to change the hash to improve the performance even more.

Asset fingerprinting is now at the highest possible speed, and yet it can still take a lot of time to fingerprint a large directory (~13s to do ~4k files on my machine). We already used to have an in-memory fingerprinting cache to speed up multiple fingerprints of the same files. This PR now persists that cache across executions, to bring the same speed to re-synths. The cache file itself has a maximum number of entries, and so does the CDK cache subdirectory (`~/.cdk/cache/fingerprints`) that holds the set of all possible fingerprints.

github-actions · 2026-05-11T09:07:23Z

⚠️ This pull request description does not follow the correct template structure.

PRs without a linked issue will receive lower priority for review and merging. Please update the description to follow the PR template and include a line like Closes #123 in the Issue section. If no existing issue matches your change, create one first.

kumvprat

Curious to see the improvement with these new changes

Do we need to add new unit tests for the behaviour of this new directory-based cache, or do the existing tests already cover it?

kumvprat · 2026-05-13T08:29:34Z

+        }
+      }
+    } catch {
+      // Cache file doesn't exist or is corrupt — start fresh


What happens here? Should we delete the corrupt file if it exists, so that the data can be generated fresh and cached on the next save call?

That happens automatically. If the file is corrupt, then on the next save() we will overwrite it with a good file.
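The recovery behaviour described here, combined with the "tempfile + rename" commit on this PR, could look roughly like the sketch below. The function names `loadCache` and `saveCache`, the JSON file format, and the temp-file naming are assumptions for illustration only; they are not taken from the actual diff.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

type CacheData = Record<string, string>;

// Load the cache. Any read or parse failure yields an empty cache,
// so a missing or corrupt file never breaks fingerprinting.
function loadCache(file: string): CacheData {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, 'utf-8'));
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      return parsed as CacheData;
    }
  } catch {
    // Cache file doesn't exist or is corrupt: start fresh
  }
  return {};
}

// Write to a temporary file next to the target, then rename it into place.
// The rename replaces the old (possibly corrupt) file in a single step.
function saveCache(file: string, data: CacheData): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data), 'utf-8');
  fs.renameSync(tmp, file);
}

// Usage: a corrupt file is ignored on load and overwritten on the next save.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'fp-cache-demo-'));
const demoFile = path.join(demoDir, 'cache.json');
fs.writeFileSync(demoFile, '{ this is not valid JSON');
const recovered = loadCache(demoFile);
saveCache(demoFile, { ...recovered, someKey: 'someHash' });
```

No explicit delete of the corrupt file is needed in this shape: the load ignores it and the next save replaces it.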

Should we have some log lines that tell us about cache hits/misses? If this is already being logged inside the logic in fingerprint.ts, it should be okay.

I don't really want to do this right now. There's no real place for that information to go at the moment. I definitely don't want to send it to stdout or stderr, because it will interfere with user-controlled app output. There is also no place yet in telemetry for us to send non-timing numbers.

Both of those seem too heavy a lift for the current PR. And ultimately we mostly care about the duration (which we will have numbers on) and I'm pretty confident that the duration is going to linearly correlate with cache hit rate.

So let me turn the question around: what future decisions are you thinking of making from that cache hit rate?

The idea was to directly correlate any synth time improvements with this change. Without logging or telemetry data, how can we definitively attribute the improvements to this change?

Asking again: what future decisions would we make based on that?

The only one I can see is "take the code out again". I suppose that is fair. We could take it out at any point if we want to, because it doesn't affect functionality.

If we ever think this code is producing more problems than it's worth maintaining, we can always instrument it then to see if it's pulling its weight.

Unfortunately, we will not have before/after telemetry to compare, so we can't see the impact of the change to pat ourselves on the back for a job well done. But we've run CDK for years without that kind of telemetry, and we've done an okay job, I would say. I think we'll manage.

In the meantime, we will be looking at the telemetry of synthesis times and duration hotspots and continuing to drive those down.

My main concern here is not whether we congratulate ourselves for a job well done or badly done.

As you said, with telemetry we can decide later whether we want to keep the code or not. My counter-question would be: if this code is not pulling its weight in the near or long-term future, why is this optimisation not working as we expected? The answer to that question is almost always hidden in the opaque implementation changes we make to CDK. Opaque not in the sense of opaque to customers, but opaque to telemetry/metrics.
If a proper tracking mechanism via telemetry is hard to implement or beyond the scope of this PR, I agree. Maybe we can tackle it together and build a proper tracking mechanism, so that we become proactive about these kinds of optimisation opportunities.

Again, nothing against the changes here but advocating for better instrumentation, that's all

rix0rrr · 2026-05-13T08:58:03Z

> Do we need to add new unit tests for the behaviour of this new directory-based cache, or do the existing tests already cover it?

My plan was indeed to say: the current existing tests and integ tests are exercising this code path, ensuring that it doesn't break any functionality.

Otherwise, all tests I really want to add are end-to-end. I don't want to add mocks (or whatever) that confirm that "exactly this code path is followed", because those end up brittle against code changes. The real functionality we would test is "the second time you call fingerprint() on the same directory is faster"... but that is a timing test which is easily disturbed by machine load, which ends up as a flaky test.

So I opted for no test and manual confirmation.

mrgrain

Help me understand this @rix0rrr

The large file fingerprint cache is effectively replacing hash(content) with a faster hash(inode+mtime+size). Previously this was kept in memory, to avoid re-hashing large files a second time in case we have lookups and the synth loop needs to run multiple times.

If I understand it correctly, this change seems to propose:

- we now cache the hash of all files
- we now persist this cache across multiple executions

Doesn't this effectively mean we will always use hash(inode+mtime+size)? Sure, the first time we see a file we calculate its hash, but after that we seem to not care anymore. Why not just switch to hash(inode+mtime+size) in general?

rix0rrr · 2026-05-13T09:38:36Z

> The large file fingerprint cache is effectively replacing hash(content) with a faster hash(inode+mtime+size).

Well it shouldn't be that, so I sure hope that's not what accidentally happened 👀 . Let me double-check the code to be sure.

We are still calculating and outputting the hash of the contents of the target file.

What is supposed to be happening is that the key into the cache table is hash(inode+mtime+size).

- Why the hash() of those values? Shrug, good question. That was already there. I suppose a simple concatenation of those values would also be just fine, and saves a hashing operation so slightly faster. Why not.
- Why inode and not file name? I think this is done for quick invalidation. A different inode could obviously not be the same file, so that's an easy way to force a re-read.
- Why store fields that invalidate the cache in the key and not the value? In other words, why { [inode+mtime+size] -> hash } and not { inode -> [mtime, size, hash] }? Just another easy way of invalidating: if we have a cache hit we know it's good, and we don't have to have another if to check the additional fields.

> Why not just switch to hash(inode+mtime+size) in general?

Because that's not portable. The same file on a different computer would most likely have a different mtime and definitely would have a different inode. So snapshot tests would basically always fail. We are still fingerprinting the file contents, it's just that the (per-machine) cache uses (per-machine) filesystem metadata.
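To make the distinction concrete, here is a minimal sketch of a content hash that is looked up through a metadata-based key. The function name `cachedContentHash`, the choice of SHA-256, and the in-memory `Map` are assumptions for illustration; the real code in fingerprint.ts may differ in all of these.

```typescript
import * as crypto from 'crypto';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Per-machine cache: filesystem metadata key -> hash of the file contents.
const contentHashCache = new Map<string, string>();

function cachedContentHash(file: string): string {
  const stat = fs.statSync(file);
  // The key is machine-specific (inode and mtime differ between computers)...
  const key = `${stat.ino}:${stat.mtimeMs}:${stat.size}`;

  const cached = contentHashCache.get(key);
  if (cached !== undefined) {
    return cached;
  }

  // ...but the value is always a hash of the contents, which is portable.
  const hash = crypto.createHash('sha256').update(fs.readFileSync(file)).digest('hex');
  contentHashCache.set(key, hash);
  return hash;
}

// Usage: the second call finds the same key and skips reading the file.
const hashDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'fp-hash-demo-'));
const hashDemoFile = path.join(hashDemoDir, 'asset.txt');
fs.writeFileSync(hashDemoFile, 'hello');
const firstHash = cachedContentHash(hashDemoFile);
const secondHash = cachedContentHash(hashDemoFile);
```

Because the invalidating fields are part of the key, a changed file simply produces a different key and therefore a cache miss; no extra comparison is needed on a hit.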

satisfied

mrgrain · 2026-05-13T09:53:30Z

> Because that's not portable. The same file on a different computer would most likely have a different mtime and definitely would have a different inode. So snapshot tests would basically always fail. We are still fingerprinting the file contents, it's just that the (per-machine) cache uses (per-machine) filesystem metadata.

Make sense 👍🏻

> Why the hash() of those values? Shrug, good question. That was already there. I suppose a simple concatenation of those values would also be just fine, and saves a hashing operation so slightly faster. Why not.

I can tell you this: It's the standard way to quickly check if a file changed without hashing its content and when you don't care that much about accuracy (e.g. because a different slower process will catch a change later on).

rix0rrr · 2026-05-13T09:57:05Z

> I can tell you this: It's the standard way to quickly check if a file changed without hashing its content and when you don't care that much about accuracy (e.g. because a different slower process will catch a change later on).

Well yeah, but you can also just compare `prev_inode + prev_time + prev_size === cur_inode + cur_time + cur_size`. No need to do `hash(prev_inode + prev_time + prev_size) === hash(cur_inode + cur_time + cur_size)`.

In fact it wasn't hash(), it was JSON.stringify(), which is more akin to the plain comparison. But equally unnecessary, I replaced it with a string concat.
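A small sketch of the two ways to build the key, to show that both work equally well for change detection. The names `FileMeta`, `keyByStringify` and `keyByConcat`, and the `:` separator, are assumptions for the example and are not taken from the commit.

```typescript
// Hypothetical shape of the metadata that goes into the cache key.
interface FileMeta {
  ino: number;
  mtimeMs: number;
  size: number;
}

// Roughly the "before" approach: serialize the fields with JSON.stringify().
function keyByStringify(meta: FileMeta): string {
  return JSON.stringify([meta.ino, meta.mtimeMs, meta.size]);
}

// Roughly the "after" approach: plain string concatenation.
// A separator keeps the fields apart, so (1, 23) and (12, 3) cannot collide.
function keyByConcat(meta: FileMeta): string {
  return `${meta.ino}:${meta.mtimeMs}:${meta.size}`;
}

const before: FileMeta = { ino: 42, mtimeMs: 1000, size: 10 };
const unchanged: FileMeta = { ino: 42, mtimeMs: 1000, size: 10 };
const touched: FileMeta = { ino: 42, mtimeMs: 2000, size: 10 };

// Both key styles agree on what counts as "the same file state".
const sameWithConcat = keyByConcat(before) === keyByConcat(unchanged);
const changedWithConcat = keyByConcat(before) !== keyByConcat(touched);
```

Either key is only ever compared for equality, so the cheaper one is sufficient.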

mergify · 2026-05-13T10:29:09Z

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

mergify · 2026-05-13T10:29:14Z

mergify · 2026-05-13T10:59:27Z

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

github-actions · 2026-05-13T10:59:53Z

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

rix0rrr added 2 commits May 8, 2026 15:35

rix0rrr temporarily deployed to automation May 11, 2026 09:05 — with GitHub Actions Inactive

rix0rrr had a problem deploying to automation May 11, 2026 09:05 — with GitHub Actions Failure

rix0rrr added the pr-linter/exempt-integ-test The PR linter will not require integ test changes label May 11, 2026

rix0rrr temporarily deployed to automation May 11, 2026 09:06 — with GitHub Actions Inactive

github-actions Bot added the p2 label May 11, 2026

rix0rrr temporarily deployed to automation May 11, 2026 09:06 — with GitHub Actions Inactive

mergify Bot added the contribution/core This is a PR that came from AWS. label May 11, 2026

mergify Bot temporarily deployed to automation May 11, 2026 09:06 Inactive

aws-cdk-automation previously requested changes May 11, 2026

Do not accumulate tempdirs

7108baf

rix0rrr temporarily deployed to automation May 11, 2026 09:30 — with GitHub Actions Inactive

rix0rrr added 2 commits May 11, 2026 11:50

OFix test on build

2dcbf99

Merge remote-tracking branch 'origin/huijbers/fingerprinting-speed' i…

623a802

…nto huijbers/asset-fingerprinting-cache

rix0rrr added pr-linter/exempt-readme The PR linter will not require README changes pr-linter/exempt-test The PR linter will not require test changes labels May 11, 2026

rix0rrr temporarily deployed to automation May 11, 2026 09:52 — with GitHub Actions Inactive

aws-cdk-automation temporarily deployed to automation May 11, 2026 12:49 — with GitHub Actions Inactive

rix0rrr changed the title ~~feat: persist asset fingerprinting cache~~ feat(core): persist asset fingerprinting cache May 13, 2026

rix0rrr temporarily deployed to automation May 13, 2026 08:04 — with GitHub Actions Inactive

kumvprat reviewed May 13, 2026

tempfile + rename

b3f66f8

rix0rrr temporarily deployed to automation May 13, 2026 08:55 — with GitHub Actions Inactive

mrgrain previously requested changes May 13, 2026

Save a stringify operation from the cache key

66d6291

rix0rrr temporarily deployed to automation May 13, 2026 09:41 — with GitHub Actions Inactive

mrgrain self-requested a review May 13, 2026 09:51

kumvprat approved these changes May 13, 2026

Merge branch 'main' into huijbers/asset-fingerprinting-cache

b9b8573

mergify Bot temporarily deployed to automation May 13, 2026 10:29 Inactive

mergify Bot merged commit 605a776 into main May 13, 2026
19 of 20 checks passed

mergify Bot deleted the huijbers/asset-fingerprinting-cache branch May 13, 2026 10:59

github-actions Bot locked as resolved and limited conversation to collaborators May 13, 2026

aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label May 13, 2026

aws-cdk-automation temporarily deployed to automation May 13, 2026 11:01 — with GitHub Actions Inactive
