
Refactor(PrefixCache): New load API, per-layer Tries, async ops & stats #269

Merged
copybara-service[bot] merged 1 commit into main from yuyan-prefix-cache
May 12, 2025


@yuyanpeng-google
Collaborator

Add async operations so that a blocking device_get no longer sits on the critical path while waiting for the prefill result. Use per-layer tries to avoid loading the cache from DRAM when the common prefix lengths tie. Add statistics for debugging and benchmarking.
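The per-layer trie idea can be illustrated with a minimal sketch. It assumes that "layer" means a storage tier (here HBM and DRAM), that each tier keeps its own token trie, and that ties on common prefix length go to the faster tier so nothing has to be loaded from DRAM. The names `Trie`, `TieredPrefixIndex`, and `lookup` are hypothetical and are not taken from `prefix_cache.py`.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)


class Trie:
    """Token trie that reports how much of a query is already stored."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def common_prefix_len(self, tokens):
        node, length = self.root, 0
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            length += 1
        return length


class TieredPrefixIndex:
    """Hypothetical index: one trie per storage layer, fastest layer first."""

    def __init__(self, layer_names=("hbm", "dram")):
        self.tries = {name: Trie() for name in layer_names}
        self.stats = {name: 0 for name in layer_names}  # hit count per layer

    def add(self, layer, tokens):
        self.tries[layer].insert(tokens)

    def lookup(self, tokens):
        best_layer, best_len = None, 0
        # Layers are visited fastest first; the strict ">" means a slower
        # layer only wins when it matches a strictly longer prefix.
        for name, trie in self.tries.items():
            length = trie.common_prefix_len(tokens)
            if length > best_len:
                best_layer, best_len = name, length
        if best_layer is not None:
            self.stats[best_layer] += 1
        return best_layer, best_len
```

With `[1, 2, 3]` stored in the `hbm` layer and `[1, 2, 3, 4]` in the `dram` layer, a query of `[1, 2, 3, 9]` matches three tokens in both layers and resolves to `hbm`, while `[1, 2, 3, 4, 5]` resolves to `dram` because that match is strictly longer.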

Collaborator

@vipannalla vipannalla left a comment


Looks good. Can you share benchmark results from before and after this PR, and say which metrics it improved?

Comment thread jetstream/core/prefix_cache.py Outdated
@github-actions github-actions Bot added the "pull ready" label (needed if we want the copybara service to auto sync it to g3) May 7, 2025
@yuyanpeng-google yuyanpeng-google force-pushed the yuyan-prefix-cache branch 2 times, most recently from fc0d025 to 8e22444 Compare May 8, 2025 09:49
@yuyanpeng-google
Collaborator Author

> Looks good. Can you share benchmark results from before and after this PR, and say which metrics it improved?

There are no formal results from before this PR. A few small, casual experiments showed that device_get blocks on the critical path. A first-version benchmark result taken after this PR is in b/397854862.
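The blocking behavior described here, and the async approach in general, can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `blocking_fetch` is a stand-in for a blocking device-to-host copy such as `jax.device_get`, and `save_async` is a hypothetical helper that hands the copy to a background thread so the caller returns right away.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(value, delay_s=0.2):
    """Stand-in (assumption) for a blocking copy such as jax.device_get."""
    time.sleep(delay_s)
    return value


# A single background worker keeps copies ordered and off the serving thread.
_executor = ThreadPoolExecutor(max_workers=1)


def save_async(cache, key, device_value):
    """Schedule the copy into the cache and return a Future immediately."""

    def _work():
        cache[key] = blocking_fetch(device_value)
        return key

    return _executor.submit(_work)


cache = {}
start = time.perf_counter()
future = save_async(cache, "prompt-0", [1, 2, 3])
submit_elapsed = time.perf_counter() - start  # time spent on the critical path

future.result()  # wait only at the point where the cached value is needed
total_elapsed = time.perf_counter() - start
```

Here `submit_elapsed` covers only the cost of scheduling the work, while `total_elapsed` includes the simulated copy, which is the part that would otherwise stall the caller.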

@copybara-service copybara-service Bot merged commit 2756c6f into main May 12, 2025
6 checks passed
@copybara-service copybara-service Bot deleted the yuyan-prefix-cache branch May 12, 2025 23:47

Labels

pull ready This label is needed if we want the copybara service to auto sync it to g3.


4 participants