
Move buffer release or cache from OnRefresh to ReleaseBuffer in BucketCacheManager #25276


Open

wants to merge 2 commits into base: main

Conversation

feich-ms (Contributor) commented Jul 3, 2025

Description

This PR moves buffer release/caching from OnRefresh to ReleaseBuffer in BucketCacheManager.

Motivation and Context

OnRefresh is only executed after a batch of 16 EP runs, so within a batch a released buffer cannot actually be reused, which wastes GPU buffer resources. This PR proposes a straightforward optimization: release or cache the buffer early, in ReleaseBuffer instead of OnRefresh, to improve buffer cache/release efficiency and thereby reduce peak and average GPU memory usage. The experimental results below show a measurable memory reduction without performance regressions.

Phi3

| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
|---|---|---|---|---|
| Default Bucket | 3603.83 | 3127.05 | 7.17 | 139.50 |
| Default Bucket with Early Release Optimization | 3534.77 (+1.92%) | 3073.97 (+1.70%) | 7.14 (+0.36%) | 140.01 (+0.36%) |

Deepseek-R1

| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
|---|---|---|---|---|
| Default Bucket | 2089.03 | 1716.15 | 6.07 | 164.67 |
| Default Bucket with Early Release Optimization | 2034.00 (+2.63%) | 1674.49 (+2.43%) | 6.09 (-0.20%) | 164.34 (-0.20%) |

LLama3.2-1B

| Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
|---|---|---|---|---|
| Default Bucket | 1736.03 | 1424.64 | 3.37 | 296.53 |
| Default Bucket with Early Release Optimization | 1659.78 (+4.39%) | 1366.78 (+4.06%) | 3.41 (-1.09%) | 293.34 (-1.08%) |


feich-ms commented Jul 3, 2025

Hi @fs-eire, @guschmue, this improves buffer reuse across batched runs in BucketCacheMode; could you help review? Cc @qjia7.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Jul 3, 2025

qjia7 commented Jul 4, 2025

@guschmue Please help check the changes' correctness against your 90 models. Compared with before, storage buffers are now reused within a single batch (16 dispatches).

fs-eire previously approved these changes Jul 4, 2025

fs-eire commented Jul 4, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).


guschmue commented Jul 7, 2025

I can run some tests on it.


fs-eire commented Jul 8, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).
