
feat: add model warmup after deployment to reduce cold-start latency #11

Merged
alez007 merged 1 commit into main from feat/model-warmup on Apr 9, 2026

Conversation

alez007 (Owner) commented Apr 9, 2026

Run a minimal dummy inference request through each model right after start() completes. This pre-compiles CUDA kernels, allocates KV caches, and triggers any JIT/torch.compile tracing before the replica accepts real traffic.
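
As a rough illustration of the idea described above (not the code merged in this PR), a warmup step might look like the sketch below. The `DummyModel` class, its `generate()` method, the `warmup()` and `start_and_warm()` helpers, the prompt text, and the timeout are all hypothetical names chosen for the example; only `start()` comes from the description. In a real engine, the first `generate()` call is where kernel compilation, cache allocation, and tracing costs would be paid.

```python
import asyncio


class DummyModel:
    """Hypothetical stand-in for a deployed model; not the project's real class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False
        self.calls: list[str] = []

    async def start(self) -> None:
        # A real implementation would load weights and build the engine here.
        self.started = True

    async def generate(self, prompt: str, max_tokens: int = 1) -> str:
        # A real engine would pay its one-time first-request costs on this call.
        self.calls.append(prompt)
        return "ok"


async def warmup(model, prompt: str = "warmup", timeout_s: float = 30.0) -> bool:
    """Send one minimal request; return True on success, False on any failure."""
    try:
        await asyncio.wait_for(model.generate(prompt, max_tokens=1), timeout=timeout_s)
        return True
    except Exception as exc:  # assumption: a failed warmup should not block startup
        print(f"warmup failed for {model.name}: {exc!r}")
        return False


async def start_and_warm(models) -> dict:
    """Start each model, then warm it up before it would accept real traffic."""
    results = {}
    for model in models:
        await model.start()
        results[model.name] = await warmup(model)
    return results


if __name__ == "__main__":
    print(asyncio.run(start_and_warm([DummyModel("a"), DummyModel("b")])))
```

Treating a warmup failure as non-fatal is an assumption made for this sketch; a deployment could equally choose to fail the replica so that it never serves traffic cold.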

@alez007 alez007 merged commit 4343adf into main Apr 9, 2026
2 checks passed