feat: add model warmup after deployment to reduce cold-start latency by alez007 · Pull Request #11 · alez007/modelship

alez007 · 2026-04-09T09:41:12Z

Run a minimal dummy inference request through each model right after start() completes. This pre-compiles CUDA kernels, allocates KV caches, and triggers any JIT/torch.compile tracing before the replica accepts real traffic.
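As a rough illustration of the flow described above, here is a minimal sketch, not the code from this PR. `FakeModel`, `warmup`, `start_and_warm`, the `infer(prompt, max_tokens)` signature, and the choice to log warmup failures instead of aborting startup are all assumptions made for the example.

```python
import asyncio
import logging
import time
from typing import Optional

logger = logging.getLogger("warmup")


class FakeModel:
    """Hypothetical stand-in for a model wrapper; not the project's real class."""

    def __init__(self, name):
        self.name = name
        self.events = []

    async def start(self):
        self.events.append("start")

    async def infer(self, prompt, max_tokens):
        # A real backend would compile kernels and allocate caches on this first call.
        self.events.append(f"infer:{prompt}:{max_tokens}")
        return ""


async def warmup(model, prompt="warmup", max_tokens=1) -> Optional[float]:
    """Send one tiny request; return elapsed seconds, or None if it failed."""
    started = time.perf_counter()
    try:
        await model.infer(prompt, max_tokens=max_tokens)
    except Exception:
        # Assumption: a failed warmup is logged but does not block the replica.
        logger.exception("warmup failed for %s", model.name)
        return None
    return time.perf_counter() - started


async def start_and_warm(models):
    """Start each model, warm it up, and only then report it as ready."""
    ready = []
    for model in models:
        await model.start()
        await warmup(model)
        ready.append(model.name)
    return ready


if __name__ == "__main__":
    print(asyncio.run(start_and_warm([FakeModel("demo")])))
```

Keeping the dummy request as small as possible (a short prompt and a single output token here) limits the added startup time while still exercising the first-call path.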

alez007 merged commit 4343adf into main Apr 9, 2026
2 checks passed

alez007 merged 1 commit into main from feat/model-warmup
