
@jeremiaswerner commented Nov 14, 2025

The following changes have been made:

  • rework the application code to use vLLM
  • use the Granite-4.0-Micro model from Hugging Face
  • store the model in the input store as a model cache to speed up subsequent runs
  • add concurrency support on H100 GPUs to enable larger models

As a result, the tutorial now supports high-throughput batch-inferencing use cases with any LLM (see the sketch below).
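
For reference, here is a minimal sketch of what the reworked inference flow looks like. The Hugging Face model id `ibm-granite/granite-4.0-micro` and the cache mount point `/mnt/input/model-cache` are assumptions for illustration, not taken from the diff:

```python
# Minimal sketch of the vLLM-based batch inference flow described above.
# ASSUMPTIONS: the exact model id and the input-store mount path below
# are illustrative and may differ from the tutorial's actual values.
from vllm import LLM, SamplingParams

# Point vLLM's weight download at the input store so the first run
# populates the model cache and subsequent runs skip the download.
llm = LLM(
    model="ibm-granite/granite-4.0-micro",
    download_dir="/mnt/input/model-cache",
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# vLLM batches the whole prompt list internally (continuous batching),
# which is what enables the high-throughput use case on an H100.
prompts = [
    "Summarize what serverless fleets are in one sentence.",
    "List three benefits of batch inferencing.",
]

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```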

Rendered version:
https://github.com/IBM/CodeEngine/blob/4a58342856c3828b7477f67af9aa2c8338d41a47/serverless-fleets/tutorials/inferencing/README.md

Since I added the recipes across several commits, please review on a per-commit basis.

@jeremiaswerner self-assigned this Nov 14, 2025
@reggeenr left a comment


LGTM

@jeremiaswerner merged commit b680aa8 into IBM:main Nov 19, 2025
1 of 2 checks passed
