
Conversation

@pthombre (Contributor) commented Aug 5, 2025

Creating standardized APIs for in-framework and HF deployment. These will be used in the deploy scripts and the NeMo Eval repo.

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
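For context, a minimal sketch of what a standardized deployment entry point could look like; the function name `deploy` and its parameters here are illustrative assumptions for this description, not the exact signatures added in this PR:

```python
from typing import Optional

def deploy(
    model_path: str,
    backend: str = "in-framework",     # "in-framework" or "hf" (assumed names)
    num_gpus: int = 1,
    max_batch_size: int = 32,
    random_seed: Optional[int] = None,
):
    """Hypothetical uniform entry point shared by both deployment backends,
    so deploy scripts and NeMo Eval can call either one the same way."""
    if backend == "in-framework":
        ...  # start the in-framework (NeMo/Megatron) server
    elif backend == "hf":
        ...  # start the Hugging Face server
    else:
        raise ValueError(f"Unknown backend: {backend}")
```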
copy-pr-bot (bot) commented Aug 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pthombre pthombre requested a review from athitten August 5, 2025 03:00
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
@oyilmaz-nvidia (Contributor) left a comment


This looks very good to me. Hoping PyTriton in-framework can also have the same structure for multi-GPU after this PR :)

enable_flash_decode: bool = False,
legacy_ckpt: bool = False,
max_batch_size: int = 32,
random_seed: Optional[int] = None
A contributor commented:

Should we add max_ongoing_requests for in-fw as well?

@athitten (Contributor) commented Aug 7, 2025

Would this still be required if we expose max_ongoing_requests to the user?

Also, how does max_ongoing_requests work? Is it per replica, and how does it count batches, e.g. if bs=8, are the 8 requests together considered as 1 request by the max_ongoing_requests arg?

Ignore the question on per replica. Just saw that it is per replica.
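To make the per-replica semantics concrete, here is a small sketch assuming the serving layer is Ray Serve, where max_ongoing_requests caps in-flight requests per replica and counts requests individually; the class and paths below are illustrative, not the code in this PR:

```python
from ray import serve

# max_ongoing_requests (Ray Serve >= 2.10) is enforced per replica: each
# replica accepts at most this many in-flight requests before Serve queues
# them. Requests are counted one by one; any server-side batching (e.g. a
# max_batch_size of 8) is a separate knob inside the replica.
@serve.deployment(num_replicas=2, max_ongoing_requests=64)
class ModelServer:  # illustrative name
    def __init__(self, model_path: str):
        self.model_path = model_path  # a real server would load the model here

    async def __call__(self, request):
        return {"echo": await request.json()}  # placeholder inference

# app = ModelServer.bind("/path/to/checkpoint")
# serve.run(app)
```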

@athitten (Contributor) commented Aug 7, 2025

Thank you so much @pthombre for the PR! Overall LGTM. Left some comments. Lmk once they are addressed and tests are added, and I can approve.

Args:
device_map (str): The device mapping strategy ('auto', 'balanced', etc.)
"""
if device_map == "balanced" or device_map == "auto":
A contributor commented:

Also, is device_map removed to resolve the issue with the latest transformers version?

@pthombre (Contributor, Author) replied:

Yes. The default value of device_map is None. Users can pass whatever value they need to support their deployment now.
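A minimal sketch of the resulting behavior, assuming the HF path forwards device_map straight through to transformers' from_pretrained (the wrapper below is illustrative, not the exact code in this PR):

```python
from typing import Optional

from transformers import AutoModelForCausalLM

def load_hf_model(model_id: str, device_map: Optional[str] = None):
    # device_map defaults to None, so transformers keeps its default placement
    # unless the caller explicitly opts in to a strategy such as "auto" or
    # "balanced" (these require the accelerate package); the value is passed
    # through untouched.
    return AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)

# model = load_hf_model("gpt2")                      # plain load, no device_map
# model = load_hf_model("gpt2", device_map="auto")   # opt-in multi-GPU placement
```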

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
@pthombre (Contributor, Author) commented:
/ok to test 029119b

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
@pthombre (Contributor, Author) commented:
/ok to test 2a51fc7

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
@pthombre (Contributor, Author) commented:
/ok to test 42fceaf

@pthombre pthombre merged commit 62485cc into main Aug 14, 2025
59 of 60 checks passed
@pthombre pthombre deleted the pranav/uniform_deployment_apis branch August 14, 2025 07:39