Add MegatronGPTDeployable for serving .nemo models in pytriton #8958

Open: wants to merge 44 commits into main
Conversation

@jukim-nv commented Apr 17, 2024

What does this PR do?

We currently only support deploying TRTLLM models to the Triton Inference Server. This PR adds the support needed to deploy .nemo models to Triton as well.

Collection: deploy

Changelog

  • Adds the MegatronLLMDeployable class, which implements ITritonDeployable

Usage

  • Usage is shown in the new tests/deploy/test_triton_deployable.py:

    # Import paths follow the final file locations in this PR
    from nemo.deploy import DeployPyTriton
    from nemo.deploy.nlp import MegatronLLMDeployable

    # Load the .nemo checkpoint and wrap it for Triton
    megatron_deployable = MegatronLLMDeployable(args.nemo_checkpoint, args.num_gpus)
    # Serve the model through PyTriton
    nm = DeployPyTriton(
        model=megatron_deployable,
        triton_model_name=model_name,
        triton_model_version=1,
        max_batch_size=8,
        port=8000,
        address="0.0.0.0",
        streaming=False,
    )
    nm.deploy()
    nm.run()
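
Once the server is running, a client can send requests through PyTriton. Below is a minimal query sketch; it assumes the deployable exposes a batched string input named prompts (taken from this PR's deployable, not a stable API) and that the server above is listening on port 8000:

    import numpy as np
    from pytriton.client import ModelClient

    # Connect to the PyTriton server started by nm.run() above
    with ModelClient("localhost:8000", model_name) as client:
        # PyTriton works on numpy arrays; strings travel as UTF-8 bytes
        prompts = np.char.encode(np.array([["What color is a banana?"]]), "utf-8")
        result = client.infer_batch(prompts=prompts)
        print(result)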

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@oyilmaz-nvidia (Collaborator) left a comment:

Can we add a test similar to this https://github.com/jukim-nv/NeMo/blob/megatrongpt_deployable/tests/export/test_nemo_export.py under tests/deploy/test_deploy_pytriton.py?

And in that test, can we test the class you created?
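
A minimal pytest-style sketch of such a test (the checkpoint path is a placeholder, and the constructor and DeployPyTriton arguments are assumed from the usage shown in this PR):

    from nemo.deploy import DeployPyTriton
    from nemo.deploy.nlp import MegatronLLMDeployable


    def test_megatron_llm_deployable():
        # Hypothetical checkpoint path; a real test would download or mount one
        deployable = MegatronLLMDeployable("/path/to/model.nemo", num_devices=1)
        nm = DeployPyTriton(model=deployable, triton_model_name="megatron_llm", port=8000)
        try:
            nm.deploy()
            nm.run()
            # ...query the endpoint and assert on the generated output here...
        finally:
            nm.stop()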

    LOGGER = logging.getLogger("NeMo")


    class MegatronGPTDeployable(ITritonDeployable):
A collaborator commented:

Let's change this name to PyTritonDeployableLLM and move this file under nemo/deploy/nlp

@jukim-nv (Author) replied:

Renamed to MegatronLLMDeployable for now. As discussed offline, whatever name we decide on for this class should match the Model class that we create inside of it. So if the final name is MegatronLLMDeployable, the internal self.model should be of type MegatronLLMModel.

@oyilmaz-nvidia (Collaborator) left a comment:

Can we also add this PyTorch-level deployment to this script, https://github.com/NVIDIA/NeMo/blob/main/scripts/deploy/nlp/deploy_triton.py, please?
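
For illustration, a hedged sketch of what that script-level switch could look like; the --backend argument and its values are assumptions here, not the PR's final interface:

    # Hypothetical branch inside scripts/deploy/nlp/deploy_triton.py:
    # choose in-framework (PyTorch-level) serving or TensorRT-LLM export.
    if args.backend == "in-framework":
        model = MegatronLLMDeployable(args.nemo_checkpoint, args.num_gpus)
    else:
        from nemo.export import TensorRTLLM

        model = TensorRTLLM(model_dir=args.triton_model_repository)
        model.export(
            nemo_checkpoint_path=args.nemo_checkpoint,
            model_type=args.model_type,
            n_gpus=args.num_gpus,
        )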

@oyilmaz-nvidia marked this pull request as ready for review May 13, 2024 17:30
oyilmaz-nvidia and others added 2 commits May 13, 2024 13:31
    nemo_checkpoint_filepath: str = None,
    num_devices: int = 1,
    num_nodes: int = 1,
    existing_model: MegatronGPTModel = None,
A collaborator commented:

Can we please change the existing_model to model?

@jukim-nv (Author) replied:

Is there a particular convention we are trying to satisfy by renaming it to model? I personally don't think that model alone is descriptive enough for the argument, considering that the word "model" can refer to many different things in the context of AI and LLMs.
existing_model is not particularly good either, so I would be open to other suggestions. Maybe preinitialized_model?

A collaborator replied:

We basically pass a model that is already in memory; that's why I was suggesting model. But I also agree that it might not be descriptive enough. Maybe initialized_model?
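
To make the two construction paths concrete, a small sketch using the argument names from the diff above (whichever final name wins would replace existing_model):

    # Path 1: load from a .nemo checkpoint on disk
    deployable = MegatronLLMDeployable(nemo_checkpoint_filepath="model.nemo", num_devices=2)

    # Path 2: wrap a MegatronGPTModel that is already initialized in memory
    deployable = MegatronLLMDeployable(existing_model=already_loaded_model)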

    custom_config = MegatronGPTModel.restore_from(
        nemo_checkpoint_filepath, trainer=trainer, return_config=True
    )
    # transformer_engine should always be true according to EricH, but GPT-2B model will fail if it is enabled
A collaborator commented:

Let's remove the "according to EricH, but GPT-2B model will fail if it is enabled" part and assume that it should be set to true for now.
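
For reference, the surrounding code follows NeMo's two-step restore pattern: fetch the config, modify it, then restore with the override. A sketch with the comment trimmed as requested (override_config_path is the standard restore_from hook for applying a modified config):

    custom_config = MegatronGPTModel.restore_from(
        nemo_checkpoint_filepath, trainer=trainer, return_config=True
    )
    # Assume Transformer Engine should always be enabled for now
    custom_config.transformer_engine = True
    model = MegatronGPTModel.restore_from(
        nemo_checkpoint_filepath, trainer=trainer, override_config_path=custom_config
    )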

    )
    # transformer_engine should always be true according to EricH, but GPT-2B model will fail if it is enabled
    custom_config.transformer_engine = True
    # using multi-gpu for tensor parallelism directly for now, could do pipeline parallel instead or a combination
A collaborator commented:

Let's use the NeMo logger.info here.
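
For reference, the usual NeMo pattern (assuming the standard nemo.utils logger):

    from nemo.utils import logging

    # Log the parallelism choice instead of leaving it as an inline comment
    logging.info("Using tensor parallelism across all devices; pipeline parallelism is not configured yet.")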

github-actions bot removed the NLP label May 22, 2024
oyilmaz-nvidia previously approved these changes May 24, 2024