Add MegatronGPTDeployable for serving .nemo models in pytriton #8958
base: main
Conversation
Can we add a test similar to this https://github.com/jukim-nv/NeMo/blob/megatrongpt_deployable/tests/export/test_nemo_export.py under tests/deploy/test_deploy_pytriton.py?
And in that test, can we test the class you created?
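A rough sketch of such a test (the MegatronGPTDeployable import path, the checkpoint path, and the DeployPyTriton calls are assumptions, not the PR's final code):

import pytest

from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronGPTDeployable  # import path is an assumption


@pytest.mark.parametrize("checkpoint_path", ["/path/to/model.nemo"])  # placeholder checkpoint
def test_deploy_pytriton(checkpoint_path):
    # build the deployable directly from the .nemo checkpoint, serve it, then tear down
    deployable = MegatronGPTDeployable(nemo_checkpoint_filepath=checkpoint_path)
    nm = DeployPyTriton(model=deployable, triton_model_name="megatron_gpt", port=8000)
    nm.deploy()
    nm.run()
    try:
        # a real test would send an inference request here and assert on the output
        pass
    finally:
        nm.stop()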
LOGGER = logging.getLogger("NeMo")


class MegatronGPTDeployable(ITritonDeployable):
Let's change this name to PyTritonDeployableLLM and move this file under nemo/deploy/nlp
Renamed to MegatronLLMDeployable for now. As discussed offline, whatever name we decide on for this class should match the Model class that we create inside of it. So if the final name is MegatronLLMDeployable, the internal self.model should be of type MegatronLLMModel.
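For illustration, the pairing would look like this (a sketch only: MegatronLLMModel does not exist yet, so it is aliased to MegatronGPTModel as a stand-in, and the import paths are assumptions):

from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.deploy import ITritonDeployable

MegatronLLMModel = MegatronGPTModel  # hypothetical alias; the final model class name is undecided


class MegatronLLMDeployable(ITritonDeployable):
    def __init__(self, nemo_checkpoint_filepath: str, trainer):
        # the wrapped model's class name mirrors the deployable's name
        self.model: MegatronLLMModel = MegatronLLMModel.restore_from(
            nemo_checkpoint_filepath, trainer=trainer
        )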
…elds from the relevant internal classes instead of hard-coding whenever possible
… gpus to be controlled via argument
Can we also add this PyTorch-level deployment to this script https://github.com/NVIDIA/NeMo/blob/main/scripts/deploy/nlp/deploy_triton.py please?
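A rough sketch of how the script could branch to the in-framework (PyTorch-level) path (the --backend flag, its defaults, and the MegatronLLMDeployable import path are hypothetical; nm.serve() follows the pattern the script already uses):

import argparse

from nemo.deploy import DeployPyTriton

parser = argparse.ArgumentParser()
parser.add_argument("--nemo_checkpoint", required=True)
parser.add_argument("--triton_model_name", default="megatron_llm")
parser.add_argument("--triton_port", type=int, default=8000)
parser.add_argument("--backend", choices=["in-framework", "trt-llm"], default="trt-llm")  # hypothetical flag
args = parser.parse_args()

if args.backend == "in-framework":
    # serve the .nemo checkpoint directly at the PyTorch level
    from nemo.deploy.nlp import MegatronLLMDeployable  # import path is an assumption

    deployable = MegatronLLMDeployable(nemo_checkpoint_filepath=args.nemo_checkpoint)
else:
    raise NotImplementedError("TensorRT-LLM export path elided from this sketch")

nm = DeployPyTriton(model=deployable, triton_model_name=args.triton_model_name, port=args.triton_port)
nm.deploy()
nm.serve()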
…7B and Nemotron3-22B
…i-prompt logprob calculation
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
nemo_checkpoint_filepath: str = None,
num_devices: int = 1,
num_nodes: int = 1,
existing_model: MegatronGPTModel = None,
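For reference, the two intended construction paths (a sketch; the import path is assumed and my_megatron_gpt_model is a placeholder for a model object already initialized in memory):

from nemo.deploy.nlp import MegatronGPTDeployable  # import path is an assumption

# load a checkpoint and shard it across two devices
deployable = MegatronGPTDeployable(nemo_checkpoint_filepath="/path/to/model.nemo", num_devices=2)

# or wrap a MegatronGPTModel that is already initialized in memory
deployable = MegatronGPTDeployable(existing_model=my_megatron_gpt_model)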
Can we please change existing_model to model?
Is there a particular convention we are trying to satisfy with renaming to model? I personally don't think that model alone is descriptive enough for the argument, considering that the word "model" can refer to many different things in the context of AI and LLMs. existing_model is not particularly good either, so I would be open to other suggestions. Maybe preinitialized_model?
We basically pass a model that is already in memory; that's why I was suggesting model. But I also agree that it might not be descriptive enough. Maybe initialized_model?
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
custom_config = MegatronGPTModel.restore_from(
    nemo_checkpoint_filepath, trainer=trainer, return_config=True
)
# transformer_engine should always be true according to EricH, but GPT-2B model will fail if it is enabled
Let's remove the "according to EricH, but GPT-2B model will fail if it is enabled" part and assume that it should be set to true for now.
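For context, the edited config presumably gets fed back into the full restore along these lines (a sketch; override_config_path accepting the modified config is NeMo's usual pattern for restoring with an edited config):

model = MegatronGPTModel.restore_from(
    nemo_checkpoint_filepath, trainer=trainer, override_config_path=custom_config
)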
)
# transformer_engine should always be true according to EricH, but GPT-2B model will fail if it is enabled
custom_config.transformer_engine = True
# using multi-gpu for tensor parallelism directly for now, could do pipeline parallel instead or a combination
Let's use NeMo's logging.info here.
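The suggested change would be roughly this (nemo.utils.logging is NeMo's standard logger; the message text is lifted from the existing comment):

from nemo.utils import logging

logging.info("Using multi-GPU tensor parallelism for now; pipeline parallelism or a combination could be used instead")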
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
What does this PR do?
We currently only support deploying TRTLLM models in the Triton Inference Server. This PR adds the necessary support to let us deploy .nemo models to Triton as well.
Collection: deploy
Changelog
Usage
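A minimal end-to-end sketch of the intended usage (class name, import paths, and query parameter names are assumptions based on the discussion above, not the PR's final API):

from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronLLMDeployable, NemoQueryLLM  # import paths assumed

# serve a .nemo checkpoint on Triton directly, without a TensorRT-LLM export
deployable = MegatronLLMDeployable(nemo_checkpoint_filepath="/path/to/model.nemo", num_devices=1)
nm = DeployPyTriton(model=deployable, triton_model_name="megatron_llm", port=8000)
nm.deploy()
nm.run()

# query the served model from a client
nq = NemoQueryLLM(url="localhost:8000", model_name="megatron_llm")
print(nq.query_llm(prompts=["What is the color of a banana?"], max_output_len=16))  # parameter name assumed

nm.stop()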
Jenkins CI
To run Jenkins, a NeMo user with write access must comment jenkins on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs for various areas.
Additional Information