
[NeMo-UX] Add llm.generate to nemo.collections.llm #10471

Merged 10 commits into main on Oct 15, 2024

Conversation

hemildesai
Collaborator

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info on the high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
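The import-guard item above is a common convention for optional dependencies: the import is attempted once at module load, and a flag records whether it succeeded. A minimal sketch of the pattern, using Apex only as an illustrative optional library (the `HAVE_APEX` flag and helper function are hypothetical, not NeMo's actual code):

```python
# Guarded import of an optional dependency: the module works either way,
# and callers can check the flag before using Apex-backed features.
try:
    import apex  # noqa: F401  (imported only to probe availability)

    HAVE_APEX = True
except (ImportError, ModuleNotFoundError):
    HAVE_APEX = False


def fused_layer_norm_available() -> bool:
    """Report whether the optional Apex dependency can be used."""
    return HAVE_APEX
```

Code paths that need the optional library can then raise a clear error (or fall back) instead of failing with an ImportError deep inside a forward pass.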

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

from typing_extensions import Annotated

import nemo.lightning as nl

Check notice (Code scanning / CodeQL): Module is imported with 'import' and 'import from'

Module 'nemo.lightning' is imported with both 'import' and 'import from'.
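The notice refers to mixing `import nemo.lightning as nl` with `from nemo.lightning import ...` for the same module. A small standard-library sketch of the flagged pattern and one way to resolve it, by accessing members through a single alias:

```python
# The pattern CodeQL flags: the same module pulled in two different ways.
import os.path as osp            # style 1: aliased module import
from os.path import join         # style 2: from-import of the same module

# Single-style equivalent that avoids the notice: keep only the alias and
# access members through it.
joined = osp.join("a", "b")
assert joined == join("a", "b")  # both names refer to the same function
```

Either style alone is fine; the warning is purely about consistency, since two spellings for one module make it harder to track where names come from.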
from megatron.core.models.gpt.gpt_model import GPTModel as MCoreGPTModel
from pytorch_lightning.trainer.states import TrainerFn

import nemo.lightning as nl

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from' Note

Module 'nemo.lightning' is imported with both 'import' and 'import from'.
return self.tokenizer.text_to_ids(prompt)
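The `text_to_ids` call above converts a prompt string into token ids before generation. A toy stand-in for the interface shape (whitespace splitting with an on-the-fly vocabulary, purely illustrative; real NeMo tokenizers use a trained vocabulary):

```python
class ToyTokenizer:
    """Illustrative stand-in for a tokenizer exposing text_to_ids."""

    def __init__(self):
        self.vocab = {}

    def text_to_ids(self, prompt: str) -> list:
        # Assign an id the first time a token is seen; repeated tokens
        # map back to the same id.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in prompt.split()]


tok = ToyTokenizer()
ids = tok.text_to_ids("hello world hello")  # -> [0, 1, 0]
```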


def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
Collaborator

Please add a TODO to this to move this to the Fabric-API instead.

Collaborator

@marcromeyn Just to make sure, is the Fabric-API going to be a go-to method for loading NeMo2 checkpoints?

_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
inference_wrapped_model = GPTInferenceWrapper(
Collaborator

Could we move this to a method of the MegatronParallel class?

InferenceWrapperConfig(
hidden_size=mcore_model.config.hidden_size,
params_dtype=params_dtype,
inference_batch_times_seqlen_threshold=1000,
Collaborator

Do you just want to make it a static value, or make this a user-given parameter? It's actually sometimes much faster to set this to the highest value possible, depending on your config.

Collaborator Author

Yeah, I can make it user-defined.

Collaborator Author

Addressed in 6cd5e50
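One way to lift the hardcoded `inference_batch_times_seqlen_threshold=1000` into a user-supplied parameter is to carry it in an options object with a default. The names below are a hypothetical sketch, not the actual change that landed in 6cd5e50:

```python
from dataclasses import dataclass


@dataclass
class GenerateOptions:
    # Default matches the previously hardcoded value; users who know their
    # config can raise it for faster batched inference.
    inference_batch_times_seqlen_threshold: int = 1000


def build_wrapper_config(hidden_size: int, opts: GenerateOptions) -> dict:
    # Stand-in for constructing InferenceWrapperConfig with a caller-chosen
    # threshold instead of a literal 1000.
    return {
        "hidden_size": hidden_size,
        "inference_batch_times_seqlen_threshold": opts.inference_batch_times_seqlen_threshold,
    }
```

Keeping the old value as the default preserves existing behavior while exposing the knob to users who want it.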

shanmugamr1992 previously approved these changes Sep 19, 2024
@athitten
Collaborator

Adding @oyilmaz-nvidia for visibility.

# TODO: Move to lightning Fabric API.
def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
assert isinstance(trainer.strategy, MegatronStrategy), "Only MegatronStrategy is supported for trainer.strategy."
assert trainer.strategy.context_parallel_size <= 1, "Context parallelism is not supported for inference."
Collaborator

How about pipeline parallelism - is it supported for generation currently and, if yes, how?

@janekl
Collaborator

janekl commented Sep 26, 2024

@hemildesai would you please add a code snippet to your PR description demonstrating how to use the generation implemented here? I mean how to use it with a container like nvcr.io/nvidia/nemo:dev for a basic case like Llama with, say, TP=4 (a PP > 1 example would also be great if that is already supported).

This is to have a basic example that users can immediately try out; generation is quite a common use case.

athitten previously approved these changes Sep 26, 2024
Collaborator

@athitten left a comment

LGTM, thanks. This can be merged once the unaddressed comments are resolved. Also, it'll be good to have an example showcasing how to use the generate function.

trainer = trainer or io.load_context(path=path, subpath="trainer")
_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
Collaborator

Could you please detail this nesting/unwrapping to get mcore_model in a comment for clarity?

Collaborator Author

Yes will do.
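For context on the question above: the triple `.module` access reflects layered wrappers around the Megatron-Core model (e.g. the Lightning module wrapping a distributed/precision wrapper wrapping the core model). A toy sketch of the generic unwrapping idea (class names are illustrative, not NeMo's actual wrappers):

```python
class Wrapper:
    """Toy stand-in for one layer of model wrapping (DDP, precision, etc.)."""

    def __init__(self, module):
        self.module = module


class CoreModel:
    """Toy stand-in for the innermost Megatron-Core model."""


def unwrap(model):
    """Follow .module attributes until the innermost model is reached."""
    while hasattr(model, "module"):
        model = model.module
    return model


wrapped = Wrapper(Wrapper(Wrapper(CoreModel())))  # three layers, like .module.module.module
core = unwrap(wrapped)
```

A loop like `unwrap` is more robust than a fixed `.module.module.module` chain, since it keeps working if the number of wrapper layers changes.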

@hemildesai
Collaborator Author

@hemildesai would you please add a code snippet to your PR description demonstrating how to use the generation implemented here? I mean how to use it with a container like nvcr.io/nvidia/nemo:dev for a basic case like Llama with, say, TP=4 (a PP > 1 example would also be great if that is already supported).

This is to have a basic example that users can immediately try out; generation is quite a common use case.

Hi @janekl, yes, we will be adding an example for using llm.generate to this PR soon.

hemildesai and others added 7 commits October 14, 2024 12:14
@hemildesai
Collaborator Author

@janekl I've added an example script in 6ad3d94

Collaborator

@cuichenx left a comment

LGTM!

@hemildesai enabled auto-merge (squash) October 14, 2024 20:36
Contributor

[🤖]: Hi @hemildesai 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

@hemildesai merged commit 23334b6 into main Oct 15, 2024
168 of 172 checks passed
@hemildesai deleted the hemil/llm-generate branch October 15, 2024 01:17
yashaswikarnati pushed a commit that referenced this pull request Oct 20, 2024
* Add llm.generate

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Remove comment

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

* Fix launching with python

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* PR feedback

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* PR feedback

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

* Add assert cp

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Add example script

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
artbataev pushed a commit to artbataev/NeMo that referenced this pull request Oct 22, 2024
hemildesai added a commit that referenced this pull request Oct 22, 2024
@pablo-garay mentioned this pull request Oct 22, 2024
pablo-garay pushed a commit that referenced this pull request Oct 22, 2024
akoumpa pushed a commit that referenced this pull request Oct 24, 2024
7 participants