[NeMo-UX] Add llm.generate to nemo.collections.llm #10471
Conversation
from typing_extensions import Annotated

import nemo.lightning as nl

Check notice (Code scanning / CodeQL): Module is imported with both 'import' and 'import from'.
from megatron.core.models.gpt.gpt_model import GPTModel as MCoreGPTModel
from pytorch_lightning.trainer.states import TrainerFn

import nemo.lightning as nl

Check notice (Code scanning / CodeQL): Module is imported with both 'import' and 'import from'.
return self.tokenizer.text_to_ids(prompt)
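For context, `text_to_ids` simply maps a prompt string to a list of token ids. A toy stand-in illustrating that contract (a hypothetical whitespace tokenizer, not NeMo's implementation):

```python
class ToyTokenizer:
    """Hypothetical stand-in for a NeMo tokenizer: maps each unique
    whitespace-separated word to an integer id."""

    def __init__(self):
        self.vocab = {}

    def text_to_ids(self, prompt: str) -> list[int]:
        # Assign an id on first sight of a word, reuse it afterwards.
        return [self.vocab.setdefault(w, len(self.vocab)) for w in prompt.split()]


tok = ToyTokenizer()
ids = tok.text_to_ids("hello world hello")  # → [0, 1, 0]
```

The real tokenizer uses a trained vocabulary, but the interface the generation path relies on is this same string-to-ids mapping.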
def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
Please add a TODO to this to move this to the Fabric-API instead.
@marcromeyn Just to make sure: is the Fabric-API going to be the go-to method for loading NeMo2 checkpoints?
_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
inference_wrapped_model = GPTInferenceWrapper(
Could we move this to a method of the MegatronParallel class?
InferenceWrapperConfig(
    hidden_size=mcore_model.config.hidden_size,
    params_dtype=params_dtype,
    inference_batch_times_seqlen_threshold=1000,
Do you want to keep this as a static value, or make it a user-given parameter? Depending on your config, it can actually be much faster to set this to the highest value possible.
Yeah, I can make it user-defined.
Addressed in 6cd5e50
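For readers following along: in Megatron-style inference wrappers, a batch-times-sequence-length threshold of this kind typically decides whether the forward pass is split into micro-batches. A minimal sketch of that decision rule (function and variable names are hypothetical, not the Megatron-Core implementation):

```python
def use_micro_batched_forward(batch_size: int, seq_len: int, threshold: int = 1000) -> bool:
    """Heuristic sketch: once batch_size * seq_len exceeds the threshold,
    split the forward pass into micro-batches instead of one large pass."""
    return batch_size * seq_len > threshold


# With the static default of 1000, even a modest batch trips the micro-batched path:
decision = use_micro_batched_forward(batch_size=4, seq_len=512)  # 2048 > 1000
```

This is why exposing the threshold to the user matters: raising it keeps larger batches on the single-pass path, which can be significantly faster for some configurations.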
Force-pushed from 5f54de8 to 0fa0565 (compare)
Adding @oyilmaz-nvidia for visibility.
# TODO: Move to lightning Fabric API.
def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
    assert isinstance(trainer.strategy, MegatronStrategy), "Only MegatronStrategy is supported for trainer.strategy."
    assert trainer.strategy.context_parallel_size <= 1, "Context parallelism is not supported for inference."
How about pipeline parallelism - is it supported for generation currently and, if yes, how?
@hemildesai would you please add a code snippet to your MR description demonstrating how to use the generation you implemented? I mean how to use it with a container. This is to have a basic example that users can immediately try out; generation is quite a common use case.
LGTM, thanks. This can be merged once the unaddressed comments are resolved. It would also be good to have an example showcasing how to use the generate function.
trainer = trainer or io.load_context(path=path, subpath="trainer")
_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
Could you please detail this nesting/unwrapping to get mcore_model in a comment, for clarity?
Yes, will do.
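To illustrate why three `.module` hops are needed: the Lightning module is wrapped in several layers (for example, a distributed wrapper around a precision wrapper around the core model; the exact wrapper types in NeMo may differ). A toy sketch with stand-in classes, including a generic unwrap loop that is more robust than counting hops:

```python
class Wrapper:
    """Stand-in for wrappers such as DDP or a float16 wrapper, each of
    which exposes the wrapped model as `.module`."""

    def __init__(self, module):
        self.module = module


class McoreModel:
    """Stand-in for the innermost Megatron-Core model."""


# Three nested wrappers around the core model, mirroring the PR's access pattern.
model = Wrapper(Wrapper(Wrapper(McoreModel())))

# Fixed-depth access, as written in the PR:
mcore_model = model.module.module.module

# Generic alternative: peel wrappers until none are left.
unwrapped = model
while hasattr(unwrapped, "module"):
    unwrapped = unwrapped.module
```

The loop variant avoids silently grabbing the wrong layer if the wrapper stack ever changes depth.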
Hi @janekl, yes, we will be adding an example of using llm.generate to this PR soon.
Force-pushed from 0fa0565 to c249cab (compare)
Force-pushed from c249cab to 6ad3d94 (compare)
LGTM!
[🤖]: Hi @hemildesai 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
* Add llm.generate
* Remove comment
* Apply isort and black reformatting
* Fix launching with python
* PR feedback
* PR feedback
* Apply isort and black reformatting
* Add assert cp
* Add example script
* Fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
What does this PR do?
Add a one-line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.
Additional Information