
[NeMo-UX] Add llm.generate to nemo.collections.llm #10471

Merged 10 commits into main on Oct 15, 2024

Conversation

hemildesai
Collaborator

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info on the high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
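The import-guard item above is a common convention for optional dependencies: the import is attempted once at module load, and a flag records whether it succeeded. A minimal sketch of the pattern, using Apex only as an illustrative optional library (the `HAVE_APEX` flag and helper function are hypothetical, not NeMo's actual code):

```python
# Guarded import of an optional dependency: the module works either way,
# and callers can check the flag before using Apex-backed features.
try:
    import apex  # noqa: F401  (imported only to probe availability)

    HAVE_APEX = True
except (ImportError, ModuleNotFoundError):
    HAVE_APEX = False


def fused_layer_norm_available() -> bool:
    """Report whether the optional Apex dependency can be used."""
    return HAVE_APEX
```

Code paths that need the optional library can then raise a clear error (or fall back) instead of failing with an ImportError deep inside a forward pass.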

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

from typing_extensions import Annotated

import nemo.lightning as nl

Check notice (Code scanning / CodeQL): Module is imported with 'import' and 'import from'

Module 'nemo.lightning' is imported with both 'import' and 'import from'.
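The notice refers to mixing `import nemo.lightning as nl` with `from nemo.lightning import ...` for the same module. A small standard-library sketch of the flagged pattern and one way to resolve it, by accessing members through a single alias:

```python
# The pattern CodeQL flags: the same module pulled in two different ways.
import os.path as osp            # style 1: aliased module import
from os.path import join         # style 2: from-import of the same module

# Single-style equivalent that avoids the notice: keep only the alias and
# access members through it.
joined = osp.join("a", "b")
assert joined == join("a", "b")  # both names refer to the same function
```

Either style alone is fine; the warning is purely about consistency, since two spellings for one module make it harder to track where names come from.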
from megatron.core.models.gpt.gpt_model import GPTModel as MCoreGPTModel
from pytorch_lightning.trainer.states import TrainerFn

import nemo.lightning as nl

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from' Note

Module 'nemo.lightning' is imported with both 'import' and 'import from'.
return self.tokenizer.text_to_ids(prompt)
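The `text_to_ids` call above converts a prompt string into token ids before generation. A toy stand-in for the interface shape (whitespace splitting with an on-the-fly vocabulary, purely illustrative; real NeMo tokenizers use a trained vocabulary):

```python
class ToyTokenizer:
    """Illustrative stand-in for a tokenizer exposing text_to_ids."""

    def __init__(self):
        self.vocab = {}

    def text_to_ids(self, prompt: str) -> list:
        # Assign an id the first time a token is seen; repeated tokens
        # map back to the same id.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in prompt.split()]


tok = ToyTokenizer()
ids = tok.text_to_ids("hello world hello")  # -> [0, 1, 0]
```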


def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
Collaborator

Please add a TODO to this to move this to the Fabric-API instead.

Collaborator

@marcromeyn Just to make sure, is the Fabric-API going to be a go-to method for loading NeMo2 checkpoints?

_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
inference_wrapped_model = GPTInferenceWrapper(
Collaborator

Could we move this to a method of the MegatronParallel class?

InferenceWrapperConfig(
hidden_size=mcore_model.config.hidden_size,
params_dtype=params_dtype,
inference_batch_times_seqlen_threshold=1000,
Collaborator

Do you just want to make it a static value, or make this a user-given parameter? It's actually sometimes much faster to set this to the highest value possible, depending on your config.

Collaborator Author

Yeah, I can make it user-defined.

Collaborator Author

Addressed in 6cd5e50
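One way to lift the hardcoded `inference_batch_times_seqlen_threshold=1000` into a user-supplied parameter is to carry it in an options object with a default. The names below are a hypothetical sketch, not the actual change that landed in 6cd5e50:

```python
from dataclasses import dataclass


@dataclass
class GenerateOptions:
    # Default matches the previously hardcoded value; users who know their
    # config can raise it for faster batched inference.
    inference_batch_times_seqlen_threshold: int = 1000


def build_wrapper_config(hidden_size: int, opts: GenerateOptions) -> dict:
    # Stand-in for constructing InferenceWrapperConfig with a caller-chosen
    # threshold instead of a literal 1000.
    return {
        "hidden_size": hidden_size,
        "inference_batch_times_seqlen_threshold": opts.inference_batch_times_seqlen_threshold,
    }
```

Keeping the old value as the default preserves existing behavior while exposing the knob to users who want it.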

shanmugamr1992 previously approved these changes Sep 19, 2024
@athitten
Collaborator

Adding @oyilmaz-nvidia for visibility.

# TODO: Move to lightning Fabric API.
def _setup_trainer_and_restore_model(path: Path, trainer: nl.Trainer, model: pl.LightningModule):
assert isinstance(trainer.strategy, MegatronStrategy), "Only MegatronStrategy is supported for trainer.strategy."
assert trainer.strategy.context_parallel_size <= 1, "Context parallelism is not supported for inference."
Collaborator

How about pipeline parallelism - is it supported for generation currently and, if yes, how?

@janekl
Collaborator

janekl commented Sep 26, 2024

@hemildesai would you please add a code snippet to your PR description demonstrating how to use the generation implemented here? I mean how to use it with a container like nvcr.io/nvidia/nemo:dev for a basic case like Llama with, say, TP=4 (a PP > 1 example would also be great if that is already supported).

This is to have a basic example that users can immediately try out; generation is quite a common use case.

athitten previously approved these changes Sep 26, 2024
Collaborator

@athitten left a comment

LGTM, thanks. This can be merged once the unaddressed comments are resolved. Also, it'll be good to have an example showcasing how to use the generate function.

trainer = trainer or io.load_context(path=path, subpath="trainer")
_setup_trainer_and_restore_model(path=path, trainer=trainer, model=model)

mcore_model = model.module.module.module
Collaborator

Could you please detail this nesting/unwrapping to get mcore_model in a comment for clarity?

Collaborator Author

Yes will do.
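For context on the question above: the triple `.module` access reflects layered wrappers around the Megatron-Core model (e.g. the Lightning module wrapping a distributed/precision wrapper wrapping the core model). A toy sketch of the generic unwrapping idea (class names are illustrative, not NeMo's actual wrappers):

```python
class Wrapper:
    """Toy stand-in for one layer of model wrapping (DDP, precision, etc.)."""

    def __init__(self, module):
        self.module = module


class CoreModel:
    """Toy stand-in for the innermost Megatron-Core model."""


def unwrap(model):
    """Follow .module attributes until the innermost model is reached."""
    while hasattr(model, "module"):
        model = model.module
    return model


wrapped = Wrapper(Wrapper(Wrapper(CoreModel())))  # three layers, like .module.module.module
core = unwrap(wrapped)
```

A loop like `unwrap` is more robust than a fixed `.module.module.module` chain, since it keeps working if the number of wrapper layers changes.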

@hemildesai
Collaborator Author

@hemildesai would you please add a code snippet to your PR description demonstrating how to use the generation implemented here? I mean how to use it with a container like nvcr.io/nvidia/nemo:dev for a basic case like Llama with, say, TP=4 (a PP > 1 example would also be great if that is already supported).

This is to have a basic example that users can immediately try out; generation is quite a common use case.

Hi @janekl, yes, we will be adding an example for using llm.generate to this PR soon.

hemildesai and others added 7 commits October 14, 2024 12:14
@hemildesai
Collaborator Author

@janekl I've added an example script in 6ad3d94

Collaborator

@cuichenx left a comment

LGTM!

@hemildesai enabled auto-merge (squash) October 14, 2024 20:36
Contributor

[🤖]: Hi @hemildesai 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

@hemildesai merged commit 23334b6 into main Oct 15, 2024
168 of 172 checks passed
@hemildesai deleted the hemil/llm-generate branch October 15, 2024 01:17
yashaswikarnati pushed a commit that referenced this pull request Oct 20, 2024
* Add llm.generate

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Remove comment

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

* Fix launching with python

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* PR feedback

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* PR feedback

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

* Add assert cp

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Add example script

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
artbataev pushed a commit to artbataev/NeMo that referenced this pull request Oct 22, 2024
hemildesai added a commit that referenced this pull request Oct 22, 2024
@pablo-garay mentioned this pull request Oct 22, 2024
pablo-garay pushed a commit that referenced this pull request Oct 22, 2024
akoumpa pushed a commit that referenced this pull request Oct 24, 2024
7 participants