[NeMo-UX] Add mixed-precision plugin by marcromeyn · Pull Request #9065 · NVIDIA-NeMo/NeMo

marcromeyn · 2024-04-30T07:33:06Z

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Jenkins CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

There's no need to comment "jenkins" on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

tests/lightning/test_megatron_parallel.py

nemo/llm/gpt/model/base.py

nemo/lightning/pytorch/plugins/mixed_precision.py

+    def optimizer_step(
+        self,
+        optimizer: torch.optim.Optimizer,
+        model: Union["pl.LightningModule", torch.nn.Module],
+        closure: Callable[[], Any],
+        **kwargs: Any,
+    ) -> None:
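
The hunk above shows only the signature of the new `optimizer_step` hook. As a rough, dependency-free sketch of how a closure-based hook of this shape is commonly used (this is not the code from this PR): the closure runs the forward and backward pass and returns the loss, and the optimizer steps only when the closure returned a result. `ToyOptimizer` and `ToyPrecision` are hypothetical stand-ins, not NeMo, Lightning, or PyTorch classes.

```python
from typing import Any, Callable


class ToyOptimizer:
    """Hypothetical stand-in for torch.optim.Optimizer (illustration only)."""

    def __init__(self) -> None:
        self.steps_taken = 0

    def step(self, **kwargs: Any) -> None:
        self.steps_taken += 1


class ToyPrecision:
    """Hypothetical precision plugin exposing the hook signature from the diff."""

    def optimizer_step(
        self,
        optimizer: ToyOptimizer,
        model: Any,
        closure: Callable[[], Any],
        **kwargs: Any,
    ) -> None:
        # The closure runs forward and backward and returns the loss,
        # or None when the backward pass was skipped.
        closure_result = closure()
        if closure_result is None:
            return
        optimizer.step(**kwargs)


optimizer = ToyOptimizer()
plugin = ToyPrecision()
plugin.optimizer_step(optimizer, model=None, closure=lambda: 0.5)   # steps
plugin.optimizer_step(optimizer, model=None, closure=lambda: None)  # skipped
```

A real mixed-precision plugin would typically also handle gradient unscaling and overflow checks between the closure call and the step; those parts are omitted here because the page does not show them.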



ericharper · 2024-05-01T23:13:10Z

tests/lightning/test_megatron_parallel.py

+        assert megatron_parallel.forward_step == mock_forward_step
+        assert megatron_parallel.loss_reduction == mock_loss_reduction
+
+    def test_init_with_virtual_pipeline(self, mocker, mock_pipeline):


This test is being called and is failing during the CPU unit tests.

cuichenx · May 2, 2024

Added cpu=True to fix this.
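
The thread does not show where `cpu=True` was added. One common pattern that would explain the fix is a constructor flag that skips GPU placement, so the object under test can be built on a CPU-only runner. The sketch below assumes that reading; `ToyModule` and `ToyParallel` are hypothetical names, not the classes touched by this PR.

```python
from typing import List


class ToyModule:
    """Hypothetical model stand-in that records which device it was moved to."""

    def __init__(self) -> None:
        self.device = "cpu"

    def to(self, device: str) -> "ToyModule":
        self.device = device
        return self


class ToyParallel:
    """Hypothetical wrapper with a `cpu` flag (an assumption, not NeMo's API)."""

    def __init__(self, pipeline: List[ToyModule], cpu: bool = False) -> None:
        # With cpu=True the modules are left where they are, so the
        # constructor can run on a machine without a GPU.
        self.pipeline = pipeline if cpu else [m.to("cuda") for m in pipeline]


wrapped = ToyParallel([ToyModule()], cpu=True)
```

With a flag like this, a unit test can construct the wrapper directly and assert on its attributes without requiring CUDA.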

* Adding MegatronParallel
* Move over _strategy_liMegatronCheckpointIO
* Adding GPTModel & MockDataModule
* Adding mixed-precision to NeMo
* Fix import
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* revert unintended changes
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* clean up code and reinstate mix precision tests
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* clean up
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* use cpu for unit test
  Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>

marcromeyn and others added 5 commits April 30, 2024 00:28

Adding MegatronParallel

b08428d

Move over _strategy_liMegatronCheckpointIO

c2d00b2

Adding GPTModel & MockDataModule

6f4a180

Adding mixed-precision to NeMo

c23ffc2

Fix import

6a4a870

marcromeyn requested a review from cuichenx April 30, 2024 07:33

[pre-commit.ci] auto fixes from pre-commit.com hooks

c72b6de

for more information, see https://pre-commit.ci

github-advanced-security bot found potential problems Apr 30, 2024

cuichenx and others added 4 commits May 1, 2024 13:15

revert unintended changes

cf0f134

Signed-off-by: Chen Cui <chcui@nvidia.com>

clean up code and reinstate mix precision tests

18b816f

Signed-off-by: Chen Cui <chcui@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

46195d4

for more information, see https://pre-commit.ci

clean up

9e9ff53

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx added the Run CICD label May 1, 2024

use cpu for unit test

27df23b

Signed-off-by: Chen Cui <chcui@nvidia.com>

ericharper reviewed May 1, 2024

cuichenx added Run CICD and removed Run CICD labels May 1, 2024

cuichenx mentioned this pull request May 2, 2024

[NeMo-UX] Add mistral-7b model #9066

Merged

8 tasks

cuichenx approved these changes May 2, 2024

cuichenx merged commit 0643511 into main May 2, 2024

cuichenx deleted the nemo-ux/mixed-precision branch May 2, 2024 03:47

cuichenx merged 11 commits into main from nemo-ux/mixed-precision
