Wrap all fp8 extra states in LocalNonpersistentObject #9422
Conversation
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Force-pushed from 7f822fa to 549c541
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
@@ -1850,7 +1850,7 @@ def sharded_state_dict(self, prefix: str = '') -> Dict[str, Any]:

        # WAR: This is a temporary fix to skip loading FP8 parameters for Dot Product Attention
        def skip_fp8_load(x):
            if isinstance(x, ShardedObject) and 'fused_attention' in x.key and '_extra_state' in x.key:
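The diff above shows only the guard condition of the WAR. The sketch below fills in the rest of the pattern under stated assumptions: the class definitions are minimal hypothetical stand-ins (in NeMo the real types come from `megatron.core.dist_checkpointing.mapping`), and the body of `skip_fp8_load` is inferred from the PR title, not copied from the source.

```python
# Minimal stand-ins for megatron.core.dist_checkpointing.mapping types;
# in NeMo the real import would be:
#   from megatron.core.dist_checkpointing.mapping import (
#       ShardedObject, LocalNonpersistentObject)
from dataclasses import dataclass
from typing import Any


@dataclass
class ShardedObject:
    key: str
    data: Any = None


@dataclass
class LocalNonpersistentObject:
    # Wraps a value so distributed checkpointing keeps it local and
    # neither loads it from nor saves it to the checkpoint.
    obj: Any


def skip_fp8_load(x):
    """WAR: replace fused-attention FP8 extra states with local
    non-persistent objects so they are skipped on checkpoint load.
    (Body inferred from the PR title; hypothetical sketch.)"""
    if isinstance(x, ShardedObject) and 'fused_attention' in x.key and '_extra_state' in x.key:
        x = LocalNonpersistentObject(x.data)
    return x
```

In the real code this callback would be applied over the sharded state dict (e.g. via a map-in-place utility) before loading, so only the matching entries are swapped out.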
I'm concerned that this PR basically makes the GPT models ignore the FP8 state for all layers in the checkpoint.
In the meantime I prepared a thorough solution, with an almost merged MCore branch and a corresponding NeMo branch.
@jbaczek could you check (in theory or in practice) if this would solve your problem?
The required flag to set would be model.dist_ckpt_load_strictness=log_all
I think that non-strict loading would solve the problem. I see that this branch has already been merged into MCore. When should we expect the repositories to sync, so that I can use the public implementation?
I asked for a public sync; ideally it should be available later today, but we don't have an official ETA.
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do?
This PR generalizes FP8 extra state wrapping for all tensors.
Collection: nlp
Before your PR is "Ready for review"
Pre checks:
PR Type:
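The generalization the description mentions — wrapping all FP8 extra states, not only the fused-attention ones — can be sketched as below. Everything here is a hypothetical illustration: `map_inplace` stands in for the traversal utility NeMo actually uses (Megatron-Core's dist-checkpointing helpers such as `dict_list_map_inplace`), and the stand-in classes mirror `ShardedObject` and `LocalNonpersistentObject` only loosely.

```python
# Hypothetical sketch: wrap EVERY '_extra_state' entry in a (nested)
# sharded state dict in a local non-persistent wrapper, so none of the
# FP8 extra states are loaded from the checkpoint.
from dataclasses import dataclass
from typing import Any


@dataclass
class ShardedObject:
    key: str
    data: Any = None


@dataclass
class LocalNonpersistentObject:
    obj: Any


def map_inplace(fn, d):
    # Recursively apply fn to every leaf of a possibly nested dict
    # (stand-in for megatron.core's dict_list_map_inplace).
    for k, v in d.items():
        if isinstance(v, dict):
            map_inplace(fn, v)
        else:
            d[k] = fn(v)
    return d


def wrap_fp8_extra_states(sharded_state_dict):
    def wrap(x):
        # Generalized check: any extra state, not just fused attention.
        if isinstance(x, ShardedObject) and '_extra_state' in x.key:
            return LocalNonpersistentObject(x.data)
        return x
    return map_inplace(wrap, sharded_state_dict)
```

Compared with the earlier fused-attention-only WAR, the only change is dropping the `'fused_attention' in x.key` condition, which is what lets the wrapping apply uniformly to all tensors' extra states.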