
Fix for TransformerLayer MLP parameters not being set with specific hyperparameters #8845

Conversation


@OlegSudakov OlegSudakov commented Apr 8, 2024

What does this PR do?

Detailed description here: #8846
This PR fixes TransformerLayer MLP hyperparameters not being applied when model.mcore_gpt=False, model.transformer_engine=True, and model.megatron_amp_O2=True.

Collection: NLP

Changelog

  • Pass the missing MLP hyperparameters through to the layer constructors (see the sketch below).
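
The fix follows the usual pattern of forwarding constructor hyperparameters to the wrapped sub-layer. A minimal, self-contained sketch of that pattern follows; the class and argument names are illustrative only, not NeMo's or Transformer Engine's actual code:

# Illustrative sketch only: shows the "missing kwargs" bug pattern and its fix.
# LayerNormMLP / TransformerLayer names here are hypothetical stand-ins.

class LayerNormMLP:
    def __init__(self, hidden_size, activation="gelu", bias=True, normalization="LayerNorm"):
        self.activation = activation
        self.use_bias = bias
        self.normalization = normalization

class BuggyTransformerLayer:
    """Drops the MLP hyperparameters: the sub-layer silently falls back to its defaults."""
    def __init__(self, hidden_size, activation, bias, normalization):
        self.layernorm_mlp = LayerNormMLP(hidden_size)  # activation/bias/normalization lost

class FixedTransformerLayer:
    """Forwards the hyperparameters so the sub-layer sees the configured values."""
    def __init__(self, hidden_size, activation, bias, normalization):
        self.layernorm_mlp = LayerNormMLP(
            hidden_size,
            activation=activation,
            bias=bias,
            normalization=normalization,
        )

buggy = BuggyTransformerLayer(1024, "fast-swiglu", False, "rmsnorm")
fixed = FixedTransformerLayer(1024, "fast-swiglu", False, "rmsnorm")
print(buggy.layernorm_mlp.activation)  # "gelu"        -> config ignored
print(fixed.layernorm_mlp.activation)  # "fast-swiglu" -> config applied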

Usage

  • Add the following lines to /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py after model initialization to debug:
logging.warning(f"DEBUG: layernorm_mlp.activation={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.activation}")
logging.warning(f"DEBUG: layernorm_mlp.use_bias={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.use_bias}")
logging.warning(f"DEBUG: layernorm_mlp.normalization={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.normalization}")
logging.warning(f"DEBUG: layernorm_mlp.layernorm_mlp.fc1_weight.shape={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.fc1_weight.shape}")
logging.warning(f"DEBUG: layernorm_mlp.layernorm_mlp.fc2_weight.shape={model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp.fc2_weight.shape}")

Test with the following script, with and without the changes:

#!/bin/bash

python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py \
    model.mcore_gpt=False \
    model.transformer_engine=True \
    trainer.precision=bf16 \
    model.megatron_amp_O2=True \
    model.activation=fast-swiglu \
    model.bias=false \
    model.normalization=rmsnorm

The bias, normalization, and activation should be correctly set with the fix.
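
To make the check automatic, the debug logging can be replaced with assertions against the configured values. The expected values below are assumptions about how the config strings surface on the layer attributes; adjust them to whatever the debug output above actually prints:

# Assumption-laden sketch: expected values depend on how NeMo maps config strings
# onto the layer attributes; verify against the debug output before relying on it.
mlp = model.model.module.language_model.encoder.layers._modules['0'].layernorm_mlp
assert mlp.use_bias is False, f"bias not applied: {mlp.use_bias}"
assert "swiglu" in str(mlp.activation).lower(), f"activation not applied: {mlp.activation}"
assert "rms" in str(mlp.normalization).lower(), f"normalization not applied: {mlp.normalization}"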

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

…False, transformer_engine=True, megatron_amp_O2=True

Signed-off-by: Oleg Sudakov <oleg.sudakov@outlook.com>
@github-actions github-actions bot added the NLP label Apr 8, 2024

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Apr 23, 2024

github-actions bot commented May 1, 2024

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this May 1, 2024