Assert fp32 for rope embeddings, misc test fixes#1496

Merged
pstjohn merged 2 commits into NVIDIA:main from pstjohn:pstjohn/assert-fp32-for-rope-embeddings on Mar 5, 2026

Conversation

@pstjohn (Collaborator) commented Mar 5, 2026

This wouldn't have caught @savitha-eng's cast_forward_inputs=True bug (which casts these tensors as they enter the TransformerLayer), but it turns out our test suite was actually casting them to bfloat16 via model.to(bfloat16) calls 😬.

This also fixes a few other miscellaneous test failures I saw locally while making sure the esm2 and llama3 recipe and model tests pass.

Will require #1495 for tests to pass.
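To make the fix concrete, here is a minimal sketch of what an fp32 guard on RoPE embeddings can look like. This is a hypothetical illustration, not the repo's actual code: the names rope_frequencies and apply_rope are invented, and the assertion stands in for the one this PR adds. The point is that a stray model.to(bfloat16) (or cast_forward_inputs=True) would otherwise silently compute sin/cos angles in low precision, degrading positional accuracy.

```python
import torch


def rope_frequencies(seq_len: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Build a RoPE angle table in fp32, shape (seq_len, dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, inv_freq)


def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate pairs of channels of x by the fp32 angle table."""
    # Guard analogous to the one this PR adds: the angle table must stay fp32.
    assert freqs.dtype == torch.float32, f"expected fp32 RoPE freqs, got {freqs.dtype}"
    cos = freqs.cos().to(x.dtype)  # cast sin/cos AFTER computing them in fp32
    sin = freqs.sin().to(x.dtype)
    x1, x2 = x[..., ::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., ::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

With this guard in place, a test-suite-wide model.to(torch.bfloat16) trips the assertion immediately instead of quietly passing degraded embeddings through.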

@coderabbitai bot (Contributor) commented Mar 5, 2026

Review skipped: auto reviews are disabled on this repository. To trigger a single review, invoke the @coderabbitai review command.

@savitha-eng (Collaborator) left a comment:

lgtm

@pstjohn pstjohn enabled auto-merge March 5, 2026 19:05
savitha-eng added a commit that referenced this pull request Mar 5, 2026
Self-contained FSDP2 + TransformerEngine recipe for OpenGenome2 training,
extracted from the generic llama3_native_te recipe with OG2-specific defaults:
- FP32 master weights with MixedPrecisionPolicy (cast_forward_inputs=False)
- Megatron-style scaled init for proj/fc2 layers
- Spike-No-More embedding initialization (std=1.0)
- Genomic masking for degenerate bases
- Weight decay grouping (skip bias/1D params)
- THD sequence packing with GQA
- FP8 training with first/last layer BF16 override
- RoPE fp32 assertion (from PR #1496)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
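One of the bullets above, "weight decay grouping (skip bias/1D params)", can be sketched in a few lines. This is a hypothetical illustration of the common pattern, not the recipe's actual implementation; the helper name param_groups is invented. Biases and other 1-D parameters (norm scales/shifts) go into a zero-decay group, while matrix-shaped weights get the configured decay.

```python
import torch
from torch import nn


def param_groups(model: nn.Module, weight_decay: float) -> list[dict]:
    """Split parameters into decay / no-decay optimizer groups.

    1-D params (biases, LayerNorm weights) are excluded from weight decay,
    since decaying them tends to hurt rather than regularize.
    """
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim <= 1 else decay).append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

The returned list can be passed straight to an optimizer, e.g. torch.optim.AdamW(param_groups(model, 0.1), lr=1e-4).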
@savitha-eng savitha-eng mentioned this pull request Mar 5, 2026
9 tasks
pstjohn added 2 commits March 5, 2026 14:55
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@pstjohn pstjohn force-pushed the pstjohn/assert-fp32-for-rope-embeddings branch from e609c74 to a5deb7c on March 5, 2026 22:22
@pstjohn pstjohn added this pull request to the merge queue Mar 5, 2026
Merged via the queue into NVIDIA:main with commit b2ddae1 Mar 5, 2026
21 checks passed
@pstjohn pstjohn deleted the pstjohn/assert-fp32-for-rope-embeddings branch March 5, 2026 23:38