Skip to content

test: loosening absolute tolerance for layer norm tests#1581

Open
laserkelvin wants to merge 1 commit intoNVIDIA:mainfrom
laserkelvin:layernorm-test-precisions
Open

test: loosening absolute tolerance for layer norm tests#1581
laserkelvin wants to merge 1 commit intoNVIDIA:mainfrom
laserkelvin:layernorm-test-precisions

Conversation

@laserkelvin
Copy link
Copy Markdown
Collaborator

PhysicsNeMo Pull Request

Description

This PR modifies the absolute numerical tolerance used to compare fused and unfused implementations of the equivariant layer norm, which triggers on certain platforms.

Checklist

Dependencies

N/A

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
@laserkelvin laserkelvin added the bug Something isn't working label Apr 22, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR increases the absolute tolerance for torch.float32 equivariance tests from 1e-4 to 5e-3 (a 50× relaxation) in the shared get_rtol_atol helper to accommodate platform-specific numerical differences in the fused layer norm. The change is minimal and the motivation is clear, but the new atol now equals the float16 threshold, and the Notes docstring still claims float32 delivers "~4-5 decimal digits" of precision.

Important Files Changed

Filename Overview
test/experimental/nn/symmetry/conftest.py Loosens float32 atol from 1e-4 to 5e-3 (50×) for layer norm equivariance tests; docstring precision claim now inconsistent with the new value.

Comments Outside Diff (1)

  1. test/experimental/nn/symmetry/conftest.py, line 95-96 (link)

    P2 Docstring precision claim inconsistent with new atol

    The Notes section still documents float32 as providing ~4-5 decimal digits of precision, but atol=5e-3 corresponds to roughly 2-3 decimal digits — the same level as float16. Consider updating the comment to reflect the actual tolerance in use.

Reviews (1): Last reviewed commit: "test: loosening absolute tolerance for l..." | Re-trigger Greptile

@laserkelvin
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Comment thread test/experimental/nn/symmetry/conftest.py
@laserkelvin
Copy link
Copy Markdown
Collaborator Author

GLOBE related tests fail on GitHub CI - paging Dr. @peterdsharpe!

@laserkelvin laserkelvin enabled auto-merge April 22, 2026 23:39
@peterdsharpe
Copy link
Copy Markdown
Collaborator

peterdsharpe commented Apr 23, 2026

@laserkelvin Aware of this, seems to be sporadically failing - apologies! Investigating here: #1580

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants