Collaborator

@jlamypoirier jlamypoirier commented Jul 15, 2025

✨ Description

Fix: #304

New testing features:

  • Add fine-grained control to comparison configs, so thresholds and other options can vary by tensor, step, etc. This avoids having to set very high thresholds just to accommodate worst-case tensors.
  • Add threshold scaling option to model and distributed configs, so we don't have to increase all thresholds because of one config.
  • Add option to disable specific distributed configurations for some models that may not support them.
  • Show comparison configs (pprint).
  • Use a separate directory for each worker when not parallel-safe.
  • Delete the entire testing cache on each test run to prevent issues.
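
As an illustration of the first two items, here is a minimal sketch of how per-tensor, per-step thresholds and a global scaling factor could fit together. It is not the code from this PR: `Thresholds`, `CompareConfig`, `resolve`, the `rms_abs`/`rms_rel` fields and the tensor names are all hypothetical, chosen only to show the idea.

```python
import dataclasses
import fnmatch
from typing import Optional


@dataclasses.dataclass(frozen=True)
class Thresholds:
    # Hypothetical fields: maximum allowed absolute and relative RMS difference.
    rms_abs: float = 1e-5
    rms_rel: float = 1e-3

    def scaled(self, factor: float) -> "Thresholds":
        # Return a copy with every threshold multiplied by `factor`.
        return dataclasses.replace(
            self, rms_abs=self.rms_abs * factor, rms_rel=self.rms_rel * factor
        )


@dataclasses.dataclass
class CompareConfig:
    # Thresholds used when no override matches.
    default: Thresholds = dataclasses.field(default_factory=Thresholds)
    # Each override is (step or None for any step, tensor-name glob, thresholds).
    # The first matching override wins.
    overrides: list[tuple[Optional[int], str, Thresholds]] = dataclasses.field(
        default_factory=list
    )
    # Scaling factor, e.g. set by a model or distributed config (assumption).
    scale: float = 1.0

    def resolve(self, step: int, tensor_name: str) -> Thresholds:
        for rule_step, pattern, thresholds in self.overrides:
            step_matches = rule_step is None or rule_step == step
            if step_matches and fnmatch.fnmatchcase(tensor_name, pattern):
                return thresholds.scaled(self.scale)
        return self.default.scaled(self.scale)


# Loosen thresholds only for one noisy tensor instead of for every tensor.
config = CompareConfig(
    overrides=[
        (None, "*word_embeddings_weight*", Thresholds(rms_abs=1e-4, rms_rel=1e-2))
    ],
    scale=2.0,
)
noisy = config.resolve(0, "gradient.word_embeddings_weight")
regular = config.resolve(0, "gradient.mlp.weight")
print(noisy, regular)
```

With this shape, a single noisy tensor gets its own override while every other tensor keeps the tight defaults, and one config that needs slack can set `scale` without touching the per-tensor rules.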

Fixes:

Testing tweaks:

  • Run most model tests in fp32 instead of bf16. This allows setting more useful comparison thresholds, and there isn't much specific to bf16 anyway (basically just flash attention).
  • Add specific testing configurations for bf16 and fp16, with their own separate comparison thresholds.
  • Enable comparison between configs with different hidden-state layouts, e.g. sequence-first vs. not (gradients only).
  • Adjust all thresholds so we can use lower values for most comparisons.
  • Skip sequence-first tests for SSMs.
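
To illustrate why fp32 permits tighter thresholds than bf16, the sketch below emulates the rounding of both formats using only the standard library. It is independent of the test suite in this PR; `round_to_fp32` and `round_to_bf16` are hypothetical helper names, and the bf16 emulation assumes finite inputs and round-to-nearest-even.

```python
import struct


def round_to_fp32(x: float) -> float:
    # Round a Python float (fp64) to the nearest fp32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bf16(x: float) -> float:
    # bf16 keeps the top 16 bits of the fp32 encoding (8 exponent bits,
    # 7 explicit mantissa bits). Round to nearest, ties to even.
    # Assumes a finite input; NaN and infinity are not handled specially.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


value = 1.001
fp32_error = abs(round_to_fp32(value) - value) / value
bf16_error = abs(round_to_bf16(value) - value) / value
print(f"fp32 relative error: {fp32_error:.2e}")
print(f"bf16 relative error: {bf16_error:.2e}")
```

Near 1.0 the spacing between adjacent bf16 values is 2^-7, against 2^-23 for fp32, so rounding noise alone accounts for several orders of magnitude of difference in the smallest threshold a comparison can meaningfully use.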

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

Base automatically changed from model_tests to main July 16, 2025 15:25
@jlamypoirier jlamypoirier marked this pull request as ready for review July 16, 2025 15:51
@jlamypoirier jlamypoirier requested review from bigximik, oleksost and tscholak and removed request for oleksost July 16, 2025 15:51
@jlamypoirier jlamypoirier merged commit 69e4f45 into main Jul 16, 2025
4 checks passed
@jlamypoirier jlamypoirier deleted the fp32_tests branch July 16, 2025 21:55
Development

Successfully merging this pull request may close these issues.

[bug] Model comparison tests are flaky WRT word_embeddings_weight gradients

3 participants