[bug] Model comparison tests are flaky WRT word_embeddings_weight gradients #304

# 🐞 Describe the Bug

Model comparison tests (aka `run_test_script`) have recently started showing random failures with an excessive diff on `word_embeddings_weight` gradients (`>>>> [train_2] Excessive diff for tensor Global gradient: layers.0.word_embeddings_weight`), with diffs only slightly above the threshold. We need to investigate whether there is an actual bug/regression behind this or whether it's just random noise.


Example:
```
>>>> [train_2] Excessive diff for tensor Global gradient: layers.0.word_embeddings_weight:
  * Max diff scaled = 0.15082430839538574 > 0.15 (scale=0.001214031595736742, unregularized=0.0006883841124363244)
  Ref samples:    0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  2.9449e-03
  Test samples:   0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  2.9182e-03
```
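To help tell a regression from noise, one option is to collect these failure lines over several runs and look at how far the scaled diff overshoots the threshold. Below is a minimal sketch, assuming the log line keeps the exact format shown in the example above. The names `parse_excessive_diffs` and `summarize_margins` are hypothetical helpers for illustration, not part of the test suite.

```python
import re
from statistics import mean

# Assumption: failure lines keep the "Max diff scaled = ... > ... (scale=..., unregularized=...)"
# format from the example log above.
_DIFF_RE = re.compile(
    r"Max diff scaled = (?P<scaled>[0-9.eE+-]+) > (?P<threshold>[0-9.eE+-]+) "
    r"\(scale=(?P<scale>[0-9.eE+-]+), unregularized=(?P<unregularized>[0-9.eE+-]+)\)"
)


def parse_excessive_diffs(log_text: str) -> list[dict[str, float]]:
    """Extract every 'Max diff scaled' record found in a test log (hypothetical helper)."""
    return [
        {key: float(value) for key, value in match.groupdict().items()}
        for match in _DIFF_RE.finditer(log_text)
    ]


def summarize_margins(records: list[dict[str, float]]) -> dict[str, float]:
    """Summarize the relative overshoot, (scaled - threshold) / threshold, over all records."""
    margins = [(r["scaled"] - r["threshold"]) / r["threshold"] for r in records]
    if not margins:
        return {"count": 0}
    return {"count": len(margins), "max": max(margins), "mean": mean(margins)}


# Example log line copied from this issue.
example_log = (
    "  * Max diff scaled = 0.15082430839538574 > 0.15 "
    "(scale=0.001214031595736742, unregularized=0.0006883841124363244)"
)
records = parse_excessive_diffs(example_log)
summary = summarize_margins(records)
```

If the overshoot stays tiny and varies from run to run, that points toward noise around the threshold; a consistent or growing overshoot would point toward a real regression.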

