log(metrics): show exact step/samples + bump loss precision to 6dp by shuheng-liu · Pull Request #292 · TensorAuto/OpenTau

shuheng-liu · 2026-05-12T17:45:02Z

What this does

The metrics log line was losing detail for two reasons:

- format_big_number collapses step:15123 to step:15K, so steps 14903 and 15127 produce the same log string.
- Losses are formatted with :.3f, so a real l1_loss:0.0001 prints as l1_loss:0.000, indistinguishable from a true zero.
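Both kinds of lost detail can be reproduced in a few lines. This is a minimal sketch: format_big_number_sketch is a hypothetical stand-in for format_big_number (one suffix per factor of 1000, rounded to zero decimals), modeled only on the outputs quoted in this PR, not the repository's actual helper.

```python
def format_big_number_sketch(num: float, precision: int = 0) -> str:
    """Hypothetical stand-in for format_big_number: one suffix per factor of 1000."""
    suffixes = ["", "K", "M", "B", "T", "Q"]
    for suffix in suffixes[:-1]:
        if abs(num) < 1000:
            return f"{num:.{precision}f}{suffix}"
        num /= 1000
    # Anything that is still >= 1000 after the last division keeps the final suffix.
    return f"{num:.{precision}f}{suffixes[-1]}"


# Two different steps collapse to the same rounded string.
print(format_big_number_sketch(14903), format_big_number_sketch(15127))  # 15K 15K

# At :.3f a small nonzero loss prints exactly like a true zero; :.6f keeps it.
print(f"l1_loss:{0.0001:.3f}", f"l1_loss:{0.0:.3f}")  # l1_loss:0.000 l1_loss:0.000
print(f"l1_loss:{0.0001:.6f}")  # l1_loss:0.000100
```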

This PR:

- Appends the exact integer in parentheses after step and smpl in MetricsTracker.__str__, e.g. step:15K(15123) smpl:968K(968000). Small values render as step:42(42) smpl:336(336), so every line has the same shape throughout the run.
- Bumps the four loss meters (total_loss, mse_loss, ce_loss, l1_loss) from :.3f to :.6f in both the training and the validation trackers in scripts/train.py. accuracy, grad_norm, and lr keep their existing formats.
- Confirms that the regression-test regex patterns in .github/scripts/check_loss_drop.py (mse_loss:([0-9.eE+-]+)) and .github/scripts/check_nonzero_grad_norm.py (grad_norm:([0-9.eE+-]+)) still match at the new precision: the [0-9.eE+-]+ character class captures 0.105234 just as cleanly as 0.105.
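The two formatting changes can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in MetricsTracker or scripts/train.py: format_big_number_sketch is a hypothetical stand-in for format_big_number, format_step_and_samples is a hypothetical helper, and RunningMeter is a simplified running-average meter that takes a format spec such as ":.6f", assumed here to resemble how the loss meters are configured.

```python
def format_big_number_sketch(num: float, precision: int = 0) -> str:
    """Hypothetical stand-in for format_big_number: one suffix per factor of 1000."""
    suffixes = ["", "K", "M", "B", "T", "Q"]
    for suffix in suffixes[:-1]:
        if abs(num) < 1000:
            return f"{num:.{precision}f}{suffix}"
        num /= 1000
    return f"{num:.{precision}f}{suffixes[-1]}"


def format_step_and_samples(step: int, samples: int) -> str:
    """Hypothetical helper: rounded value followed by the exact integer in parentheses."""
    return (
        f"step:{format_big_number_sketch(step)}({step}) "
        f"smpl:{format_big_number_sketch(samples)}({samples})"
    )


class RunningMeter:
    """Simplified running-average meter; `fmt` is a format spec such as ':.6f'."""

    def __init__(self, name: str, fmt: str = ":.6f") -> None:
        self.name = name
        self.fmt = fmt
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        # Accumulate a weighted sum so the average is correct for batched updates.
        self.total += value * n
        self.count += n

    def __str__(self) -> str:
        avg = self.total / self.count if self.count else 0.0
        # ":.6f" becomes the template "{:.6f}".
        return f"{self.name}:" + ("{" + self.fmt + "}").format(avg)


print(format_step_and_samples(15123, 968000))  # step:15K(15123) smpl:968K(968000)
print(format_step_and_samples(42, 336))        # step:42(42) smpl:336(336)

old_l1, new_l1 = RunningMeter("l1_loss", ":.3f"), RunningMeter("l1_loss", ":.6f")
for meter in (old_l1, new_l1):
    meter.update(0.000123)
print(old_l1, new_l1)  # l1_loss:0.000 l1_loss:0.000123
```

Keeping the rounded value in front preserves the at-a-glance reading, while the parenthesized integer makes neighbouring steps distinguishable.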

Example before / after:

# before
step:15K smpl:2M total_loss:0.291 mse_loss:0.105 ce_loss:0.186 l1_loss:0.000 accuracy:0.000 lr:1.9e-05 grad_norm:0.854

# after
step:15K(15123) smpl:2M(2015872) total_loss:0.291482 mse_loss:0.105234 ce_loss:0.186891 l1_loss:0.000123 accuracy:0.000 lr:1.9e-05 grad_norm:0.854

Label: 📝 Documentation (logging cosmetics) — feel free to relabel.

How it was tested

Added test_metrics_tracker_str_step_and_samples_exact_value and test_metrics_tracker_str_step_and_samples_small_value in tests/utils/test_logging_utils.py covering both the rounded (15K(15123)) and small (42(42)) cases.
Verified the existing test_metrics_tracker_str still passes — its assertions don't touch the step/samples columns.
pytest tests/utils/test_logging_utils.py tests/utils/test_utils_utils.py tests/scripts/test_train.py → 71 passed, 4 skipped, 0 failed.
Ran the full CPU suite (pytest -m "not gpu" -n auto). The remaining 44 failures and 10 errors all pre-exist on the branch base (5dfd5df) and come from tests/policies/test_pi07_paligemma_low_level_planner.py, tests/utils/test_hub.py, tests/utils/test_libero_utils.py (missing robosuite), and tests/datasets/test_loc_tokens_paligemma.py. None of these touch logging.

Sanity-checked the loss-drop regex on the new format:

>>> re.search(r"mse_loss:([0-9.eE+-]+)", "mse_loss:0.105234 ce_loss:0.186").group(1)
'0.105234'
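The same check can be run for both patterns against both example lines. This is a standalone sketch that uses only the two regexes quoted in this description and the before/after lines from the example; it does not import the .github/scripts modules.

```python
import re

# Patterns as quoted for check_loss_drop.py and check_nonzero_grad_norm.py.
MSE_PATTERN = re.compile(r"mse_loss:([0-9.eE+-]+)")
GRAD_NORM_PATTERN = re.compile(r"grad_norm:([0-9.eE+-]+)")

BEFORE = (
    "step:15K smpl:2M total_loss:0.291 mse_loss:0.105 ce_loss:0.186 "
    "l1_loss:0.000 accuracy:0.000 lr:1.9e-05 grad_norm:0.854"
)
AFTER = (
    "step:15K(15123) smpl:2M(2015872) total_loss:0.291482 mse_loss:0.105234 "
    "ce_loss:0.186891 l1_loss:0.000123 accuracy:0.000 lr:1.9e-05 grad_norm:0.854"
)

for label, line in (("before", BEFORE), ("after", AFTER)):
    # float() raises if the capture picked up anything that is not a number.
    mse = float(MSE_PATTERN.search(line).group(1))
    grad_norm = float(GRAD_NORM_PATTERN.search(line).group(1))
    print(label, mse, grad_norm)
# before 0.105 0.854
# after 0.105234 0.854
```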

pre-commit run on all changed files: clean.

How to checkout & try? (for the reviewer)

git fetch origin claude/enhance-metrics-logging-AeF8g
git checkout claude/enhance-metrics-logging-AeF8g
pytest -sx tests/utils/test_logging_utils.py

To see the new format in a real log line, run a smoke training:

opentau-train --accelerate-config configs/examples/accelerate_ddp_config.yaml --config_path=configs/examples/pi05_training_config.json

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: Before submitting this PR, please read the contributor guideline.

Generated by Claude Code

log(metrics): show exact step/samples + bump loss precision to 6dp

b8f1835

shuheng-liu added the documentation label (Improvements or additions to documentation) May 12, 2026, with Claude

shuheng-liu self-assigned this May 12, 2026

shuheng-liu mentioned this pull request May 12, 2026

ci: move Claude PR Review continue-on-error to step level #293 (merged)

ci: move Claude PR Review continue-on-error to step level (#293)

e380699

shuheng-liu requested review from WilliamYue37 and akshay18iitg May 12, 2026 18:08

shuheng-liu marked this pull request as ready for review May 12, 2026 18:08

shuheng-liu merged commit c9bfd2d into main May 12, 2026
6 checks passed

shuheng-liu deleted the claude/enhance-metrics-logging-AeF8g branch May 12, 2026 19:03

shuheng-liu merged 2 commits into main from claude/enhance-metrics-logging-AeF8g
