fix: mixture-weighted validation aggregate + per-name dup suffix by shuheng-liu · Pull Request #189 · TensorAuto/OpenTau

shuheng-liu · 2026-04-26T06:32:11Z

What this does

Follow-up to #169. Two small fixes around per-dataset validation reporting:

Mixture-weighted overall Validation/Loss. PR feat: per-dataset validation loss reporting #169 kept the aggregate scalar but, because it accumulated each batch into a parallel agg_tracker with n=1, the meaning silently changed: large val subsets dominate. This drops the parallel tracker and computes the aggregate from the per-dataset trackers using WeightedDatasetMixture.dataset_weights, so the overall Validation/{Loss,MSE Loss,CE Loss,L1 Loss,Accuracy} reflects the training mixture again. Per-dataset scalars are unchanged.
Per-name sequential suffix when repo_id collides. When two DatasetConfigs share the same repo_id (e.g. same HF repo, different episodes slices), _make_dataset_names previously used the global index, so ['A','B','A'] became ['A#0','B','A#2']. Now it uses a per-name counter: ['A#0','B','A#1'].

Helper extracted to top-level _mixture_weighted_aggregate(per_dataset_trackers, name_to_weight, metric_keys) so it can be unit-tested. Renormalizes weights over the keys actually present in the trackers (matches get_per_dataset_dataloaders skipping empty datasets), and returns 0.0 for every metric when there is no signal (empty trackers / all-zero weights).

Out of scope: _get_worker_name_mapping_overrides still keys off bare repo_id and would collide between two datasets that share repo_id but have different data_features_name_mapping. That is a feature-mapping bug rather than a reporting one and is left for a separate PR.

How it was tested

New tests/scripts/test_train.py::TestMixtureWeightedAggregate covers equal weights / unequal weights / renormalizing over present keys only / all five metrics / empty-trackers / all-zero-weights edge cases.
New tests/datasets/test_dataset_mixture.py::TestMakeDatasetNames pins the disambiguation behaviour: all-unique, repeated repo_id, all-identical, mixed vqa/repo_id, missing repo_id+vqa (class-name fallback), mismatched cfg length, and absent dataset_mixture attribute.
pre-commit run clean on all touched files.
Standalone exercise of the extracted aggregation logic and disambiguation logic confirms the arithmetic and the ['A#0','B','A#1'] ordering.
Full GPU validation in a real training run is still pending (no GPU available in this environment), same caveat as PR feat: per-dataset validation loss reporting #169.

How to checkout & try? (for the reviewer)

pytest -sx tests/scripts/test_train.py::TestMixtureWeightedAggregate tests/datasets/test_dataset_mixture.py::TestMakeDatasetNames

Then launch any training config with val_freq > 0 and at least two datasets with different dataset_weights in dataset_mixture.datasets. Confirm in wandb that Validation/Loss now matches sum(w_i * Validation/<name_i>/Loss) / sum(w_i) instead of being dominated by the largest val subset. To exercise the duplicate-name path, point two DatasetConfig entries at the same repo_id (with different episodes) and confirm the per-dataset W&B groups appear as <repo_id>#0 and <repo_id>#1.

Checklist

I have added Google-style docstrings to important functions and ensured function parameters are typed.
My PR includes policy-related changes.
- If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Generated by Claude Code

Restores meaningful overall Validation/Loss after PR #169 by computing the aggregate as a mixture-weighted mean of per-dataset metrics rather than implicitly weighting by val-subset size. Also fixes disambiguation of duplicate repo_ids to use per-name sequential suffix (['A','B','A'] -> ['A#0','B','A#1'] instead of ['A#0','B','A#2']).

Conflict resolutions: - src/opentau/scripts/train.py: auto-merged. - tests/scripts/test_train.py: union resolution. HEAD added the ``test_*_deepspeed_*`` tests (#175) and main added the ``TestMixtureWeightedAggregate`` class (#189). Both exercise different helpers so they coexist; the import block was combined.

shuheng-liu marked this pull request as ready for review April 26, 2026 06:49

shuheng-liu self-assigned this Apr 26, 2026

akshay18iitg approved these changes Apr 27, 2026

View reviewed changes

akshay18iitg merged commit c924595 into main Apr 27, 2026
5 checks passed

akshay18iitg deleted the claude/setup-precommit-hooks-9GLX5 branch April 27, 2026 20:23

shuheng-liu mentioned this pull request Apr 29, 2026

fix(ci): gate extract-claude-lessons on head branch, not PR author #212

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: mixture-weighted validation aggregate + per-name dup suffix#189

fix: mixture-weighted validation aggregate + per-name dup suffix#189
akshay18iitg merged 1 commit into
mainfrom
claude/setup-precommit-hooks-9GLX5

shuheng-liu commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shuheng-liu commented Apr 26, 2026

What this does

How it was tested

How to checkout & try? (for the reviewer)

Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants