Deprecate DatasetConfig.val_split_ratio in favor of mixture-level #225
Merged
[claude-review] summary for commit 6f1e2de
No blocking issues found. Reviewed end-to-end on the latest commit. The previously-flagged false-positive warning has been fixed: Verified:
@claude fix per review
- addresses @claude[bot] (false-positive deprecation warning): changed the `DatasetConfig.val_split_ratio` declaration from a `0.05` default to `float | None = None`; the warning now fires only when a child's value is explicitly set, not when the user only customizes the mixture-level value.
- addresses @claude[bot] (sentinel default for clarity): same change; the `None` annotation visually marks the field as deprecated/inert.
- addresses @claude[bot] (missing tests): added two cases to `tests/configs/test_default.py`: one asserts that a warning fires when a child diverges, the other (the case the old code regressed) asserts that no warning fires when only the mixture is customized.

tests: passed (`pytest -m "not gpu" tests/configs/test_default.py`)
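The sentinel-default fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the two dataclasses are stripped-down stand-ins with hypothetical fields (`name`, `datasets`), the warning text is invented, and `deprecation_count` is a hypothetical helper. It also assumes the warning fires whenever the child value is not `None`, and that the mixture value is still copied onto each child.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass, field


@dataclass
class DatasetConfig:
    # Hypothetical stand-in; the real class has other fields.
    name: str = "demo"
    # Deprecated field. None means "the user did not set it".
    val_split_ratio: float | None = None


@dataclass
class DatasetMixtureConfig:
    # Hypothetical stand-in for the mixture-level config.
    datasets: list[DatasetConfig] = field(default_factory=list)
    val_split_ratio: float = 0.05

    def __post_init__(self) -> None:
        for child in self.datasets:
            # Warn only when the child value was explicitly provided.
            if child.val_split_ratio is not None:
                warnings.warn(
                    "DatasetConfig.val_split_ratio is deprecated; "
                    "set DatasetMixtureConfig.val_split_ratio instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            # Assumption: the mixture value still wins, as before.
            child.val_split_ratio = self.val_split_ratio


def deprecation_count(build) -> int:
    """Run build() and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide them
        build()
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)


# Only the mixture is customized: no warning.
print(deprecation_count(
    lambda: DatasetMixtureConfig([DatasetConfig()], val_split_ratio=0.1)
))  # 0

# A child sets the deprecated field explicitly: one warning.
print(deprecation_count(
    lambda: DatasetMixtureConfig(
        [DatasetConfig(val_split_ratio=0.2)], val_split_ratio=0.1
    )
))  # 1
```

Using `None` rather than a numeric default is what lets the check distinguish "left alone" from "explicitly set to the old default".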
What this does
Backward-compatible alternative to #224.
`val_split_ratio` was declared on both `DatasetConfig` and `DatasetMixtureConfig`, but the per-dataset value was always silently overwritten by the mixture-level value inside `DatasetMixtureConfig.__post_init__`. So setting `val_split_ratio` on a `DatasetConfig` already had no effect; it was misleading dead config surface.

This PR removes the duplication without breaking any pre-existing JSON config:

- Keeps `val_split_ratio` as a field on `DatasetConfig` (default `0.05`, same as before) so old configs still parse, but flags it as deprecated in the docstring and points users at the mixture-level field.
- Updates `DatasetMixtureConfig.__post_init__` with a `DeprecationWarning` that fires only when a child's value diverges from the mixture's. This surfaces the previously silent override to the user.
- Switches `factory.make_dataset` to read `train_cfg.dataset_mixture.val_split_ratio` directly; the mixture is now the single source of truth for the split.

Behavior preserved end-to-end:

- Configs that set `val_split_ratio` only on `DatasetMixtureConfig` → unchanged.
- Configs that set `val_split_ratio` on a child `DatasetConfig` → still parse; the mixture's value still wins (as before); they now also see a deprecation warning instead of a silent override.
- Configs that set it on neither → `0.05`, no warning.

Tag: 🧹 Cleanup / 📝 Documentation
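To make the divergence check concrete, here is a minimal sketch of the design as described in this section, with the child default still at `0.05`. The classes `ChildConfig` and `MixtureConfig`, the warning text, and the `warns` helper are hypothetical stand-ins, not the project's code. The third call shows the case the review on this PR flagged: customizing only the mixture also counts as divergence, because the child still carries its `0.05` default.

```python
import warnings
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChildConfig:
    # Hypothetical stand-in for DatasetConfig with the 0.05 default.
    val_split_ratio: float = 0.05


@dataclass
class MixtureConfig:
    # Hypothetical stand-in for DatasetMixtureConfig.
    datasets: List[ChildConfig] = field(default_factory=list)
    val_split_ratio: float = 0.05

    def __post_init__(self) -> None:
        for child in self.datasets:
            # Divergence check: compares values, cannot tell defaults apart.
            if child.val_split_ratio != self.val_split_ratio:
                warnings.warn(
                    "per-dataset val_split_ratio is deprecated and ignored",
                    DeprecationWarning,
                    stacklevel=2,
                )
            # The mixture-level value always wins.
            child.val_split_ratio = self.val_split_ratio


def warns(make_config) -> bool:
    """Return True if make_config() emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        make_config()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(warns(lambda: MixtureConfig([ChildConfig()])))          # False: both at default
print(warns(lambda: MixtureConfig([ChildConfig(0.2)], 0.1)))  # True: child diverges
print(warns(lambda: MixtureConfig([ChildConfig()], 0.1)))     # True: only the mixture was customized
```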
How it was tested
- `git grep val_split_ratio` confirms the field still exists on `DatasetConfig` (for parser back-compat), the docstring is updated, the warning fires on divergence, and `factory.make_dataset` reads the mixture-level value.
- No config under `configs/`, no test under `tests/`, and no notebook references the per-dataset field, so existing callers see no behavior change other than the new deprecation warning when they explicitly diverge.
- Syntax check (`python -c "import ast; ast.parse(...)"`) on both edited files.
- Not run here (no `uv` venv / no `pre-commit` binary); please re-run locally before merging:

      pre-commit run --all-files
      pytest -m "not gpu" -n auto

How to checkout & try? (for the reviewer)

    git fetch origin claude/deprecate-dataset-val-split-ratio
    git checkout claude/deprecate-dataset-val-split-ratio
    pre-commit run --all-files
    pytest -m "not gpu" -n auto

Quick manual smoke for the deprecation warning:
Checklist
Note: Before submitting this PR, please read the contributor guideline.
Generated by Claude Code