(#2523) validation epoch interval should calculate starting point the same as global step #2546
Closes #2523
This pull request improves how validation steps are determined when using dynamic epoch schedules in training. The main change is the introduction of logic to correctly compute the step within the current epoch, even when the number of steps per epoch varies due to dataset scheduling. This ensures that validation triggers at the correct times, especially in complex training scenarios. The update is accompanied by new and updated tests to verify correctness.
Validation logic improvements:
- Added the `_epoch_relative_step` method in `validation.py` to accurately compute the current step within an epoch, accounting for dynamic `epoch_batches_schedule` and `gradient_accumulation_steps` settings. This method helps ensure validation occurs at the correct epoch boundaries.
- Updated `should_perform_intermediary_validation` to use the new `_epoch_relative_step` method, improving correctness when epochs have varying lengths.

Test enhancements:
- Updated the `test_epoch_2_validation_at_correct_step` test to include a dynamic `epoch_batches_schedule` and `gradient_accumulation_steps`, ensuring it covers the new logic.
- Added a new test, `test_validation_uses_epoch_start_step_with_schedule`, to verify that validation triggers correctly at the end of epochs when using dynamic schedules, covering scenarios from related issues.
- Updated `test_validation_aligns_with_checkpoints` to include `epoch_batches_schedule` in its setup, further validating the updated logic.
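
For reviewers who want the idea in isolation, here is a minimal sketch of how a step within the current epoch could be derived when epoch lengths vary. It is not the code from `validation.py`. The function names, the 1-indexed `{epoch: batches}` shape of the schedule, and the use of ceiling division for gradient accumulation are all assumptions made for illustration.

```python
import math


def steps_in_epoch(batches: int, gradient_accumulation_steps: int) -> int:
    # Assumption: a partial accumulation window still counts as one optimizer step.
    return math.ceil(batches / gradient_accumulation_steps)


def epoch_relative_step(
    global_step: int,
    epoch: int,
    epoch_batches_schedule: dict[int, int],
    gradient_accumulation_steps: int = 1,
) -> int:
    # Hypothetical helper: sum the optimizer steps of every earlier epoch to
    # find where the current epoch starts, then subtract that from global_step.
    epoch_start_step = sum(
        steps_in_epoch(epoch_batches_schedule[e], gradient_accumulation_steps)
        for e in range(1, epoch)
    )
    return global_step - epoch_start_step


def is_epoch_end(
    global_step: int,
    epoch: int,
    epoch_batches_schedule: dict[int, int],
    gradient_accumulation_steps: int = 1,
) -> bool:
    # The epoch ends when the relative step reaches this epoch's own length.
    return epoch_relative_step(
        global_step, epoch, epoch_batches_schedule, gradient_accumulation_steps
    ) == steps_in_epoch(epoch_batches_schedule[epoch], gradient_accumulation_steps)


# Illustrative schedule (made up): epochs of 8, 6 and 10 batches.
schedule = {1: 8, 2: 6, 3: 10}
# With gradient_accumulation_steps=2 the epochs last 4, 3 and 5 optimizer steps.
relative = epoch_relative_step(7, 2, schedule, gradient_accumulation_steps=2)
```

In this made-up schedule, epoch 2 ends at global step 7. A check that assumed every epoch had the length of the first one (for example `global_step % 4 == 0`) would miss that boundary, which is the kind of mismatch this change is meant to avoid.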