@VassilisVassiliadis
Member

This PR adds a new method for automatically stopping measurements. It divides the training runtime of a measurement into two phases:

  1. Warmup Phase: Contains the first few optimization steps, which collectively take at least 60 seconds.
  2. Stable Phase: Contains the remaining optimization steps. It lasts at least 120 seconds and covers at least 10 optimization steps.

The method then drops the system metrics and training metrics (e.g. throughput) recorded during the Warmup Phase. As a result, the final observed properties of measurements that use auto_stop_method=1 contain information from the Stable Phase only.
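
The two-phase split described above can be illustrated with a minimal sketch. This is not the code added by this PR: the names `Step`, `split_phases`, `stable_phase_complete` and `stable_throughput` are hypothetical. The sketch also assumes that the Warmup Phase is the shortest prefix of steps whose cumulative duration reaches 60 seconds, and that throughput is averaged per step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from the PR description.
WARMUP_MIN_SECONDS = 60.0
STABLE_MIN_SECONDS = 120.0
STABLE_MIN_STEPS = 10


@dataclass
class Step:
    """One optimization step (hypothetical record, not the PR's data model)."""
    duration_s: float  # wall-clock duration of the step in seconds
    throughput: float  # training metric sampled at this step


def split_phases(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split steps into (warmup, stable).

    Assumption: warmup is the shortest prefix whose cumulative duration
    is at least WARMUP_MIN_SECONDS; everything after it is stable.
    """
    elapsed = 0.0
    for index, step in enumerate(steps):
        elapsed += step.duration_s
        if elapsed >= WARMUP_MIN_SECONDS:
            return steps[: index + 1], steps[index + 1:]
    # Warmup has not finished yet, so there is no stable phase.
    return steps, []


def stable_phase_complete(stable: List[Step]) -> bool:
    """True once the stable phase meets both the time and step minimums."""
    total_seconds = sum(step.duration_s for step in stable)
    return len(stable) >= STABLE_MIN_STEPS and total_seconds >= STABLE_MIN_SECONDS


def stable_throughput(steps: List[Step]) -> Optional[float]:
    """Mean throughput over the stable phase only, or None if it is incomplete."""
    _, stable = split_phases(steps)
    if not stable_phase_complete(stable):
        return None
    return sum(step.throughput for step in stable) / len(stable)


if __name__ == "__main__":
    # Three slow warmup steps (60 s in total) followed by ten 12 s stable steps.
    history = [Step(20.0, 50.0)] * 3 + [Step(12.0, 100.0)] * 10
    print(stable_throughput(history))
```

In this sketch the slower warmup steps do not affect the reported throughput, which matches the intent of dropping Warmup Phase metrics from the final observed properties.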

@VassilisVassiliadis VassilisVassiliadis added the ci This is related to CI label Sep 5, 2025
@VassilisVassiliadis VassilisVassiliadis force-pushed the vv_add_support_for_auto_stop_method branch from 0721735 to f67ffe6 Compare September 5, 2025 11:03
@VassilisVassiliadis
Member Author

@danielelotito here's the PR for the auto-stop-method you developed

Signed-off-by: Vassilis Vassiliadis <vassilis.vassiliadis@ibm.com>
…ameter

…eriments

…ttrainer.py

Member

@AlessandroPomponio AlessandroPomponio left a comment

LGTM thanks

@VassilisVassiliadis VassilisVassiliadis added this pull request to the merge queue Sep 8, 2025
Merged via the queue into main with commit 6be963f Sep 8, 2025
16 checks passed
@VassilisVassiliadis VassilisVassiliadis deleted the vv_add_support_for_auto_stop_method branch September 8, 2025 13:23