-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DrainJobsMode #2569
Add DrainJobsMode #2569
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
wait-to-finish: | ||
description: 'Wait for the workflow run to finish' | ||
required: true | ||
default: "true" | ||
wait-to-running: | ||
description: 'Wait for the workflow run to start running' | ||
required: true | ||
default: "false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have been added to make the action a bit more generic. Sometimes, we just want to trigger the workflow run and not necessarily wait until it finishes successfully. These inputs allow for that.
- name: Wait for workflow to start running | ||
if: inputs.wait-to-running == 'true' && inputs.wait-to-finish == 'false' | ||
uses: actions/github-script@v6 | ||
with: | ||
script: | | ||
function sleep(ms) { | ||
return new Promise(resolve => setTimeout(resolve, ms)) | ||
} | ||
const owner = '${{inputs.repo-owner}}' | ||
const repo = '${{inputs.repo-name}}' | ||
const workflow_run_id = ${{steps.query_workflow.outputs.workflow_run}} | ||
const workflow_job_id = ${{steps.query_workflow.outputs.workflow_job}} | ||
let count = 0 | ||
while (count++<10) { | ||
await sleep(30 * 1000); | ||
let getRunResponse = await github.rest.actions.getWorkflowRun({ | ||
owner: owner, | ||
repo: repo, | ||
run_id: workflow_run_id | ||
}) | ||
console.log(`${getRunResponse.data.html_url}: ${getRunResponse.data.status} (${getRunResponse.data.conclusion})`); | ||
if (getRunResponse.data.status == 'in_progress') { | ||
console.log(`Workflow run is in progress.`) | ||
return | ||
} | ||
} | ||
core.setFailed(`The triggered workflow run didn't start properly using ${{inputs.arc-name}}`) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This waits until the workflow run is in_progress
as opposed to finished
main.go
Outdated
@@ -131,6 +132,7 @@ func main() { | |||
flag.StringVar(&logLevel, "log-level", logging.LogLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`) | |||
flag.StringVar(&logFormat, "log-format", "text", `The log format. Valid options are "text" and "json". Defaults to "text"`) | |||
flag.BoolVar(&autoScalingRunnerSetOnly, "auto-scaling-runner-set-only", false, "Make controller only reconcile AutoRunnerScaleSet object.") | |||
flag.StringVar(&updateStrategy, "update-strategy", "immediate", "Immediately or eventually mutate resources on upgrade with running/pending jobs.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to provide more hints in flag docs
flag.StringVar(&updateStrategy, "update-strategy", "immediate", "Immediately or eventually mutate resources on upgrade with running/pending jobs.") | |
flag.StringVar(&updateStrategy, "update-strategy", "immediate", `Strategy for mutating resources on upgrade with running/pending jobs. (values: "eventual", "immediate"; default: "immediate")` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thank you
main.go
Outdated
if len(updateStrategy) > 0 { | ||
log.Info("update-strategy is set to: ", "updateStrategy", updateStrategy) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(updateStrategy) > 0 { | |
log.Info("update-strategy is set to: ", "updateStrategy", updateStrategy) | |
} | |
switch updateStrategy { | |
case "eventual", "immediate": | |
default: | |
log.Info(`Update strategy not recognized. Defaulting to "immediately"`, "updateStrategy", updateStrategy) | |
updateStrategy = "immediate" | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -57,6 +57,7 @@ type AutoscalingRunnerSetReconciler struct { | |||
ControllerNamespace string | |||
DefaultRunnerScaleSetListenerImage string | |||
DefaultRunnerScaleSetListenerImagePullSecrets []string | |||
UpdateStrategy string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome if we create a type for update strategy.
type UpdateStrategy string
// Update strategies
const (
// UpdateStrategyImmediate will not recreate resources until all
// pending / running jobs have completed.
// This can lead to a larger time to apply the change but it will ensure
// that you don't have any overprovisioning of runners.
UpdateStrategyImmediate = UpdateStrategy("immediate")
// docs for update strategy
UpdateStrategyEventual = UpdateStrategy("eventual")
)
And then the UpdateStrategy fields is of type UpdateStrategy
. We can document the strategies on package level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
autoscalingRunnerSetTestTimeout, | ||
autoscalingRunnerSetTestInterval, | ||
).Should(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikola-jokic I added these to this test you wrote. This test was timing out randomly: https://github.com/actions/actions-runner-controller/actions/runs/5047834860/jobs/9073153675
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Context
Fixes https://github.com/github/c2c-actions-runtime/issues/2417
This PR adds
Drain Jobs Mode
flag (disabled by default) which will prevent the Listener and EphemeramRunnerSet from provisioning new runners whenever the gha-runner-scale-set has been patched while there are running jobs. This mode will force the controller manager to wait until all jobs have been completed before applying the changes.Contrary to our conversations, this mode is disabled by default and will need to be explicitly enabled.
Side quests
This PR also fixes thevalidate-gha-chart.yaml
workflow by using a released version of chart-tester2.4.0
instead of master branchInstall kubebuilder-tools instead of a standalone kubebuilder in go.yamlProblem details
Reproduction steps:
TODO