Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DrainJobsMode #2569

Merged
merged 46 commits into from
May 23, 2023
Merged

Add DrainJobsMode #2569

merged 46 commits into from
May 23, 2023

Conversation

Link-
Copy link
Collaborator

@Link- Link- commented May 8, 2023

Context

Fixes https://github.com/github/c2c-actions-runtime/issues/2417

This PR adds Drain Jobs Mode flag (disabled by default) which will prevent the Listener and EphemeramRunnerSet from provisioning new runners whenever the gha-runner-scale-set has been patched while there are running jobs. This mode will force the controller manager to wait until all jobs have been completed before applying the changes.

Contrary to our conversations, this mode is disabled by default and will need to be explicitly enabled.

Side quests

  • This PR also fixes the validate-gha-chart.yaml workflow by using a released version of chart-tester 2.4.0 instead of master branch
  • Install kubebuilder-tools instead of a standalone kubebuilder in go.yaml
  • These have been addressed in Fix broken chart validation workflows #2589

Problem details

Reproduction steps:

  • Setup ARC with runner image tag set to latest
  • Run N number of jobs
  • Change the runner image tag to a specific version
  • Helm upgrade

TODO

@Link- Link- requested review from mumoshu, toast-gear, a team and nikola-jokic as code owners May 8, 2023 14:42
@Link- Link- added the gha-runner-scale-set Related to the gha-runner-scale-set mode label May 9, 2023
@Link- Link- self-assigned this May 9, 2023
main.go Outdated Show resolved Hide resolved
nikola-jokic
nikola-jokic previously approved these changes May 22, 2023
Copy link
Member

@nikola-jokic nikola-jokic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +26 to +33
wait-to-finish:
description: 'Wait for the workflow run to finish'
required: true
default: "true"
wait-to-running:
description: 'Wait for the workflow run to start running'
required: true
default: "false"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have been added to make the action a bit more generic. Sometimes, we just want to trigger the workflow run and not necessarily wait until it finishes successfully. These inputs allow for that.

Comment on lines +129 to +156
- name: Wait for workflow to start running
if: inputs.wait-to-running == 'true' && inputs.wait-to-finish == 'false'
uses: actions/github-script@v6
with:
script: |
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms))
}
const owner = '${{inputs.repo-owner}}'
const repo = '${{inputs.repo-name}}'
const workflow_run_id = ${{steps.query_workflow.outputs.workflow_run}}
const workflow_job_id = ${{steps.query_workflow.outputs.workflow_job}}
let count = 0
while (count++<10) {
await sleep(30 * 1000);
let getRunResponse = await github.rest.actions.getWorkflowRun({
owner: owner,
repo: repo,
run_id: workflow_run_id
})
console.log(`${getRunResponse.data.html_url}: ${getRunResponse.data.status} (${getRunResponse.data.conclusion})`);
if (getRunResponse.data.status == 'in_progress') {
console.log(`Workflow run is in progress.`)
return
}
}
core.setFailed(`The triggered workflow run didn't start properly using ${{inputs.arc-name}}`)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This waits until the workflow run is in_progress as opposed to finished

main.go Outdated
@@ -131,6 +132,7 @@ func main() {
flag.StringVar(&logLevel, "log-level", logging.LogLevelDebug, `The verbosity of the logging. Valid values are "debug", "info", "warn", "error". Defaults to "debug".`)
flag.StringVar(&logFormat, "log-format", "text", `The log format. Valid options are "text" and "json". Defaults to "text"`)
flag.BoolVar(&autoScalingRunnerSetOnly, "auto-scaling-runner-set-only", false, "Make controller only reconcile AutoRunnerScaleSet object.")
flag.StringVar(&updateStrategy, "update-strategy", "immediate", "Immediately or eventually mutate resources on upgrade with running/pending jobs.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to provide more hints in flag docs ☺️

Suggested change
flag.StringVar(&updateStrategy, "update-strategy", "immediate", "Immediately or eventually mutate resources on upgrade with running/pending jobs.")
flag.StringVar(&updateStrategy, "update-strategy", "immediate", `Strategy for mutating resources on upgrade with running/pending jobs. (values: "eventual", "immediate"; default: "immediate")`

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you

main.go Outdated
Comment on lines 175 to 177
if len(updateStrategy) > 0 {
log.Info("update-strategy is set to: ", "updateStrategy", updateStrategy)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(updateStrategy) > 0 {
log.Info("update-strategy is set to: ", "updateStrategy", updateStrategy)
}
switch updateStrategy {
case "eventual", "immediate":
default:
log.Info(`Update strategy not recognized. Defaulting to "immediately"`, "updateStrategy", updateStrategy)
updateStrategy = "immediate"
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -57,6 +57,7 @@ type AutoscalingRunnerSetReconciler struct {
ControllerNamespace string
DefaultRunnerScaleSetListenerImage string
DefaultRunnerScaleSetListenerImagePullSecrets []string
UpdateStrategy string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be awesome if we create a type for update strategy.

type UpdateStrategy string
// Update strategies
const (
        // UpdateStrategyImmediate will not recreate resources until all
        // pending / running jobs have completed.
        // This can lead to a larger time to apply the change but it will ensure
        // that you don't have any overprovisioning of runners.
        UpdateStrategyImmediate = UpdateStrategy("immediate")
        // docs for update strategy
        UpdateStrategyEventual = UpdateStrategy("eventual")
)

And then the UpdateStrategy fields is of type UpdateStrategy. We can document the strategies on package level

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +1731 to +1733
autoscalingRunnerSetTestTimeout,
autoscalingRunnerSetTestInterval,
).Should(BeTrue())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikola-jokic I added these to this test you wrote. This test was timing out randomly: https://github.com/actions/actions-runner-controller/actions/runs/5047834860/jobs/9073153675

Copy link
Member

@nikola-jokic nikola-jokic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@Link- Link- merged commit 8afef51 into master May 23, 2023
14 checks passed
@Link- Link- deleted the Link-/fix-overprovisioning branch May 23, 2023 11:42
@Link- Link- added this to the gha-runner-scale-set-0.5.0 milestone Jul 28, 2023
@Link- Link- mentioned this pull request Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gha-runner-scale-set Related to the gha-runner-scale-set mode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants