Active Node Timeout for Build Pipelines #2130

Merged: 1 commit, Oct 14, 2020

adamfarley (Contributor)

Currently, if there are no active nodes that match a build job's
labels, we fail immediately.

This change is intended to add a modifiable timeout value, so if we
don't find an active node that matches the build job's labels, we
wait for x minutes, periodically checking to see if a node matching
the labels has come online.

Signed-off-by: Adam Farley <adam.farley@uk.ibm.com>
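
A minimal sketch of the wait-and-retry behavior described above, not the code from this PR. The function name `waitForNode` and both closure parameters are hypothetical stand-ins so the example runs outside Jenkins; only the idea of a timeout in minutes with periodic re-checks comes from the description.

```groovy
// Hypothetical sketch: waitForNode, isOnline and sleeper are illustrative names,
// not identifiers taken from the PR.
// isOnline: closure returning true when a node matching the label is online.
// sleeper:  closure that pauses for the given number of minutes.
boolean waitForNode(String label, int timeoutMinutes, Closure<Boolean> isOnline, Closure sleeper) {
    // Succeed straight away if a matching node is already online.
    if (isOnline(label)) {
        return true
    }
    // Otherwise re-check once per minute until the timeout is used up.
    for (int elapsed = 0; elapsed < timeoutMinutes; elapsed++) {
        sleeper(1)
        if (isOnline(label)) {
            return true
        }
    }
    // No node appeared in time; the caller decides how to fail the build.
    return false
}

// Usage with stand-ins: the node "comes online" on the third check.
int checks = 0
def fakeIsOnline = { String l -> ++checks >= 3 }
def fakeSleep = { int minutes -> /* no real waiting in this demo */ }
assert waitForNode('build&&linux', 5, fakeIsOnline, fakeSleep)
assert checks == 3
```

With a timeout of 0 this sketch falls back to the old behavior: one check, then an immediate failure.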

@M-Davies (Contributor) left a comment

Sorry for the long review, some of my comments are just repeated points to make it easier to change if they need changing 😬

pipelines/build/common/openjdk_build_pipeline.groovy (outdated, resolved)
pipelines/build/common/openjdk_build_pipeline.groovy (outdated, resolved)
pipelines/jobs/pipeline_job_template.groovy (resolved)
pipelines/build/common/config_regeneration.groovy (outdated, resolved)
pipelines/build/common/config_regeneration.groovy (outdated, resolved)
pipelines/build/common/config_regeneration.groovy (outdated, resolved)
pipelines/build/common/openjdk_build_pipeline.groovy (outdated, resolved)
pipelines/build/prTester/pr_test_pipeline.groovy (outdated, resolved)
@karianna added the bug label on Oct 8, 2020
@karianna added this to the October 2020 milestone on Oct 8, 2020
@adamfarley (Contributor, Author) commented:

All issues handled or replied to. Squashing and pushing.

@adamfarley force-pushed the timeout_when_no_nodes_found branch 3 times, most recently from 189af5d to 64a24b8, on October 8, 2020 20:37
@karianna (Contributor) left a comment

Some more suggestions (not a hill I'm going to die on)

def waitForANodeToBecomeActive(def label) {
    def NodeHelper = context.library(identifier: 'openjdk-jenkins-helper@master').NodeHelper

    if (NodeHelper.nodeIsOnline(label)) {

I'm wondering if we need to change nodeIsOnline a little here. What it's really doing is saying that there is 'a' node (perhaps out of several) that is available. It's a subtle distinction but I think an important one.

Even aNodeIsOnline(label) may be more descriptive.

I could also see us enhancing NodeHelper to add a numberOfOnlineNodes(label) or some such.

@adamfarley (Contributor, Author) replied on Oct 9, 2020:

numberOfOnlineNodes: I think this already exists in the Jenkins pipeline API. It's called "nodesByLabel".

nodesByLabel Jenkins pipeline API link.
"nodesByLabel: List of nodes by Label, by default excludes offline nodes."

We already use it here: https://github.com/AdoptOpenJDK/openjdk-tests/pull/1950/files
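
A rough sketch of how the two suggested helper names could sit on top of `nodesByLabel`. The helpers `numberOfOnlineNodes` and `aNodeIsOnline` are only the names floated in this thread, not existing NodeHelper methods, and `FakeContext` is a stand-in for the Jenkins pipeline context so the example runs outside Jenkins. It assumes, per the quote above, that the step returns only online nodes by default.

```groovy
// Stand-in for the pipeline context. In a real pipeline, nodesByLabel is a
// Jenkins step; here it just looks up a fixed map of label -> online node names.
class FakeContext {
    Map<String, List<String>> onlineNodes = [:]

    List<String> nodesByLabel(Map args) {
        return onlineNodes[args.label] ?: []
    }
}

// Hypothetical helper: counts the online nodes that carry the label.
int numberOfOnlineNodes(def context, String label) {
    return context.nodesByLabel(label: label).size()
}

// Hypothetical helper: true when at least one matching node is online,
// which is the "a node (perhaps out of several)" meaning discussed above.
boolean aNodeIsOnline(def context, String label) {
    return numberOfOnlineNodes(context, label) > 0
}

def ctx = new FakeContext(onlineNodes: ['linux': ['node-1', 'node-2']])
assert numberOfOnlineNodes(ctx, 'linux') == 2
assert aNodeIsOnline(ctx, 'linux')
assert !aNodeIsOnline(ctx, 'windows')
```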



if (activeNodeTimeout > 0) {
    context.println("Will check again periodically until either one comes online, or " + buildConfig.ACTIVE_NODE_TIMEOUT + " minutes (ACTIVE_NODE_TIMEOUT) has passed.")

Maybe:

"Will check again periodically until a node labeled " + label + " comes online, or " + buildConfig.ACTIVE_NODE_TIMEOUT + " minutes (ACTIVE_NODE_TIMEOUT) has passed."

@adamfarley (Contributor, Author) replied:

Sure. Will change.

@M-Davies (Contributor) left a comment

Thanks for implementing those changes! 👍

@smlambert removed their request for review on October 9, 2020 16:12
@smlambert (Contributor) commented:

Removed myself as a reviewer, as 3 reviews seems like plenty already ;)

@karianna (Contributor) commented:

TestCompilation > openjdk_build_pipelineTest() FAILED
    org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
    common/openjdk_build_pipeline.groovy: 681: [Static type checking] - Cannot find matching method testDoubles.ContextStub#sleep(java.util.LinkedHashMap <java.lang.String, java.io.Serializable>). Please check if the declared type is correct and if the method exists.
     @ line 681, column 17.
                       context.sleep(time: 1, unit: "MINUTES")

@adamfarley (Contributor, Author) commented on Oct 12, 2020:

Hmm, looks like the ContextStub class we use for test compilation lacks a stub for the advanced sleep method. Looking to see if it's definitely there in the context class the actual pipeline uses.

EDIT: It took me a while to work out that the advanced sleep method isn't part of Groovy; it's a step from the pipeline library we import. Docs here:

https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#sleep-sleep

Will add the method's signature to the ContextStub class that we use for testing.

EDIT 2: Wouldn't work after multiple changes, so we're back to sleep(int) with a useful comment.
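
To illustrate the two call shapes involved, here is a small sketch of a test double that accepts both of them. `ContextStubSketch` is a hypothetical class, not the real `testDoubles.ContextStub`, and the thread does not say why adding the signature failed in the actual test setup, so this only shows what the compilation error was asking for. It assumes the Jenkins `sleep` step treats a bare number as seconds, which is its documented default unit.

```groovy
// Hypothetical test double; the real testDoubles.ContextStub is not shown in this thread.
class ContextStubSketch {
    // Records calls instead of actually pausing.
    List<String> calls = []

    // Simple form, e.g. context.sleep(60): assumed to mean seconds, as in the Jenkins step.
    void sleep(int time) {
        calls << "sleep ${time} SECONDS".toString()
    }

    // Named-argument form, e.g. context.sleep(time: 1, unit: "MINUTES").
    // Groovy passes the named arguments as a single Map, which is why the
    // compiler error above mentions sleep(java.util.LinkedHashMap).
    void sleep(Map args) {
        calls << "sleep ${args.time} ${args.unit}".toString()
    }
}

def stub = new ContextStubSketch()
stub.sleep(time: 1, unit: 'MINUTES')   // the call that failed static type checking
stub.sleep(60)                         // the sleep(int) fallback for a one-minute wait
assert stub.calls == ['sleep 1 MINUTES', 'sleep 60 SECONDS']
```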

@adamfarley force-pushed the timeout_when_no_nodes_found branch 6 times, most recently from da2a7b1 to bba0a61, on October 13, 2020 12:00
@adamfarley merged commit 9a80070 into adoptium:master on Oct 14, 2020
@adamfarley changed the title from "WIP: Active Node Timeout for Build Pipelines" to "Active Node Timeout for Build Pipelines" on Oct 14, 2020
@adamfarley self-assigned this on Oct 14, 2020
gdams added a commit to gdams/openjdk-build that referenced this pull request on Oct 20, 2020
@adamfarley deleted the timeout_when_no_nodes_found branch on July 10, 2024 13:49
Labels: bug (Issues that are problems in the code as reported by the community)
Linked issue: Make build/test "Queue" phase abort if no nodes online after X mins
5 participants