
Standardize and document a labelling scheme for Jenkins nodes #93

Open
smlambert opened this issue Dec 12, 2017 · 17 comments
@smlambert
Contributor

As we add more machines, and write more pipeline scripts for various builds and tests in Jenkins, it is useful to settle on a labelling scheme that will allow us flexibility and improved machine management (even taking advantage of some of the Jenkins APIs for automated labelling).

Benefits to having and documenting a 'scheme':

  • as new machines and machine capabilities are added, it is clear how to add and organize new labels and sub-categories
  • avoids label duplication
  • prevents jobs from running on machines where we do not want them to run
  • adds flexibility in pipeline scripts
  • speeds up failure triage, as it becomes easier to see that a test fails on machines with a particular labelled attribute but passes on other machines (with a different label set); for example, a pipeline script might run tests on sw.os.windows (not caring about the version), but it would be interesting to note if failures appear only on sw.os.windows.10 machines
  • allows a certain level of machine sanity checking, especially if we automate the labelling via the Jenkins APIs, so that we can compare the expected machine config (via ansible) with the actual machine config (via calls to the Jenkins API, e.g. http://javadoc.jenkins-ci.org/hudson/model/Node.html#getAssignedLabels--)
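The sanity-check idea in the last bullet (comparing the labels we expect a node to have, e.g. derived from its ansible inventory, with the labels it actually reports) could look something like the minimal sketch below. This is illustrative Python, not anything from the thread; the helper name is hypothetical, and fetching the actual labels from the Jenkins API is out of scope here.

```python
def label_drift(expected, actual):
    """Return (missing, unexpected) label sets for a node.

    `expected` comes from the machine's intended config (e.g. ansible);
    `actual` would come from Node.getAssignedLabels() via the Jenkins API.
    """
    expected, actual = set(expected), set(actual)
    return expected - actual, actual - expected

missing, unexpected = label_drift(
    ["sw.os.ubuntu.16", "hw.arch.x86", "ci.role.test"],
    ["sw.os.ubuntu.16", "hw.arch.x86", "ci.role.perf"],
)
# missing == {"ci.role.test"}, unexpected == {"ci.role.perf"}
```

A non-empty result on either side would flag the node for re-running ansible or correcting its Jenkins configuration.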

I suggest the schema 'tree' below, categorizing labels under three top-level roots, hw, sw, and ci (a continuous-integration catch-all for all groupings that are neither hw nor sw), each with its own sub-categories.

hw.platform.zlinux
xlinux
plinux
windows
aix
zos
osx

hw.arch.s390
ppc
x86

hw.endian.le

hw.bits.64bit
32bit

hw.physical.cpu.xx
disk.xx
memory.xx

sw.os.rhel.6
sw.os.rhel.7
sw.os.ubuntu.14
sw.os.ubuntu.16
sw.os.sles.11
sw.os.sles.12
sw.os.aix.6
sw.os.aix.7
sw.os.osx.10
sw.os.windows.8
sw.os.windows.10
sw.os.zos.1_13 (where dots in version numbers are represented by _ )
sw.os.zos.2_1

sw.tool.gcc.xx (where xx represents the version number)
sw.tool.docker.xx
sw.tool.hypervisor.kvm, etc

ci.role.perf
ci.role.compile
ci.role.test
ci.role.test.jck

We could just start with the labels that are of direct use to build/test scripts and add as we see the need.

Do people have strong thoughts on the matter? In general, labels are of no consequence to people unless they are writing scripts and/or adding builds to Jenkins, so if you are actively working on builds, please share your thoughts. Thanks.

@smlambert
Contributor Author

ci.sponsor.ibm
ci.sponsor.ljc
ci.sponsor.joyent
etc

@karianna karianna added this to Backlog in infrastructure Dec 12, 2017
@tellison
Contributor

Thanks for bringing order to the chaos @smlambert ! I'm +1 on the idea of structured labels.
A few minor comments:

  • Although the labels 'look' hierarchical, they are not interpreted as such; people will have to be aware that, for example, specifying sw.tool.docker won't match a machine labelled only as sw.tool.docker.1_7_1. We may need multiple labels on a node where such details are available.

  • Not sure that we need the physical segment, or if we want it let's use it consistently. So (hw.physical.cpu.4 and hw.physical.endian.be) or (hw.cpu.4 and hw.endian.be) but not a mixture. I think the hw implies physical so I would just drop it.

  • What is the size of the disk you are referencing? Presumably the actual size of the workspace disk; not the free space, or minimum size required etc. Likewise for memory. Not too sure how the scripts will use this physical size info as it won't represent available storage.

  • Is the platform segment designed as a shorthand for other tag groups? i.e. why have hw.platform.zos and sw.os.zos? If the scripts care about CPU architecture then they would specify e.g. hw.arch.s390x and if they care about the OS they would specify e.g. sw.os.osx.10

  • really pedantic now ;-) hw.bits.64bit -> hw.bits.64.

So your list, updated with those comments for your consideration, becomes:

hw.arch.s390x
ppc64le
x86

hw.endian.le
be

hw.bits.64
32

hw.cpu.xx
hw.disk.xx (workspace disk size in Gb)
hw.memory.xx (size in Gb)

sw.os.rhel.6
sw.os.rhel.7
sw.os.ubuntu.14
sw.os.ubuntu.16
sw.os.sles.11
sw.os.sles.12
sw.os.aix.6
sw.os.aix.7
sw.os.osx.10
sw.os.windows.8
sw.os.windows.10
sw.os.zos.1_13 (where dots in version numbers are represented by _ )
sw.os.zos.2_1

sw.tool.gcc.xx (where xx represents the version number)
sw.tool.docker.xx
sw.tool.hypervisor.kvm, etc

ci.role.perf
ci.role.compile
ci.role.test
ci.role.test.jck
ci.sponsor.ibm
ci.sponsor.ljc
ci.sponsor.joyent
etc

@smlambert
Contributor Author

smlambert commented Dec 13, 2017

I like your suggestions @tellison 👍

You are correct that a script looking for a label named "sw.tool.docker" would not find a machine labelled only as sw.tool.docker.1_7_1 (and bear in mind, we may decide in this discussion that we do not care about version numbers at all for some tools/categories). But when I am writing a script for a job that doesn't care about the version and just wants any machine that has docker installed, it would look for label.contains("sw.tool.docker") or label.startsWith("sw.tool.docker"), rather than label.equals("sw.tool.docker")...
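As an illustration of that matching rule, here is a minimal sketch (Python rather than pipeline Groovy, and the helper name is my own invention, not anything defined in this thread). A prefix check with a trailing dot sits between the over-strict equals and the over-loose contains:

```python
def has_label_prefix(node_labels, prefix):
    """True if the node has the exact label, or any label nested under it."""
    return any(l == prefix or l.startswith(prefix + ".")
               for l in node_labels)

has_label_prefix(["sw.tool.docker.1_7_1"], "sw.tool.docker")  # True
has_label_prefix(["sw.tool.docker.1_7_1"], "sw.tool.gcc")     # False
```

Matching on `prefix + "."` rather than the bare prefix avoids accidentally matching a hypothetical sibling label such as sw.tool.dockerfoo, which a plain startsWith or contains would pick up.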

I have also discussed with others the case where I am looking for a machine with memory, or a version, greater than a certain value; and well, yes, since labels are returned as strings, I have to parse out the memory or version portion and convert it to an int to compare, but it's scriptable...
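That parsing step could be sketched as below (illustrative Python, assuming the thread's convention that _ stands in for dots in version numbers; the helper name is hypothetical):

```python
def label_version(node_labels, prefix):
    """Extract the numeric suffix of a label under `prefix` as an int tuple.

    Works for single numbers (hw.memory.16) and for versions where '_'
    stands in for dots (sw.os.zos.2_1).
    """
    for label in node_labels:
        if label.startswith(prefix + "."):
            suffix = label[len(prefix) + 1:]
            return tuple(int(part) for part in suffix.split("_"))
    return None

label_version(["sw.os.zos.2_1"], "sw.os.zos")  # (2, 1)
label_version(["hw.memory.16"], "hw.memory")   # (16,)
```

Returning a tuple means versions compare correctly out of the box: (2, 1) > (1, 13), whereas a naive string comparison would rank "1_13" above "2_1" in some schemes.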

For the hw labels like disk, memory, etc., there are some tests designed to run on machines with a certain number of cores, or with a certain amount of memory; but agreed that if we decide we want these labels, it should be clear from the name what is meant by them, or it is best not to have them at all.

As for turning chaos into order, I am mostly hoping for something more intuitive and useful than having to refer to a map to know which machines I can use, a map that starts with the sentence, "if you are reading this, then our ... labels and ... nodes are a mystery to you..." eek!
https://cwiki.apache.org/confluence/display/INFRA/Jenkins+node+labels

@sxa
Member

sxa commented Dec 13, 2017

I'd personally rather ditch the hierarchy, but maybe that's just me. plinux le ubuntu16.04 would seem like reasonable tags on their own. @smlambert are you planning to do granular searches on the tags, to have a reason for the hierarchy? I can sort of see how it might make sense for hardware values such as RAM/disk/CPU if we were going to base decisions on those. I'm not massively bothered by it, but figured I'd share my view :-)

@smlambert
Contributor Author

Thanks @sxa555, I definitely have a use case for querying all tools on a given machine (return all sw.tool.* labels per machine), where I may not know the full set of tools in the list of labels (especially if some have been newly added via ansible and, eventually, automatically added to the Jenkins node via its API), so I cannot ask for each one specifically.

Though to your point, we wouldn't need the deep hierarchy there; we could just have tools.docker, etc. But this hierarchy also serves as guidance for where to add new labels, is generally benign/inoffensive for anyone not writing scripts that use them, and possibly addresses some future uses of these categories.

"Print me today's map of the Jenkins machine farm" - what hardware is represented in it? what software? etc.

@gdams
Member

gdams commented Dec 14, 2017

perhaps https://ci-jck.adoptopenjdk.net/ would be a good test bed to setup a labelling scheme that we could then migrate to our main jenkins?

@gdams
Member

gdams commented Mar 5, 2018

So I'm kind of with @sxa555 on this... all of this sw and ci stuff seems to be adding unnecessary "fluff" to the schema. I do agree that we need to standardise the labelling schema, as it is currently messy, and I have been working on this in ansible by auto-creating machines in Jenkins.

My proposed schema is as follows but I am happy to modify/add labels if people are unhappy:

linux ubuntu1604 x64 cloudcone x86_64 test ubuntu ubuntu16

@smlambert
Contributor Author

schema - "a representation of a plan in the form of an outline or model", "organizes categories of information and the relationships among them"

The point of the 'fluff' is to organize labels into categories AND to make it clear how to add more labels as we grow and end up needing them. It enforces naming conventions that will help avoid label-name conflicts later (when teams/companies want to merge their new features into this build farm).

I have worked on several projects where a schema was not put in place, and they ended poorly, with a flat list of less-than-meaningful words (in a variety of styles depending on who added them, becoming more irregular as time marches on, since there is no obvious guidance or pattern to follow).

I am not sure I understand the resistance to fluff. The fluff does not really cost anyone anything: on the label-producer side, the labelling is automated (so no one has to type in long names), and on the label-consumer side, pipeline scripts are set up to use the labels, and clarity in naming makes it easier to understand why a particular script is looking for a particular label.

I am happy to keep the list of labels to a minimum (only adding labels that will be 'consumed' by some script or job), but I think the labels should exemplify the 'plan'.

For the flat list linux ubuntu1604 x64 cloudcone x86_64 test ubuntu ubuntu16 there is near-duplication: x64 / x86_64, and ubuntu / ubuntu16 / ubuntu1604. What scripts or jobs are using those labels presently?

@gdams
Member

gdams commented Mar 19, 2018

the build scripts currently all use this schema

@jdekonin
Contributor

I like the updated list based on @tellison's comments, with one addition: technically the arch ppcle does not exist. Arch and endian... no?

I am not fond of the idea of labels that mean the same thing. What is the difference between x64 and x86_64? I'm not saying your use of them is invalid, I'm just interested in why one or the other.

Is cloudcone a sponsor or a service?

@gdams
Member

gdams commented Mar 25, 2018

okay so are we happy with a schema based on #93 (comment)?

@smlambert
Contributor Author

Ok, let me clean it up and document where I think we are at (in a README or wiki on this repo). I will not remove any current labelling at present; we can overlay the new schema, switch testing over, then the build scripts, then remove the old labels. I don't mind going around and doing this clean-up over the next week or two.

I also amend my initial statement about adding labels that are not used: I think we should add only those labels we actively need to differentiate machines, adding new ones only as needed.

@smlambert
Contributor Author

Working on the doc here (will replace jpg with better image shortly):
https://github.com/smlambert/openjdk-infrastructure/blob/labels/docs/jenkinslabels.md

Note that I still need to go and look at all of the 'consumers' of the existing labels, to ensure we start with the minimal set based on usage. This implies that we can, and possibly should, fix scripts that could be more logically correct (which I will do as I find them).

I believe some of the label needs relate to restrictions around where you can build, and subsequently run, some of the Linux builds (due to the 'compile on the lowest available version of gcc' story). It is unclear whether the Linux flavour labels are used elsewhere either.

jdekonin pushed a commit to jdekonin/adoptium-infrastructure that referenced this issue May 8, 2018
added csz25088.canlab.ibm.com ansible_host=9.23.30.88
adoptium#93
@karianna
Contributor

karianna commented Jun 5, 2018

Hi all - did we come to a consensus here?

@smlambert
Contributor Author

On the test side, we have.

We have in the sense that I have labelled all of the test machines with the labels I proposed and have been using those labels for a while. This allows us to use the test CI scripts at Adopt and a few other Jenkins servers / open projects that follow the same labelling schema.

Because I was not sure of consensus, I have not:

  • removed the old labels
  • added new labels on build machines (just test machines)
  • updated build scripts to use new schema

@AdamBrousseau
Contributor

@smlambert can you open a PR for a doc that has the schema that was decided upon? That will be easier to reference than trying to figure out which comment in this issue is the most correct version. I could do it if you wish, but I know you were working on a doc already (don't want to step on any toes).

@smlambert
Contributor Author

Sure, will do @AdamBrousseau, thanks for the nudge!

AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Jul 17, 2018
- Change aix,ppcle,390
- Remove ubuntu version
- Update to hierarchical labels based on standardized
  schema defined in adoptium/infrastructure#93
- Also remove nestmates spec which was added by default (eclipse-openj9#2270)

Issue eclipse-openj9#1562
[skip ci]

Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
AdamBrousseau added a commit to AdamBrousseau/aqa-tests that referenced this issue Jul 19, 2018
- Conform to label convention outlined in
  adoptium/infrastructure#93

Signed-off-by: Adam Brousseau <adam.brousseau@ca.ibm.com>
smlambert pushed a commit to adoptium/aqa-tests that referenced this issue Jul 19, 2018
- Conform to label convention outlined in
  adoptium/infrastructure#93

Signed-off-by: Adam Brousseau <adam.brousseau@ca.ibm.com>
@sxa sxa modified the milestones: Icebox / On Hold, Backlog Mar 7, 2019
7 participants