
WARNING: You are looking at unreleased Cilium documentation. Please use the official rendered version released here:

CI / Jenkins

The main CI infrastructure is maintained at

Jobs Overview


Runs validated Ginkgo tests which are confirmed to be stable and have been verified. These tests must always pass.

The configuration for this job is contained within ginkgo.Jenkinsfile.

It first runs unit tests with docker-compose, using the YAML file located at test/docker-compose.yaml.

The next steps happen in parallel:

  • Runs the single-node e2e tests using the Docker runtime.
  • Runs the multi-node Kubernetes e2e tests against the latest default version of Kubernetes specified above.

This job can be used to run tests on custom branches. To do so, log into Jenkins and go to . Then add your branch name to the GitHub Organization -> cilium -> Filter by name (with wildcards) -> Include field and save the changes. Once you no longer need to run tests on your branch, please remove it from this field.


It is also possible to run specific tests from this suite via test-focus and test-gke. The trailing words are taken as a regex. If you want to run only one It block, prepend its description with the test suite name to form a regex, e.g. test-focus K8sDatapathConfig.*Check connectivity with automatic direct nodes routes

test-focus K8s                     Runs all Kubernetes tests
test-focus K8sConformance          Runs all k8s conformance tests
test-focus K8sChaos                Runs all k8s chaos tests
test-focus K8sDatapathConfig       Runs all k8s datapath configuration tests
test-focus K8sDemos                Runs all k8s demo tests
test-focus K8sKubeProxyFreeMatrix  Runs all k8s kube-proxy free matrix tests
test-focus K8sFQDNTest             Runs all k8s fqdn tests
test-focus K8sHealthTest           Runs all k8s health tests
test-focus K8sHubbleTest           Runs all k8s Hubble tests
test-focus K8sIdentity             Runs all k8s identity tests
test-focus K8sIstioTest            Runs all k8s Istio tests
test-focus K8sKafkaPolicyTest      Runs all k8s Kafka tests
test-focus K8sPolicyTest           Runs all k8s policy tests
test-focus K8sServicesTest         Runs all k8s services tests
test-focus K8sUpdates              Runs k8s update tests
test-focus Runtime                 Runs all runtime tests
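
As a rough illustration of how a focus expression selects tests, the sketch below applies the regex from the example above to a few full test names. It assumes the expression is matched as an unanchored regular expression against the suite name followed by the It block description. The test names in main are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// focusMatches reports whether a focus expression selects a test, assuming
// the expression is applied as an unanchored regular expression to the
// test's full name (suite name followed by the It block description).
func focusMatches(focus, fullTestName string) (bool, error) {
	re, err := regexp.Compile(focus)
	if err != nil {
		return false, err
	}
	return re.MatchString(fullTestName), nil
}

func main() {
	focus := "K8sDatapathConfig.*Check connectivity with automatic direct nodes routes"
	// Hypothetical full test names, for illustration only.
	names := []string{
		"K8sDatapathConfig DirectRouting Check connectivity with automatic direct nodes routes",
		"K8sDatapathConfig Encapsulation Check connectivity with VXLAN encapsulation",
		"K8sPolicyTest Basic Test checks all kind of Kubernetes policies",
	}
	for _, name := range names {
		ok, err := focusMatches(focus, name)
		if err != nil {
			fmt.Println("invalid focus expression:", err)
			return
		}
		fmt.Printf("%-5v %s\n", ok, name)
	}
}
```

Because the match is unanchored, a short expression such as K8s selects every suite whose name contains it, which is why the suite name plus .* is needed to narrow the run down to a single It block.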


Runs the Kubernetes e2e tests with a 4.19 kernel. The configuration for this job is contained within ginkgo-kernel.Jenkinsfile.


Runs the Kubernetes e2e tests against all Kubernetes versions that are not currently tested as part of each pull-request but which Cilium still supports, as well as the most recently released versions of Kubernetes that might not yet be declared stable by Kubernetes upstream. Check the contents of ginkgo-kubernetes-all.Jenkinsfile in the branch of Cilium for which you are running tests to see which Kubernetes versions will be tested against.




Runs long-lived tests which take extended time. Some of these tests have an expected failure rate.

Nightly tests run once per day in the Cilium-Nightly-Tests-PR job. The configuration for this job is stored in Jenkinsfile.nightly.

To see the results of these tests, you can view the JUnit Report for an individual job:

  1. On the left-hand side of the Cilium-Nightly-Tests-PR job, click on the build number you wish to get test results from.
  2. Click on 'Test Results' on the left side of the page to view the results from the build. This will give you a report of which tests passed and failed. You can click on each test to view its corresponding output created from Ginkgo.

This first runs the Nightly tests with the following setup:

  • 4 Kubernetes 1.8 nodes
  • 4 GB of RAM per node.
  • 4 vCPUs per node.

Then, it runs Kubernetes tests against versions of Kubernetes that are not currently tested as part of each pull-request, but that Cilium still supports.

It also runs a variety of tests against Envoy to ensure that proxy functionality is working correctly.


As part of Cilium development, we use a custom base box with a bunch of pre-installed libraries and tools that we need to enhance our daily workflow. That base box is built with Packer and it is hosted in the packer-ci-build GitHub repository.

New versions of this box can be created via Jenkins Packer Build, where new builds of the image will be pushed to Vagrant Cloud. The version of the image corresponds to the BUILD_ID environment variable in the Jenkins job. That version ID will be used in Cilium Vagrantfiles.

Changes to this image are made via contributions to the packer-ci-build repository. Authorized GitHub users can trigger builds with a GitHub comment on the PR containing the trigger phrase build-me-please. If a new box needs to be based on a branch other than master, authorized developers can run the build with custom parameters. To use a different Cilium branch, go to Build with parameters in the job and set the base branch as needed.

This box will need to be updated when a developer needs a new dependency that is not installed in the current version of the box, or if a dependency that is cached within the box becomes stale.

Make sure that you update the Vagrant box versions in the test Vagrantfile and the root Vagrantfile after a new box is built and tested.

Once you change the image versions locally, create a branch named pr/update-packer-ci-build and open a PR. It is important that you use that branch name so the VM images are cached into before the branch is merged.

Testing matrix

We currently test the following Kubernetes and kernel version pairs in our CI:

Kubernetes version    Kernel version
Vagrant k8s clusters per PR
1.11                  5.x.x (net-next)
1.17                  4.19.57
1.18                  4.9
Vagrant k8s clusters (in addition to PR) per backport
1.{12-17}             4.9
GKE clusters
1.14.10               4.14.138+

Triggering Pull-Request Builds With Jenkins

To ensure that build resources are used judiciously, builds on Jenkins are manually triggered via comments on each pull-request that contain "trigger-phrases". Only members of the Cilium GitHub organization are allowed to trigger these jobs. Refer to the table below for information regarding which phrase triggers which build, which build is required for a pull-request to be merged, etc. Each linked job contains a description illustrating which subset of tests the job runs.

Jenkins Job                    Trigger Phrases                  Required To Merge?
K8s-1.18-kernel-4.9            test-me-please, retest-4.9       Yes
K8s-1.17-Kernel-4.19           test-me-please, retest-4.19      Yes
K8s-1.11-Kernel-netnext        test-me-please, retest-net-next  Yes
Runtime-4.9                    test-me-please, retest-runtime   Yes
Cilium-Ginkgo-Tests-Focus      test-focus                       No
Cilium-PR-Ginkgo-Tests-k8s     test-missed-k8s                  No
Cilium-Nightly-Tests-PR        test-nightly                     No
Cilium-PR-Kubernetes-Upstream  test-upstream-k8s                No
Cilium-PR-Flannel              test-flannel                     No
Cilium-PR-K8s-GKE              test-me-please, test-gke         Yes
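
The table above can be read as a lookup from trigger phrase to jobs. The following Go sketch encodes the rows and returns the jobs a comment would start. Matching on the first word of the comment is an assumption made for this illustration (it lets test-focus carry trailing words); the actual matching rules live in the Jenkins job configuration and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// job mirrors one row of the trigger phrase table.
type job struct {
	name     string
	phrases  []string
	required bool // whether the job must pass before a PR can be merged
}

var jobs = []job{
	{"K8s-1.18-kernel-4.9", []string{"test-me-please", "retest-4.9"}, true},
	{"K8s-1.17-Kernel-4.19", []string{"test-me-please", "retest-4.19"}, true},
	{"K8s-1.11-Kernel-netnext", []string{"test-me-please", "retest-net-next"}, true},
	{"Runtime-4.9", []string{"test-me-please", "retest-runtime"}, true},
	{"Cilium-Ginkgo-Tests-Focus", []string{"test-focus"}, false},
	{"Cilium-PR-Ginkgo-Tests-k8s", []string{"test-missed-k8s"}, false},
	{"Cilium-Nightly-Tests-PR", []string{"test-nightly"}, false},
	{"Cilium-PR-Kubernetes-Upstream", []string{"test-upstream-k8s"}, false},
	{"Cilium-PR-Flannel", []string{"test-flannel"}, false},
	{"Cilium-PR-K8s-GKE", []string{"test-me-please", "test-gke"}, true},
}

// triggeredJobs returns the names of the jobs whose trigger phrase equals
// the first word of the comment. First-word matching is an assumption of
// this sketch, not a documented Jenkins behavior.
func triggeredJobs(comment string) []string {
	fields := strings.Fields(comment)
	if len(fields) == 0 {
		return nil
	}
	var out []string
	for _, j := range jobs {
		for _, p := range j.phrases {
			if p == fields[0] {
				out = append(out, j.name)
				break
			}
		}
	}
	return out
}

func main() {
	fmt.Println(triggeredJobs("test-me-please"))
	fmt.Println(triggeredJobs("retest-4.19"))
	fmt.Println(triggeredJobs("test-focus K8sPolicyTest"))
}
```

Read this way, test-me-please starts every job marked as required, while each retest-* phrase restarts a single one of them.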

For Backport PRs, the phrase test-backport-x.x (with x.x being the target Cilium version) should be used to trigger all of the above jobs which are marked as required to validate changes to existing releases.

Some feature flags are controlled by pull request labels. The labels are the following:

  • area/containerd: Enable the containerd runtime on all Kubernetes tests.
  • ci/net-next: Run tests on the net-next kernel. This causes the test-me-please target to run only on the net-next kernel. It is purely for testing on a different kernel; to be merged, a PR must pass the CI without this flag.

Using Jenkins for testing

Typically, running Jenkins tests via one of the above trigger phrases runs all of the tests in that particular category. However, there may be cases where you just want to run a single test quickly on Jenkins and observe the test result. To do so, you need to give the relevant test a custom name and update the Jenkinsfile to focus on that test. Below is an example patch that shows how this can be achieved.

diff --git a/ginkgo.Jenkinsfile b/ginkgo.Jenkinsfile
index ee17808748a6..637f99269a41 100644
--- a/ginkgo.Jenkinsfile
+++ b/ginkgo.Jenkinsfile
@@ -62,10 +62,10 @@ pipeline {
             steps {
-                        sh 'cd ${TESTDIR}; ginkgo --focus="RuntimeValidated*" -v -noColor'
+                        sh 'cd ${TESTDIR}; ginkgo --focus="XFoooo*" -v -noColor'
-                        sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus=" K8sValidated*" -v -noColor ${FAILFAST}'
+                        sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus=" K8sFooooo*" -v -noColor ${FAILFAST}'
                     failFast: true
diff --git a/test/k8sT/Nightly.go b/test/k8sT/Nightly.go
index 62b324619797..3f955c73a818 100644
--- a/test/k8sT/Nightly.go
+++ b/test/k8sT/Nightly.go
@@ -466,7 +466,7 @@ var _ = Describe("NightlyExamples", func() {


-               It("K8sValidated Updating Cilium stable to master", func() {
+               FIt("K8sFooooo K8sValidated Updating Cilium stable to master", func() {
                        podFilter := "k8s:zgroup=testapp"

                        //This test should run in each PR for now.

CI Failure Triage

This section describes the process to triage CI failures. We define 3 categories:

Keyword     Description
Flake       Failure due to a temporary situation such as loss of connectivity to external services or a bug in a system component, e.g. is down, VM race conditions, kube-dns bug, ...
CI-Bug      Bug in the test itself that renders the test unreliable, e.g. a timing issue where the test imports a policy but fails to block until the policy is enforced before connectivity is verified.
Regression  Failure is due to a regression. All failures in the CI that are not caused by bugs in the test are considered regressions.

Pipelines subject to triage

Build/test failures for the following Jenkins pipelines must be reported as GitHub issues using the process below:

Pipeline                           Description
Ginkgo-Tests-Validated-master      Runs whenever a PR is merged into master
Ginkgo-CI-Tests-Pipeline           Runs every two hours on the master branch
Master-Nightly                     Runs durability tests every night
Vagrant-Master-Boxes-Packer-Build  Runs on merge into packer-ci-build repository.
Release-branch <>                  Runs various Ginkgo tests on merge into branch ""

Triage process

  1. Discover untriaged Jenkins failures via the script. It defaults to checking the previous 24 hours but this can be modified by setting the SINCE environment variable (it is a unix timestamp). The script checks the various test pipelines that need triage.

    $ contrib/scripts/


    You can quickly assign SINCE with statements like SINCE=`date -d -3days +%s`

  2. Investigate the failure you are interested in and determine if it is a CI-Bug, Flake, or a Regression as defined in the table above.
    1. Search GitHub issues to see if the bug has already been filed. Make sure to include closed issues in your search, as a CI issue can be considered solved and then reappear. Good search terms are:
      • The test name, e.g. :

        k8s-1.7.K8sValidatedKafkaPolicyTest Kafka Policy Tests KafkaPolicies (from (k8s-1.7.xml))
      • The line on which the test failed, e.g. :
      • The error message, e.g. :

        Failed to produce from empire-hq on topic deathstar-plan
  3. If a corresponding GitHub issue exists, update it with:
    1. A link to the failing Jenkins build (note that the build information is eventually deleted).
    2. Attach the zipfile downloaded from Jenkins with logs from the failing tests. A zipfile for all tests is also available.
    3. Check how much time has passed since the last reported occurrence of this failure and move this issue to the correct column in the CI flakes project board.
  4. If no existing GitHub issue was found, file a new GitHub issue:

    1. Attach zipfile downloaded from Jenkins with logs from failing test
    2. If the failure is a new regression or a real bug:
      1. Title: <Short bug description>
      2. Labels kind/bug and needs/triage.
    3. If failure is a new CI-Bug, Flake or if you are unsure:
      1. Title CI: <testname>: <cause>, e.g. CI: K8sValidatedPolicyTest Namespaces: cannot curl service
      2. Labels kind/bug/CI and needs/triage
      3. Include a link to the failing Jenkins build (note that the build information is eventually deleted).
      4. Attach zipfile downloaded from Jenkins with logs from failing test
      5. Include the test name and whole Stacktrace section to help others find this issue.
      6. Add issue to CI flakes project.


    Be extra careful when you see a new flake on a PR, and want to open an issue. It's much more difficult to debug these without context around the PR and the changes it introduced. When creating an issue for a PR flake, include a description of the code change, the PR, or the diff. If it isn't related to the PR, then it should already happen in master, and a new issue isn't needed.

  5. Edit the description of the Jenkins build to mark it as triaged. This will exclude it from future output.

    1. Login -> Click on build -> Edit Build Information
    2. Add the failure type and GH issue number. Use the table describing the failure categories, at the beginning of this section, to help categorize them.


    This step can only be performed with an account on Jenkins. If you are interested in CI failure reviews and do not have an account yet, ping us on Slack.


  Examples of build descriptions:

  • Flake, is down
  • Flake, DNS not ready, #3333
  • CI-Bug, K8sValidatedPolicyTest: Namespaces, pod not ready, #9939
  • Regression, k8s host policy, #1111

Bisect process

If you are unable to triage the issue, you can use the bisect job to find when things went awry in Jenkins.

  1. Log in to Jenkins
  2. Go to .
  3. Under the Git Bisect build step, fill in Good start revision and Bad end revision.
  4. Write a description of what you are looking for under Search Identifier.
  5. Adjust Retry number and Min Successful Runs to account for current CI flakiness.
  6. Save the configuration.
  7. Click "Build Now" in .
  8. This may take over a day, depending on how many underlying builds are created. The result will be in the bisect-cilium console output; the actual builds will be happening in job.
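
To give an intuition for why the bisect job can take so long and why the retry settings matter, here is a minimal Go sketch of a bisection over a linear history. It assumes that a revision counts as good once it passes at least "Min Successful Runs" times within "Retry number" attempts. That reading of the two parameters is an assumption of this sketch, not a description of the Jenkins plugin, and the revision numbers are hypothetical.

```go
package main

import "fmt"

// isGood runs a revision up to `retries` times and treats it as good as soon
// as it has passed `minSuccess` times. (Assumed semantics, see lead-in.)
func isGood(run func(rev int) bool, rev, retries, minSuccess int) bool {
	passes := 0
	for i := 0; i < retries; i++ {
		if run(rev) {
			passes++
		}
		if passes >= minSuccess {
			return true
		}
	}
	return false
}

// firstBad returns the first bad revision, given that `good` is known to be
// good and `bad` is known to be bad. Each loop iteration halves the range,
// so about log2(bad-good) revisions are checked.
func firstBad(good, bad int, check func(rev int) bool) int {
	for bad-good > 1 {
		mid := good + (bad-good)/2
		if check(mid) {
			good = mid
		} else {
			bad = mid
		}
	}
	return bad
}

func main() {
	// Hypothetical history: revisions 0..9, regression introduced at revision 6.
	run := func(rev int) bool { return rev < 6 }
	got := firstBad(0, 9, func(rev int) bool { return isGood(run, rev, 3, 1) })
	fmt.Println("first bad revision:", got) // prints "first bad revision: 6"
}
```

Every checked revision may trigger several full builds when retries are enabled, which is where most of the total run time goes.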

Infrastructure details

Logging into VM running tests

  1. If you have access to credentials for Jenkins, log into the Jenkins slave running the test workload
  2. Identify the vagrant box running the specific test
$ vagrant global-status
id       name                          provider   state   directory
6e68c6c  k8s1-build-PR-1588-6          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
ec5962a  cilium-master-build-PR-1588-6 virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q
bfaffaa  k8s2-build-PR-1588-6          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
3fa346c  k8s1-build-PR-1588-7          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2/tests/k8s
b7ded3c  cilium-master-build-PR-1588-7 virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2
  3. Log into the specific VM
$ JOB_BASE_NAME=PR-1588 BUILD_NUMBER=6 vagrant ssh 6e68c6c
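
When many boxes are running, picking the right ID by eye is error-prone. The Go sketch below looks up a box ID by VM name in text shaped like the vagrant global-status output above. The naming pattern <node>-build-<JOB_BASE_NAME>-<BUILD_NUMBER> is inferred from that sample output and is an assumption, and the directory column in the embedded sample is shortened to a placeholder path.

```go
package main

import (
	"fmt"
	"strings"
)

// boxID returns the ID of the box with the given name, scanning text in the
// column layout printed by `vagrant global-status`.
func boxID(statusOutput, name string) (string, bool) {
	for _, line := range strings.Split(statusOutput, "\n") {
		fields := strings.Fields(line)
		// Data rows have five columns: id, name, provider, state, directory.
		if len(fields) >= 5 && fields[1] == name {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	// Sample rows; the directory values are placeholders, not real paths.
	out := `id       name                          provider   state   directory
6e68c6c  k8s1-build-PR-1588-6          virtualbox running /root/jenkins/workspace/example/tests/k8s
bfaffaa  k8s2-build-PR-1588-6          virtualbox running /root/jenkins/workspace/example/tests/k8s`

	// Assumed naming pattern: <node>-build-<JOB_BASE_NAME>-<BUILD_NUMBER>.
	name := fmt.Sprintf("%s-build-%s-%s", "k8s1", "PR-1588", "6")
	if id, ok := boxID(out, name); ok {
		fmt.Println("vagrant ssh", id)
	}
}
```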

Jenkinsfiles Extensions

Cilium uses a custom Jenkins helper library to gather metadata from PRs and simplify our Jenkinsfiles. The exported methods are:

  • ispr(): returns true if the current build is a PR.
  • setIfPr(string, string): returns the first argument if the build is a PR, and the second one otherwise.
  • BuildIfLabel(String label, String Job): triggers a new Job if the PR has that specific Label.
  • Status(String status, String context): sets the pull request check status on the given context, for example Status("SUCCESS", "$JOB_BASE_NAME")