Add CI for testing antrea compatibility with 4 K8s versions on CAPA #5476

jainpulkit22 · 2023-09-07T06:54:00Z

Add CI for testing compatibility with the previous four K8s versions using Cluster API, with AWS as the provider.

ci/jenkins/jobs/macros.yaml

ci/cluster-api/aws/templates/cluster.yaml


jainpulkit22 · 2023-10-17T14:07:27Z

/test-rancher-e2e

rajnkamr · 2023-11-09T12:50:26Z

@jainpulkit22, you need to resolve the conflicting files on this PR.

using Cluster API, with provider as AWS.

Signed-off-by: Pulkit Jain <jainpu@vmware.com>
Signed-off-by: Shengkai Lin <jefflin@sjtu.edu.cn>
Signed-off-by: Zhengsheng Zhou <zhengshengz@vmware.com>

edwardbadboy · 2023-11-15T08:23:49Z

ci/jenkins/jobs/macros.yaml

+          #!/bin/bash
+          set -ex
+          DOCKER_REGISTRY="$(head -n1 ci/docker-registry)"
+          export JOB_NAME="matrix-${TEST_OS}-k8s-${K8S_VERSION//./-}-build-num"


I think Jenkins has a built-in variable JOB_NAME, and we assign a custom value to it here. Is it because the custom value is more readable? In that case, maybe use CLUSTER_NAME as the variable name. The same applies to JOB_NAME in the other builders.

We can do that as well. I preferred this because it was already in use when I started working on the CAPA task.
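A minimal sketch of the CLUSTER_NAME suggestion, assuming the matrix axes TEST_OS and K8S_VERSION from the diff above: the readable name goes into a separate variable and the Jenkins-provided JOB_NAME is left untouched. The make_cluster_name helper and the fallback defaults are hypothetical, added only so the snippet runs outside Jenkins.

```shell
#!/bin/bash
# Hypothetical helper: build a readable cluster name from the matrix axes
# instead of overwriting the Jenkins built-in JOB_NAME variable.
make_cluster_name() {
  local test_os="$1" k8s_version="$2" build_number="$3"
  # ${var//./-} replaces every "." with "-" (e.g. v1.26.1 -> v1-26-1).
  echo "matrix-${test_os}-k8s-${k8s_version//./-}-${build_number}"
}

# Jenkins exports BUILD_NUMBER in real builds; the defaults below are
# placeholders so the sketch also runs locally.
CLUSTER_NAME="$(make_cluster_name "${TEST_OS:-ubuntu}" "${K8S_VERSION:-v1.26.1}" "${BUILD_NUMBER:-0}")"
export CLUSTER_NAME
```

The builder would then pass --cluster-name "${CLUSTER_NAME}" to test-vmc.sh, and JOB_NAME keeps its Jenkins meaning for anything else that relies on it.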

edwardbadboy · 2023-11-15T08:26:15Z

ci/jenkins/jobs/macros.yaml

+          ./ci/jenkins/test-vmc.sh --cluster-name "${JOB_NAME}-${BUILD_NUMBER}" --setup-only --provider aws --aws-region "us-west-2" --aws-access-key-id "${AWS_ACCESS_KEY}" --aws-secret-access-key "${AWS_SECRET_KEY}" --aws-service-user-name "${AWS_SERVICE_USER_NAME}" --aws-service-user-role "${AWS_SERVICE_USER_ROLE_ARN}" --aws-vpc-id "${CAPA_VPC}" --aws-subnet-id "${CAPA_SUBNET}" 
+          testcases=("e2e" "conformance" "networkpolicy")
+          failure=0
+          for testcase in "${testcases[@]}"; do


I think we need to turn off errexit before running the tests and turn it back on afterwards. For example:

set +e
for testcase in ...; do
  ...
done
set -e
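A fuller sketch of that pattern, assuming each suite is run by one command whose exit status is checked afterwards. run_testcase is a hypothetical stub standing in for the real test-vmc.sh invocation, and it deliberately fails for networkpolicy to show that the loop keeps going.

```shell
#!/bin/bash
set -e

# Hypothetical stub for the real per-suite test command; it simulates a
# failure for the "networkpolicy" suite only.
run_testcase() {
  [[ "$1" != "networkpolicy" ]]
}

testcases=("e2e" "conformance" "networkpolicy")
failure=0

# Disable errexit so that one failing suite does not abort the loop
# before the remaining suites (and the later cleanup) have run.
set +e
for testcase in "${testcases[@]}"; do
  run_testcase "$testcase"
  rc=$?
  if [[ $rc -ne 0 ]]; then
    echo "Test suite ${testcase} failed with exit code ${rc}"
    failure=1
  fi
done
# Re-enable errexit for the rest of the builder.
set -e
```

The builder can then clean up the cluster and finally exit with $failure, so Jenkins still marks the build as failed.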

edwardbadboy · 2023-11-15T08:29:30Z

ci/jenkins/jobs/macros.yaml

+                if [[ $result == 124 ]]; then
+                  echo "Error: Clean up job of ${clustername} timeout"
+                fi
+                if [[ $result == 124 ]]; then


Not sure why there are two identical result == 124 conditions here.
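For reference, GNU coreutils timeout exits with status 124 when the wrapped command times out, so a single check is enough; any other non-zero status is a regular failure. A minimal sketch, where `sleep 5` is a placeholder for the real cleanup command and the cluster name is hypothetical:

```shell
#!/bin/bash
clustername="${clustername:-example-cluster}"   # hypothetical name

# Placeholder for the real cleanup command, limited to 1 second.
timeout 1 sleep 5
result=$?

if [[ $result == 124 ]]; then
  echo "Error: Clean up job of ${clustername} timed out"
elif [[ $result != 0 ]]; then
  echo "Error: Clean up job of ${clustername} failed with exit code ${result}"
fi
```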

edwardbadboy · 2023-11-15T08:32:03Z

ci/jenkins/jobs/projects-cloud.yaml

+          branches:
+          - '*/main'
+          included_regions: []
+          cron: '' # 'H H * * *'


Can you explain why it's '' # 'H H * * *'? I thought it should be 'H H * * *'. Same question for the cron on line 1115.

edwardbadboy · 2023-11-15T08:33:18Z

ci/jenkins/jobs/projects-cloud.yaml

+                  - v1.23.1
+                  - v1.24.1
+                  - v1.25.1
+                  - v1.26.1


Are K8s v1.27 and v1.28 supported? Maybe we can change this to v1.25-v1.28.

edwardbadboy · 2023-11-15T08:37:54Z

ci/jenkins/test-vmc.sh

+    shift 2
+    if [ "$provider" = "aws" ]; then
+      SSH_USERNAME=ubuntu
+      SKIP_LIST="TestEgress|TestProxy|TestProxyHairpinIPv4"


May I know why these test cases need to be skipped? If they do need to be skipped, maybe add a short comment here explaining why.
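One possible shape for that comment, plus a helper to check which tests the list covers. This assumes SKIP_LIST is consumed as a regular-expression alternation, as the `|` separators suggest. The is_skipped helper is hypothetical, and the `<reason>` placeholders are for the author to fill in; the actual reasons are not stated in the PR.

```shell
#!/bin/bash
# Tests skipped when provider=aws:
#   TestEgress: <reason>
#   TestProxy, TestProxyHairpinIPv4: <reason>
SKIP_LIST="TestEgress|TestProxy|TestProxyHairpinIPv4"

# Hypothetical helper: succeeds if the test name exactly matches one of
# the SKIP_LIST entries (the pattern is anchored with ^ and $).
is_skipped() {
  [[ "$1" =~ ^(${SKIP_LIST})$ ]]
}
```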

edwardbadboy · 2023-11-15T08:45:49Z

ci/jenkins/test-vmc.sh

+    sed -i "s/SSHAUTHORIZEDKEYS/default/g" ${GIT_CHECKOUT_DIR}/jenkins/out/cluster.yaml
+    sed -i "s/CLUSTERNAMESPACE/${CLUSTER}/g" ${GIT_CHECKOUT_DIR}/jenkins/out/namespace.yaml
+
+    sleep 15s


I'm a bit confused here, too. When does it trigger or start the initialization of the management cluster? I thought the management cluster was created in advance and should be ready before any workload clusters are created.
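If some readiness wait is still needed, a bounded poll is usually more robust than a fixed sleep 15s. The sketch below shows a generic retry helper. wait_until is hypothetical. The commented kubectl example assumes that the CAPA controllers were installed by clusterctl init into its default capa-system namespace.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds, or give up after
# timeout_secs, checking every interval_secs.
wait_until() {
  local timeout_secs="$1" interval_secs="$2"
  shift 2
  local deadline=$((SECONDS + timeout_secs))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "Timed out after ${timeout_secs}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_secs"
  done
}

# Example (not executed here):
# wait_until 300 10 kubectl wait --for=condition=Available deployment --all -n capa-system --timeout=10s
```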

edwardbadboy · 2023-11-15T08:49:01Z

ci/jenkins/test-vmc.sh

+    export AWS_CONTROLLER_IAM_ROLE=$AWS_SERVICE_USER_ROLE_ARN
+    clusterctl delete --infrastructure aws
+    export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile --region $AWS_REGION)
+    clusterctl init --infrastructure aws


Can you explain why it needs to run clusterctl delete and init here? Is it because the temporary assumed role has expired?

edwardbadboy · 2023-11-15T08:52:05Z

ci/jenkins/test-vmc.sh

+    export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile --region $AWS_REGION)
+    clusterctl init --infrastructure aws
+
+    kubectl delete cluster ${CLUSTER} -n ${CLUSTER}


After deleting the cluster, can it traverse through all kinds of resources in the cluster template, run kubectl delete --all $KIND -n ${CLUSTER} for each, and finally delete the namespace? If this still cannot delete the EC2 and LB instances, it can try deleting them with the ec2 and elb commands.
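A minimal sketch of that traversal. The list of kinds is an assumption about what a typical CAPA cluster template contains and should be aligned with ci/cluster-api/aws/templates/cluster.yaml. cleanup_cluster_resources and the overridable KUBECTL variable are hypothetical.

```shell
#!/bin/bash
# KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

cleanup_cluster_resources() {
  local cluster="$1"
  # Assumed kinds from a typical CAPA template; adjust to the real template.
  local kinds=(
    machinedeployments.cluster.x-k8s.io
    kubeadmcontrolplanes.controlplane.cluster.x-k8s.io
    kubeadmconfigtemplates.bootstrap.cluster.x-k8s.io
    awsmachinetemplates.infrastructure.cluster.x-k8s.io
    awsclusters.infrastructure.cluster.x-k8s.io
  )
  # Delete the Cluster first so Cluster API tears down the objects it owns.
  "$KUBECTL" delete cluster "$cluster" -n "$cluster" --ignore-not-found
  # Then remove leftovers, e.g. templates that are not owned by the Cluster.
  for kind in "${kinds[@]}"; do
    "$KUBECTL" delete "$kind" --all -n "$cluster" --ignore-not-found
  done
  "$KUBECTL" delete namespace "$cluster" --ignore-not-found
}
```

A dry run with KUBECTL=echo prints the seven delete commands without touching any cluster.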

edwardbadboy · 2023-11-15T08:53:24Z

ci/jenkins/test-vmc.sh

+    aws elb delete-load-balancer --load-balancer-name ${loadbalancer_name}
+    sleep 90s
+
+    echo "=== Cleaning up Security Groups ==="


Can it keep the security groups? If several clusters are created at the same time, a security group may still be in use by other clusters.
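If some security groups do have to be removed, one option is to delete only those owned by this cluster. The sketch assumes CAPA tags the groups it creates with the key sigs.k8s.io/cluster-api-provider-aws/cluster/<cluster-name> and the value owned; please verify this against the CAPA version in use. cleanup_owned_security_groups and the overridable AWS_CLI variable are hypothetical.

```shell
#!/bin/bash
# AWS_CLI can be overridden with a stub for testing.
AWS_CLI="${AWS_CLI:-aws}"

# Delete only the security groups in the given VPC that carry this
# cluster's (assumed) CAPA ownership tag, leaving shared groups alone.
cleanup_owned_security_groups() {
  local cluster="$1" vpc_id="$2"
  local sg_ids sg_id
  sg_ids="$("$AWS_CLI" ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=${vpc_id}" \
              "Name=tag:sigs.k8s.io/cluster-api-provider-aws/cluster/${cluster},Values=owned" \
    --query 'SecurityGroups[].GroupId' --output text)"
  # --output text separates IDs with whitespace, so word splitting is intended.
  for sg_id in $sg_ids; do
    "$AWS_CLI" ec2 delete-security-group --group-id "$sg_id"
  done
}
```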

XinShuYang · 2024-03-28T04:07:12Z

Our cloud CI has been migrated to jenkins.antrea.io; please continue verifying the job changes there.

luolanzone · 2024-04-15T09:01:00Z

@jainpulkit22, are you still actively working on this? Please estimate your bandwidth and make sure you deliver this in Antrea 2.0, or remove it from the milestone if it's not a must-have.
cc @XinShuYang @rajnkamr

rajnkamr · 2024-04-15T09:25:12Z

@luolanzone,
It is a good-to-have candidate and aligned with the cloud Jenkins goals with respect to CAPA. We are considering it for 2.0.
It was initially part of 1.15 but was moved out.

jainpulkit22 · 2024-08-12T06:57:06Z

Paused for now, shifting to CAPV testbeds or GCP testbeds.

jainpulkit22 marked this pull request as draft September 7, 2023 06:54

jainpulkit22 force-pushed the capi-aws branch 17 times, most recently from 3f232e3 to 41a1ec1 September 13, 2023 17:36

jainpulkit22 marked this pull request as ready for review September 14, 2023 05:13

jainpulkit22 requested a review from rajnkamr September 14, 2023 05:13