
test: Increase service/DNS timeout from 30 to 240 seconds #16820

Merged

Conversation

@jrajahalme (Member) commented Jul 7, 2021

Increase the timeout for a service to acquire its backends, in the hope
that 30 seconds was simply too optimistic in some scenarios. This could
avoid test flakes like:

Failed to resolve kafka-service DNS entry in pod empire-outpost-8888-544cdcd9b8-72hgl
Expected
    <*errors.errorString | 0xc001e05fd0>: {
        s: "unable to resolve DNS for service kafka-service in pod empire-outpost-8888-544cdcd9b8-72hgl: 30s timeout expired",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-K8s-1.21-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/KafkaPolicies.go:157

In the above case the service did not have any backends yet, even though
the backend pod was up and running. Give the k8s apiserver a bit more time
to converge.

            "status": {
                "loadBalancer": {}
            }

(from svc.json in https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/939/testReport/junit/Suite-k8s-1/21/K8sKafkaPolicyTest_Kafka_Policy_Tests_KafkaPolicies/)

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

@jrajahalme added the area/CI-improvement (Topic or proposal to improve the Continuous Integration workflow) and release-note/misc (This PR makes changes that have no direct user impact.) labels on Jul 7, 2021
@jrajahalme jrajahalme requested review from a team as code owners July 7, 2021 16:07
@aditighag (Member) left a comment

I don't quite follow why absence of service backends would fail DNS resolution.

In the above case the service did not have any backends yet, even though
the backend pod was up and running. Give the k8s apiserver a bit more time
to converge.

$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
my-service   ClusterIP   10.39.240.33    <none>        80/TCP    6s

$ k get ep
NAME         ENDPOINTS          AGE
my-service   <none>             15s

$ k exec -it dnsutils nslookup my-service
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server:		10.39.240.10
Address:	10.39.240.10#53

Name:	my-service.default.svc.cluster.local
Address: 10.39.240.33

@jrajahalme (Member, Author)

I don't quite follow why absence of service backends would fail DNS resolution.

That was a hasty assumption on my part. DNS resolution might conceivably fail for a headless service with no backends, but that should not be the case here. So I don't actually know why the DNS resolution for the service failed, but a longer timeout might still avoid the test flake.
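The distinction raised above can be modeled in a few lines. This is a toy model of cluster DNS behavior, not the real kube-dns code; the `service` struct and `answer` function are illustrative assumptions. A ClusterIP service resolves to its cluster IP regardless of endpoints (as the nslookup output above shows), while a headless service resolves directly to ready backend IPs, so an empty answer is only expected in the headless case:

```go
package main

import "fmt"

// Toy model of a Kubernetes service as cluster DNS sees it; the
// fields and logic are illustrative, not kube-dns internals.
type service struct {
	clusterIP string   // "None" marks a headless service
	backends  []string // IPs of ready backend pods
}

// answer returns the A records a DNS lookup for the service would yield.
func answer(svc service) []string {
	if svc.clusterIP != "None" {
		// ClusterIP service: resolves to the cluster IP even with
		// zero ready backends.
		return []string{svc.clusterIP}
	}
	// Headless service: resolves to ready backend IPs, so an empty
	// endpoint list yields an empty (NXDOMAIN-like) answer.
	return svc.backends
}

func main() {
	noBackends := service{clusterIP: "10.39.240.33"}
	fmt.Println(answer(noBackends)) // [10.39.240.33]

	headless := service{clusterIP: "None"}
	fmt.Println(len(answer(headless))) // 0
}
```

Under this model, missing backends alone cannot explain a failed lookup for kafka-service, which is consistent with the reviewer's observation.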

@jrajahalme (Member, Author)

test-me-please

@jrajahalme force-pushed the pr/jrajahalme/test-kafka-service-timeout branch from b8b824e to 1da0082 on July 12, 2021 10:04
@jrajahalme (Member, Author)

rebased due to completely broken CI

@jrajahalme (Member, Author)

test-me-please

@jrajahalme (Member, Author)

test-1.21-4.9
timed out

@jrajahalme (Member, Author)

test-1.21-4.9

@jrajahalme added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Jul 14, 2021
@nebril nebril merged commit 507a249 into cilium:master Jul 15, 2021