
test: Increase service/DNS timeout from 30 to 240 seconds #16820

Merged

Conversation

@jrajahalme (Member) commented Jul 7, 2021

Increase the timeout for a service to acquire its backends, in the hope
that 30 seconds was simply too optimistic in some scenarios. This could
avoid test flakes like:

Failed to resolve kafka-service DNS entry in pod empire-outpost-8888-544cdcd9b8-72hgl
Expected
    <*errors.errorString | 0xc001e05fd0>: {
        s: "unable to resolve DNS for service kafka-service in pod empire-outpost-8888-544cdcd9b8-72hgl: 30s timeout expired",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-K8s-1.21-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/KafkaPolicies.go:157

In the above case the service did not have any backends yet, even though
the backend pod was up and running. Give the k8s apiserver a bit more time
to converge.

            "status": {
                "loadBalancer": {}
            }

(from svc.json in https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/939/testReport/junit/Suite-k8s-1/21/K8sKafkaPolicyTest_Kafka_Policy_Tests_KafkaPolicies/)

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

@jrajahalme added the area/CI-improvement (Topic or proposal to improve the Continuous Integration workflow) and release-note/misc (This PR makes changes that have no direct user impact.) labels on Jul 7, 2021
@jrajahalme jrajahalme requested review from a team as code owners July 7, 2021 16:07
@aditighag (Member) left a comment

I don't quite follow why absence of service backends would fail DNS resolution.

In the above case the service did not have any backends yet, even though
the backend pod was up and running. Give the k8s apiserver a bit more time
to converge.

$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
my-service   ClusterIP   10.39.240.33    <none>        80/TCP    6s

$ k get ep
NAME         ENDPOINTS          AGE
my-service   <none>             15s

$ k exec -it dnsutils nslookup my-service
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server:		10.39.240.10
Address:	10.39.240.10#53

Name:	my-service.default.svc.cluster.local
Address: 10.39.240.33

@jrajahalme (Member, Author)

I don't quite follow why absence of service backends would fail DNS resolution.

That was a hasty assumption on my part. DNS resolution might conceivably fail for a headless service with no backends, but that should not be the case here. So I don't actually know why the DNS resolution for the service failed, but a longer timeout might still avoid the test flake.
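The distinction raised above can be modeled in a few lines. This is a toy model of cluster DNS behavior, not the real kube-dns code; the `service` struct and `answer` function are illustrative assumptions. A ClusterIP service resolves to its cluster IP regardless of endpoints (as the nslookup output above shows), while a headless service resolves directly to ready backend IPs, so an empty answer is only expected in the headless case:

```go
package main

import "fmt"

// Toy model of a Kubernetes service as cluster DNS sees it; the
// fields and logic are illustrative, not kube-dns internals.
type service struct {
	clusterIP string   // "None" marks a headless service
	backends  []string // IPs of ready backend pods
}

// answer returns the A records a DNS lookup for the service would yield.
func answer(svc service) []string {
	if svc.clusterIP != "None" {
		// ClusterIP service: resolves to the cluster IP even with
		// zero ready backends.
		return []string{svc.clusterIP}
	}
	// Headless service: resolves to ready backend IPs, so an empty
	// endpoint list yields an empty (NXDOMAIN-like) answer.
	return svc.backends
}

func main() {
	noBackends := service{clusterIP: "10.39.240.33"}
	fmt.Println(answer(noBackends)) // [10.39.240.33]

	headless := service{clusterIP: "None"}
	fmt.Println(len(answer(headless))) // 0
}
```

Under this model, missing backends alone cannot explain a failed lookup for kafka-service, which is consistent with the reviewer's observation.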

@jrajahalme (Member, Author)

test-me-please

@jrajahalme force-pushed the pr/jrajahalme/test-kafka-service-timeout branch from b8b824e to 1da0082 on July 12, 2021 10:04
@jrajahalme (Member, Author)

rebased due to completely broken CI

@jrajahalme (Member, Author)

test-me-please

@jrajahalme (Member, Author)

test-1.21-4.9
timed out

@jrajahalme (Member, Author)

test-1.21-4.9

@jrajahalme added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Jul 14, 2021
@nebril nebril merged commit 507a249 into cilium:master Jul 15, 2021