test: Bump timeout of service plumbing check #23439
Merged
When restarting Cilium, we check a number of things to ensure it's ready, including that the kube-dns service is correctly plumbed (in the agent and in the datapath's maps).
This check is executed in a loop with a 5s timeout. All of the kube-dns checks, including that one, are executed in a loop with a 4min timeout.
To check the service plumbing, we shell out twice: once to retrieve the agent state and once to dump the BPF map contents. Shelling out can take up to a few seconds, especially when running locally, where we typically execute a `kubectl exec` inside an SSH command.
As a result of those commands taking a few seconds to execute, the inner loop regularly times out at 5s. That means we retry until we get a runtime below 5s. What could have taken 7s now sometimes takes several tens of seconds because we have to retry. Locally, this can get even worse and we sometimes hit the 4min timeout of the outer loop because the inner loop never succeeds in less than 5s.
To avoid this whole mess, we can simply bump the inner loop's timeout to 10s. As per the above, this should (counterintuitively) reduce the total runtime of the restart checks.