
ci-clustermesh-upgrade: Increment timeout between rollouts to 5min #29560

Conversation

@mhofstetter (Member) commented Dec 1, 2023

Currently, the ClusterMesh upgrade test sets an explicit timeout of 1min when waiting for the Cilium Agent DaemonSet to become ready between the rollouts.

In some cases, the Pods aren't ready after 1min. Therefore, this commit increases the timeout to 5min.

I think the most important part is that we set an explicit timeout on the `kubectl rollout status` command in the first place, as the default is to wait forever.
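For illustration, a minimal sketch of what such a capped wait could look like (assuming the test drives the rollout check with `kubectl` directly; the context name is a placeholder, and `--timeout=0`, the default, waits indefinitely):

```bash
# Sketch only - the actual CI step may differ.
# Wait for the Cilium agent DaemonSet rollout to complete, but give up
# after 5 minutes instead of the previous 1 minute.
kubectl --context "${CLUSTER_CONTEXT}" -n kube-system \
  rollout status daemonset/cilium --timeout=5m
```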

Spotted on main: https://github.com/cilium/cilium/actions/runs/7061558462/job/19223486069

Sysdump analysis: the last missing agent Pod on cluster 1 was just about to become ready (Init:3/6).

Cluster 1 / Context 1:

NAMESPACE            NAME                                             READY   STATUS     RESTARTS   AGE     IP             NODE                     NOMINATED NODE   READINESS GATES
...
kube-system          cilium-7vbtb                                     1/1     Running    0          60s     172.18.0.2     cluster1-worker          <none>           <none>
kube-system          cilium-gbd6v                                     0/1     Init:3/6   0          60s     172.18.0.3     cluster1-control-plane   <none>           <none>...

Cluster 2 / Context 2:

kube-system          cilium-fr7mn                                     1/1     Running   0          92s     172.18.0.5     cluster2-control-plane   <none>           <none>
kube-system          cilium-wndch                                     1/1     Running   0          92s     172.18.0.4     cluster2-worker          <none>           <none>

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@mhofstetter added the kind/enhancement, area/CI, and release-note/ci labels on Dec 1, 2023
@mhofstetter requested review from a team as code owners on December 1, 2023, 16:32
@mhofstetter (Member, Author) commented:

/test

@giorio94 (Member) left a comment:


Thanks! Looks good to me.

I guess that most of the delay comes from downloading the cilium images again. We could probably set the pullPolicy to IfNotPresent to avoid that and save a bit of time.
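(Illustrative only, not part of this PR: assuming the test installs Cilium via the upstream Helm chart, where `image.pullPolicy` controls the agent image's pull policy, the suggestion could hypothetically look like this.)

```bash
# Sketch only: reuse images already present on the nodes instead of
# re-pulling them on every rollout; release/chart names are assumptions.
helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set image.pullPolicy=IfNotPresent
```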

@maintainer-s-little-helper (bot) added the ready-to-merge label on Dec 1, 2023
@aanm added this pull request to the merge queue on Dec 1, 2023
Merged via the queue into cilium:main with commit c749169 Dec 1, 2023
61 of 62 checks passed
@mhofstetter deleted the pr/mhofstetter/ci-clustermesh-upgrade-increase-timeout branch on December 2, 2023, 08:33