Kubernetes startup watch may never terminate if there is a Pod error #258

Closed
Skazza94 opened this issue Jan 4, 2024 · 0 comments

Skazza94 commented Jan 4, 2024

The KubernetesMachine._wait_machines_startup method currently loops over the stream of watch events from list_namespaced_pod. After some critical Pod errors (for example, CNI errors), no further events are generated.

Consequently, the for loop blocks forever waiting for the next event, and the program hangs.

To fix this, the loop needs a way to give up after a defined threshold. The proposed approach uses threading.Timer to set a 3-minute timer that is reset every time a new event arrives. If no event arrives within 3 minutes, the timer callback fires, reports an error, and terminates the program.
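
A minimal sketch of the resettable-timer idea, not the actual patch: the names `ResettableTimer` and `consume_events` are hypothetical, and `events` stands in for the watch stream so the example runs without a cluster. It only illustrates how a `threading.Timer` can be cancelled and re-created on every event so that the callback fires only after a silent interval.

```python
import threading


class ResettableTimer:
    """Calls `on_timeout` if `reset()` is not called again within `timeout` seconds.

    Hypothetical helper for illustration; not part of the project or of threading.
    """

    def __init__(self, timeout, on_timeout):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._timer = None
        self._lock = threading.Lock()

    def reset(self):
        """(Re)start the countdown. Call this for every received event."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            # threading.Timer cannot be restarted, so a fresh one is created.
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True
            self._timer.start()

    def cancel(self):
        """Stop the countdown without firing the callback."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


def consume_events(events, timeout, on_timeout):
    """Iterate over `events`, re-arming the timer each time one arrives."""
    watchdog = ResettableTimer(timeout, on_timeout)
    watchdog.reset()  # arm before the first event, in case none ever arrives
    seen = []
    try:
        for event in events:
            watchdog.reset()
            seen.append(event)
    finally:
        watchdog.cancel()  # the loop ended normally, so no timeout is needed
    return seen
```

In the real method, `events` would presumably be the watch stream over list_namespaced_pod and `timeout` would be 180 seconds. Note that the callback runs in the timer's own thread, where `sys.exit()` only ends that thread; a callback meant to stop the whole program would need something stronger, such as `os._exit()`, and which one the fix uses is not stated here.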

@Skazza94 Skazza94 self-assigned this Jan 4, 2024
@Skazza94 Skazza94 added this to the Release 3.7.1 milestone Jan 4, 2024
Skazza94 added a commit that referenced this issue Jan 4, 2024
@tcaiazzi tcaiazzi closed this as completed Jan 9, 2024