dockerd lockup - AKS 1.11.5+1.11.7, kernel 4.15, with Moby 3.0.1+3.0.5, #838
The fix for https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1802021 is shipped in
No outages observed in the last 6 days running 4.15.0-1040-azure. Before the upgrade, the longest window between outages was 4 days. I want at least 2 weeks of uptime before closing this ticket as fixed.
Since upgrading the AKS node kernel to 4.15.0-1040-azure, I have observed 2 weeks of uptime on 8 nodes with no recurrence of the original symptoms. I am confident the kernel patch has resolved our problem. @jnoller: This GitHub issue can be closed.
4.15.0-1040-azure+ is apparently not the default for new nodes following a scale-up event in (some subset of) AKS clusters. A new node added to an existing cluster on 2019-04-05 got Linux kernel 4.15.0-1037-azure, complete with a reproduction of this bug. Newer kernel versions were downloaded and installed, but the node was never rebooted (manually or otherwise), so it kept running 4.15.0-1037-azure. 10 days of uptime => boom. I will be adding https://github.com/weaveworks/kured to more clusters.
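The state described above (a newer kernel installed but never booted) can be checked per node with a short script. This is a hypothetical helper, not something AKS or kured ships. It assumes the fix first appears in 4.15.0-1040-azure, as reported in this thread, and that the node follows Ubuntu's convention of creating `/var/run/reboot-required` when an installed package needs a reboot (the sentinel file kured watches by default). The names `parse_ubuntu_kernel` and `is_affected` are illustrative.

```python
import os
import platform

# Assumption: the fix ships in 4.15.0-1040-azure (per this thread); older
# ABI numbers on the 4.15.0-*-azure line are treated as affected.
FIXED_KERNEL = "4.15.0-1040-azure"

# Ubuntu creates this file when an installed package (e.g. a new kernel)
# needs a reboot to take effect; kured uses it as its reboot sentinel.
REBOOT_SENTINEL = "/var/run/reboot-required"


def parse_ubuntu_kernel(release: str) -> tuple:
    """Turn an Ubuntu kernel string like '4.15.0-1037-azure' into (4, 15, 0, 1037).

    Raises ValueError for strings that do not follow that layout.
    """
    base, abi, _flavour = release.split("-", 2)
    major, minor, patch = (int(p) for p in base.split("."))
    return (major, minor, patch, int(abi))


def is_affected(release: str, fixed: str = FIXED_KERNEL) -> bool:
    """True if the given kernel predates the fixed kernel (tuple comparison)."""
    return parse_ubuntu_kernel(release) < parse_ubuntu_kernel(fixed)


def main() -> None:
    running = platform.release()  # the same string `uname -r` prints
    try:
        affected = is_affected(running)
    except ValueError:
        print(f"{running}: not an Ubuntu-style kernel string; skipping check")
        return
    pending = os.path.exists(REBOOT_SENTINEL)
    print(f"running kernel: {running}, affected: {affected}, reboot pending: {pending}")
    if affected and pending:
        # Newer kernel is on disk but the node is still booted into the old one.
        print("A newer kernel is installed but not booted; reboot this node (or let kured do it).")


if __name__ == "__main__":
    main()
```

`affected: True` together with `reboot pending: True` is the situation reported above for the scaled-up node. The version comparison is only meaningful for kernels on the 4.15.0-*-azure line.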
Known issue tracking bug.
Users have reported an intermittent issue where the dockerd daemon (Moby 3.0.1 and 3.0.5) enters a hard-locked, uninterruptible state, rendering worker nodes unrecoverable until a forced reboot. This is caused by a bug in the Linux kernel. Please see the detailed bug reports on the Moby and Ubuntu trackers:
moby/moby#38750
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1802021
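To spot the symptom on a node, one option is to look for dockerd threads in the kernel's uninterruptible-sleep state, which Linux reports as `D` in `/proc/<pid>/task/<tid>/stat`. The sketch below is a hypothetical diagnostic, not part of the original report; the function names are illustrative. Note that brief `D` states during normal I/O are expected, so a single sample is not proof: the lockup described here shows up as a `D` state that persists across repeated checks.

```python
import os


def parse_stat(line: str) -> tuple:
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first '(' and the *last* ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # the single state character after ") "
    return pid, comm, state


def find_uninterruptible(name: str = "dockerd", proc: str = "/proc") -> list:
    """Return (pid, tid) pairs for threads of processes named `name` in state D."""
    hits = []
    if not os.path.isdir(proc):
        return hits  # not Linux, or no procfs mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc, entry, "task")
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                _, comm, _ = parse_stat(f.read())
            if comm != name:
                continue
            tids = os.listdir(task_dir)
        except (OSError, ValueError, IndexError):
            continue  # process exited or is unreadable
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, _, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited between listing and reading
            if state == "D":
                hits.append((int(entry), int(tid)))
    return hits


if __name__ == "__main__":
    print(find_uninterruptible())
```

Running this periodically and alerting only when the same threads stay in `D` across several samples avoids false positives from ordinary disk waits.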
Thanks to alanjcastonguay for the support ticket and report that helped us chase the issue down.