systemd segfault #1694
Comments
Does this segfault still happen on the latest Stable release? Can you provide the coredump?
I turned off the auto upgrades because I ran into some bugs, but has anything related to this been fixed since then? Where are coredumps stored in CoreOS?
It seems there's no coredump:
It's tough to tell since there is no coredump. A number of things have changed, so it's likely that something along the way addresses this failure.
I got another crash and this time a coredump was produced:
Needless to say, having your PID 1 crash AND losing the ability to SSH into your machine because for some reason systemd is now handling port 22 is a bad way to start your day. :) Also, I ran the update yesterday and could swear that I had upgraded to the latest stable (1185.3.0):
However, I rebooted the machine because of the crash and it now shows version 1122.2.0. Could it have reverted because of the crash?
Can I provide additional help in debugging this issue? Should I file an issue with systemd?
Potentially. The boot isn't marked "good" until a few minutes after it's running (
@crawford Where can I find debug symbols for the systemd binary? I'm very much interested in helping debug this problem.
We split them when we build the OS. Unfortunately, they are not published at the moment, but we will be changing that. I tried your coredump against the debug symbols I generated, but GDB wasn't able to associate function names with the items in the backtrace. Either the stack was smashed or there is something wrong with the symbols I generated (I'm double-checking that now).
Which CoreOS version are you using? Do all 1185.x point releases use the same systemd binary? I'm almost positive I was not on 1122 at the time but you never know.
As my co-worker pointed out, the stack is clearly messed up:
Could you try reproducing the failure on Beta or Alpha? That has systemd 231, which will go to Stable in a few weeks.
CoreOS 1185 already has systemd 231:
I don't have other environments where I can try Beta or Alpha versions... Is there some logging that I can increase to help pinpoint the problem? How do I alter systemd's command line flags?
Oh yes, ha. Could you try the current Stable and see if you are able to reproduce the failure? If you see the failure again, we can look into increasing the log level.
The latest crash I had, the one which produced the core file, was with 1185.3.0. I don't see any systemd-related changes compared with the latest stable, 1185.5.0. It was only after I rebooted the machine that it went back to 1122. The fact that I had disabled the update service probably had something to do with this.
I see. I thought that coredump was from an older version of systemd. Either way, that stack has been corrupted. To increase the logging verbosity of systemd, you can add the following to LogLevel=debug
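For anyone following along, here is a minimal sketch of that setting. It assumes the option belongs in systemd's manager configuration file: the path `/etc/systemd/system.conf` and the `[Manager]` section come from stock systemd and are assumptions here, not something stated in this thread.

```ini
# /etc/systemd/system.conf (assumed location, see note above)
[Manager]
# Raise the verbosity of the messages logged by PID 1 itself.
LogLevel=debug
```

The same level can also be requested for a single boot by adding `systemd.log_level=debug` to the kernel command line, which may be easier to undo if the extra logging turns out to be too noisy.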
I managed to catch a debug log of systemd crashing. This is a system running 1122.2.0, so systemd 229, but the crash seems similar when I upgrade to 231.
Relevant systemd issue: systemd/systemd#4869
According to the systemd folk, and I'm inclined to agree, I'm hitting this glibc 2.21 regression. There seems to be a patch but it has not been merged. Can CoreOS include it since upstream seems unable to?
The glibc patch should be in the alpha tomorrow.
Issue Report
Bug
Systemd crashes frequently on a number of Kubernetes nodes that run CoreOS.
CoreOS Version
Environment
Azure VM
Expected Behavior
Systemd should not crash.
Actual Behavior
Reproduction Steps