Kernel panic on VMware with 2 sockets, 1 core #1695

Closed
moserke opened this Issue Dec 5, 2016 · 7 comments

moserke commented Dec 5, 2016

Issue Report

Bug

CoreOS Version

1235.1.0

Environment

ESXi 5.5.0

Expected Behavior

Successful boot

Actual Behavior

The kernel boots into a panic loop; a screenshot of the panic is attached. There are no logs, because the CPUs never come up far enough to do any work.
[Screenshot: kernel panic, 2016-12-05 9:53 am]

Boot succeeds with 1 socket, 1 core; with 1 socket, 2 cores; or with up to 4 sockets, 4 cores.

Reproduction Steps

This can be reproduced by:

  1. Deploy beta .ova as a template
  2. Deploy from template
  3. Set sockets to 2 and cores to 1
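For reference, the failing topology corresponds to the following virtual-hardware settings in the VM's .vmx file (an illustrative fragment; `numvcpus` and `cpuid.coresPerSocket` are the standard VMware keys, and the socket count is the total vCPU count divided by cores per socket):

```
numvcpus = "2"
cpuid.coresPerSocket = "1"
```

Editing the .vmx directly requires the VM to be powered off; the same settings can also be made through the vSphere client's CPU configuration dialog.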

moserke commented Dec 5, 2016

Release 1192.2.0 was the last update our hosts received before the kernel panics started; it was also the last release with a 4.7.x kernel before the move to 4.8.x. Perhaps something was introduced in that kernel? I have also narrowed this down to occurring only when the VM has 2 vCPUs and 4 GB of RAM. If I move the VM to 2 GB or 6 GB of RAM, it boots normally.


moserke commented Dec 8, 2016

The relationship between the vCPU/RAM configuration and the panic appears more random than first thought: sometimes it boots, sometimes it doesn't, and sometimes changing the configuration helps, sometimes it doesn't. Every time a new OS version is released we get hit by this, because the OS can't update through the panic and we end up in reboot, update, panic loops. Just curious whether others are having this issue or whether it is potentially unique to our environment.


Sebas- commented Jan 2, 2017

Experiencing the same issue (beta channel, 1235.1.0). Pressing the "Restart guest" button seems to "fix" it most of the time.

I have disabled the update mechanism so it doesn't reboot :P

My VM config is 8 GB of RAM with 2 virtual sockets and 1 core per socket.

--

Changed the VM to 1 virtual socket and 1 core per socket, then power-cycled the system many times, resulting in 0 kernel panics.

Changed the virtual sockets back to 2 and got a kernel panic right away.
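For anyone who wants to hold off update-triggered reboots the same way while debugging, this is a sketch of how it can be done on Container Linux (it assumes the standard `update-engine` and `locksmithd` units; undo later with `systemctl unmask` and `systemctl start`):

```shell
# Stop the update engine and the reboot manager for this boot...
sudo systemctl stop update-engine locksmithd

# ...and mask them so they stay off across reboots while debugging.
sudo systemctl mask update-engine locksmithd
```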


GJKrupa commented Jan 21, 2017

Same issue running 1235.2.0 under ESXi 6.5. It has sometimes taken more than 10 restarts to get a successful boot (2 sockets, 1 core per socket). Unlike @moserke, I'm not getting successful boots with 6 GB of RAM.


Sebas- commented Jan 21, 2017

@GJKrupa my problem went away after forcing an update (booting with 1 CPU, 1 core, 1 GB of RAM) and then changing the VM settings back to their old state. Currently running 1248.4.0 beta; I haven't seen the problem since the update.


I was able to stop the kernel panic in both cases (1248.4.0 and 1298.3.0) by upgrading the VM compatibility to hardware version 9 or higher. The CoreOS OVA is set to hardware version 7 by default.

mattkaar commented Feb 4, 2017

I was able to upgrade to the latest beta version (1298.3.0) by changing to 1 CPU/1 core as @Sebas- suggested. But switching back to the original CoreOS OVA settings (2 CPUs, 1 core each) causes it to kernel panic again.

I was able to stop the kernel panic in both cases (1248.4.0 and 1298.3.0) by upgrading the VM compatibility to hardware version 9 or higher. The CoreOS OVA is set to hardware version 7 by default.
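As a sketch of automating that workaround with the govc CLI (the VM name and target version here are examples, not from this thread; the VM must be powered off for the upgrade, and GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD must point at your vCenter or ESXi host):

```shell
# Power the VM off, raise its virtual hardware version, then boot it again.
govc vm.power -off my-coreos-vm
govc vm.upgrade -version=9 -vm my-coreos-vm
govc vm.power -on my-coreos-vm
```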

bgilbert commented Feb 15, 2017

1235.4.0 and above have a 4.7 kernel, so the current stable release shouldn't have this problem. 1298.3.0 and 1313.0.0 are still affected. This should be fixed in the next alpha; see coreos/linux#40.
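A quick way to confirm which kernel and OS release a given guest actually ended up on (standard Linux commands, nothing CoreOS-specific):

```shell
# Kernel version the guest is running (e.g. 4.7.x vs 4.8.x)
uname -r

# OS release number, from the standard os-release file
grep '^VERSION=' /etc/os-release
```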

bgilbert closed this Feb 15, 2017
