Kernel panic on AWS #1690

Closed
crawford opened this Issue Dec 2, 2016 · 4 comments

Comments

@crawford
Member

crawford commented Dec 2, 2016

Issue Report

Bug

CoreOS Version

Started in 1214.0.0

Environment

PV instances in AWS (all regions).

Expected Behavior

Boot

Actual Behavior

[17350396.328981] general protection fault: 0000 [#1] SMP
[17350396.328998] Modules linked in:
[17350396.329008] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.8.4-coreos #1
[17350396.329017] task: ffff880025f15400 task.stack: ffff880025f20000
[17350396.329023] RIP: e030:[<ffffffff81014dc3>]  [<ffffffff81014dc3>] rapl_cpu_online+0x63/0x80
[17350396.329043] RSP: e02b:ffff880025f23e18  EFLAGS: 00010212
[17350396.329049] RAX: 0000000000000080 RBX: ffffffff81014d60 RCX: 0000000000000000
[17350396.329059] RDX: 0000000000000080 RSI: 0000000000000080 RDI: 0000000000000080
[17350396.329068] RBP: ffff880025f23e30 R08: 0000000000000001 R09: 0000000000000000
[17350396.329077] R10: 0000000000007ff0 R11: 0000000000000000 R12: 4d6f456a4c0a6a35
[17350396.329086] R13: 0000000000000000 R14: ffff88002620da40 R15: 0000000000000000
[17350396.329100] FS:  0000000000000000(0000) GS:ffff880026200000(0000) knlGS:0000000000000000
[17350396.329110] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[17350396.329116] CR2: ffffc90000137000 CR3: 0000000001a06000 CR4: 0000000000002660
[17350396.329126] Stack:
[17350396.329132]  ffffffff81014d60 0000000000000000 000000000000004d ffff880025f23e70
[17350396.329145]  ffffffff8107a63a ffff88002620da40 ffff88002620da40 0000000000000000
[17350396.329157]  ffffffff81a3f6c0 ffff8800262182c0 ffff880025f13db0 ffff880025f23e90
[17350396.329169] Call Trace:
[17350396.329177]  [<ffffffff81014d60>] ? rapl_cpu_offline+0xa0/0xa0
[17350396.329188]  [<ffffffff8107a63a>] cpuhp_invoke_callback+0x4a/0x100
[17350396.329197]  [<ffffffff8107a8e1>] cpuhp_thread_fun+0x41/0x100
[17350396.329208]  [<ffffffff8109cd35>] smpboot_thread_fn+0x105/0x160
[17350396.329216]  [<ffffffff8109cc30>] ? sort_range+0x30/0x30
[17350396.329224]  [<ffffffff81098f98>] kthread+0xd8/0xf0
[17350396.329234]  [<ffffffff815be6bf>] ret_from_fork+0x1f/0x40
[17350396.329241]  [<ffffffff81098ec0>] ? kthread_park+0x60/0x60
[17350396.329247] Code: 18 8b 02 4c 8b a4 ca 10 01 00 00 48 c7 c2 00 a1 00 00 48 01 c2 e8 ae 07 30 00 3b 05 6c c7 b0 00 7c 0e 3e 4c 0f ab 2d 0d 18 8b 02 <45> 89 6c 24 08 5b 31 c0 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 
[17350396.329309] RIP  [<ffffffff81014dc3>] rapl_cpu_online+0x63/0x80
[17350396.329319]  RSP <ffff880025f23e18>
[17350396.329331] ---[ end trace 5056dc771de3e52b ]---
[17350396.329337] Kernel panic - not syncing: Fatal exception
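
A rough reading of the trace above, offered as a sketch rather than a confirmed diagnosis: the bytes at the faulting RIP (`<45> 89 6c 24 08`) appear to decode to a 32-bit store to `0x8(%r12)`, and R12 holds 0x4d6f456a4c0a6a35. That value is not a canonical x86-64 address, which would explain a general protection fault rather than a page fault. The helper name `is_canonical` and the 48-bit virtual address width are assumptions made for illustration.

```python
def is_canonical(addr: int, bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes `bits`-bit virtual addressing (48 by default), where
    bits 63 down to bits-1 must all be equal.
    """
    top = addr >> (bits - 1)                      # bits 63..47 for bits=48
    all_ones = (1 << (64 - bits + 1)) - 1         # 17 one-bits for bits=48
    return top == 0 or top == all_ones


r12 = 0x4D6F456A4C0A6A35                          # R12 from the register dump
target = (r12 + 0x8) & 0xFFFFFFFFFFFFFFFF         # store goes to 0x8(%r12)

print(hex(target), is_canonical(target))
```

If this reading holds, the pointer loaded into R12 inside rapl_cpu_online was garbage before the store was attempted.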

Reproduction Steps

  1. Boot a PV instance (e.g. ami-04a90064)

Other Information

Might be related to https://lkml.org/lkml/2016/10/25/313.

@crawford

Member

crawford commented Dec 13, 2016

This was fixed in Linux 4.9 (coreos/linux@a6a198b).

@mischief

mischief Dec 17, 2016

This is not fixed. I was able to reproduce it on 4.9.0-coreos with an m3.large instance in us-west-1, ami-d08ddbb0.

Xen Minimal OS!
  start_info: 0x18ae000(VA)
    nr_pages: 0x1e0000
  shared_inf: 0x7dde1000(MA)
     pt_base: 0x18b1000(VA)
nr_pt_frames: 0x11
    mfn_list: 0x9ae000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda ro 4
  stack:      0x96d840-0x98d840
MM: Init
      _text: 0x0(VA)
     _etext: 0x7dc7d(VA)
   _erodata: 0x9a000(VA)
     _edata: 0x9fce0(VA)
stack start: 0x96d840(VA)
       _end: 0x9ade40(VA)
  start_pfn: 18c5
    max_pfn: 1e0000
Mapping memory range 0x1c00000 - 0x1e0000000
setting 0x0-0x9a000 readonly
skipped 0x1000
MM: Initialise page allocator for 27be000(27be000)-1e0000000(1e0000000)
MM: done
Demand map pfns at 1e0001000-21e0001000.
Heap resides at 21e0002000-41e0002000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x1e0001000.
Initialising scheduler
Thread "Idle": pointer: 0x21e0002050, stack: 0x3710000
Thread "xenstore": pointer: 0x21e0002800, stack: 0x3720000
xenbus initialised on irq 1 mfn 0x10b112f
Thread "shutdown": pointer: 0x21e0002fb0, stack: 0x3730000
Dummy main: start_info=0x98d940
Thread "main": pointer: 0x21e0003760, stack: 0x3740000
"main" "root=/dev/sda" "ro" "4" 
vbd 2048 is hd0
******************* BLKFRONT for device/vbd/2048 **********


backend at /local/domain/0/backend/vbd/3204/2048
16777216 sectors of 512 bytes
**************************
vbd 2064 is hd1
******************* BLKFRONT for device/vbd/2064 **********


backend at /local/domain/0/backend/vbd/3204/2064
62899200 sectors of 512 bytes
**************************

    GNU GRUB  version 0.97  (7864320K lower / 0K upper memory)

     CoreOS GRUB2

    Use the ^ and v keys to select which entry is highlighted.
    Press enter to boot the selected OS, 'e' to edit the
    commands before booting, or 'c' for a command-line.

  Booting 'CoreOS GRUB2'



root    (hd0,0)

 Filesystem type is fat, partition type 0xc

kernel  /xen/pvboot-x86_64.elf



============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/3204/2048 node=device/vbd/2048
close blk: backend=/local/domain/0/backend/vbd/3204/2064 node=device/vbd/2064
Welcome to GRUB!


color_highlight=black/light-gray

color_normal=light-gray/black

feature_200_final=y

feature_all_video_module=y

feature_chainloader_bpb=y

feature_default_font_path=y

feature_menuentry_id=y

feature_menuentry_options=y

feature_nativedisk_cmd=y

feature_ntldr=y

feature_platform_search_hint=y

feature_timeout_style=y

grub_cpu=x86_64

grub_platform=xen

lang=

locale_dir=

pager=

prefix=(memdisk)

root=xen/sda,gpt1

secondary_locale_dir=

                          GNU GRUB  version 2.02~beta2

  *CoreOS default
   CoreOS USR-A
   CoreOS USR-B

      Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line.

   The highlighted entry will be executed automatically in 1s.
   The highlighted entry will be executed automatically in 0s.

  Booting `CoreOS default'


[    0.000000] Linux version 4.9.0-coreos (jenkins@localhost) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.5, pie-0.6.4) ) #1 SMP Wed Dec 14 23:31:02 UTC 2016
[    0.000000] Command line: mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=hvc0 coreos.first_boot=1 coreos.randomize_disk_guid=00000000-0000-0000-0000-000000000001 coreos.oem.id=ec2 modprobe.blacklist=xen_fbfront net.ifnames=0 verity.usrhash=50eef95b216020082c7b1bb7261fd4817c6eb7bf2dfc2d0201a8ced5b854600a
[    0.000000] x86/fpu: Legacy x87 FPU detected.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] Released 0 page(s)
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x00000001e07fffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] MPS support code is not built-in.
[    0.000000] Using acpi=off or acpi=noirq or pci=noacpi may have problem
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: Xen
[    0.000000] e820: last_pfn = 0x1e0800 max_arch_pfn = 0x400000000
[    0.000000] MTRR: Disabled
[    0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC  
[    0.000000] e820: last_pfn = 0x100000 max_arch_pfn = 0x400000000
[    0.000000] NUMA turned off
[    0.000000] Faking a node at [mem 0x0000000000000000-0x00000001e07fffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x1df121000-0x1df126fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000001e07fffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x00000001e07fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000001e07fffff]
[    0.000000] p2m virtual area at ffffc90000000000, size is 40000000
[    0.000000] Remapped 0 page(s)
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x1e0900000-0x1e0cfffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.2.amazon (preserve-AD)
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] percpu: Embedded 35 pages/cpu @ffff8801d6400000 s103768 r8192 d31400 u1048576
[    0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1937257
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: rootflags=rw mount.usrflags=ro mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=hvc0 coreos.first_boot=1 coreos.randomize_disk_guid=00000000-0000-0000-0000-000000000001 coreos.oem.id=ec2 modprobe.blacklist=xen_fbfront net.ifnames=0 verity.usrhash=50eef95b216020082c7b1bb7261fd4817c6eb7bf2dfc2d0201a8ced5b854600a
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 7664600K/7872124K available (5949K kernel code, 1213K rwdata, 2456K rodata, 30472K init, 832K bss, 207524K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	Build-time adjustment of leaf fanout to 64.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=2.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=2
[    0.000000] Using NULL legacy PIC
[    0.000000] NR_IRQS:8448 nr_irqs:48 0
[    0.000000] xen:events: Using 2-level ABI
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] installing Xen timer for CPU 0
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2500.013 MHz processor
[    0.002000] Calibrating delay loop (skipped), value calculated using timer frequency.. 5000.04 BogoMIPS (lpj=2500020)
[    0.002000] pid_max: default: 32768 minimum: 301
[    0.002000] Security Framework initialized
[    0.002000] SELinux:  Initializing.
[    0.002000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.003807] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.004540] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.004563] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.004931] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.004938] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    0.004945] CPU: Physical Processor ID: 0
[    0.004948] CPU: Processor Core ID: 0
[    0.004954] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 22
[    0.004959] [Firmware Bug]: CPU0: Using firmware package id 2047 instead of 0
[    0.004965] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[    0.004969] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0, 1GB 4
[    0.032246] ftrace: allocating 23589 entries in 93 pages
[    0.037146] cpu 0 spinlock event irq 1
[    0.037161] smpboot: Max logical packages: 1
[    0.037168] VPMU disabled by hypervisor.
[    0.037182] Performance Events: unsupported p6 CPU model 62 no PMU driver, software events only.
[    0.037717] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.037723] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.037862] installing Xen timer for CPU 1
[    0.037888] SMP alternatives: switching to SMP code
[    0.002000] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: ffff CPUID: 22
[    0.002000] [Firmware Bug]: CPU1: Using firmware package id 2047 instead of 0
[    0.002000] smpboot: APIC(ffff) Converting physical 2047 to logical package 0
[    0.059039] cpu 1 spinlock event irq 13
[    0.060007] x86: Booted up 1 node, 2 CPUs
[    0.060615] devtmpfs: initialized
[    0.060615] x86/mm: Memory block size: 128MB
[    0.062101] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.062115] pinctrl core: initialized pinctrl subsystem
[    0.062171] NET: Registered protocol family 16
[    0.062187] xen:grant_table: Grant tables using version 1 layout
[    0.062199] Grant table initialized
[    0.063130] dca service started, version 1.12.1
[    0.063432] PCI: setting up Xen PCI frontend stub
[    0.067049] ACPI: Interpreter disabled.
[    0.067049] xen:balloon: Initialising balloon driver
[    0.071017] xen_balloon: Initialising balloon driver
[    0.071051] random: fast init done
[    0.071051] vgaarb: loaded
[    0.071051] dmi: Firmware registration failed.
[    0.071051] PCI: System does not support PCI
[    0.071051] PCI: System does not support PCI
[    0.071294] clocksource: Switched to clocksource xen
[    0.077804] VFS: Disk quotas dquot_6.6.0
[    0.077825] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.077846] hugetlbfs: disabling because there are no supported hugepage sizes
[    0.077877] pnp: PnP ACPI: disabled
[    0.080568] NET: Registered protocol family 2
[    0.080781] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    0.080961] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.081135] TCP: Hash tables configured (established 65536 bind 65536)
[    0.081198] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    0.081237] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    0.081298] NET: Registered protocol family 1
[    0.484465] BUG: unable to handle kernel paging request at 000000000001e2cb
[    0.484480] IP: [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484491] PGD 0 [    0.484493] 
[    0.484497] Oops: 0002 [#1] SMP
[    0.484501] Modules linked in:
[    0.484507] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.9.0-coreos #1
[    0.484511] task: ffff8801d55e8000 task.stack: ffffc90040cac000
[    0.484515] RIP: e030:[<ffffffff81014ec3>]  [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484523] RSP: e02b:ffffc90040cafdf8  EFLAGS: 00010216
[    0.484526] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 0000000000000000
[    0.484530] RDX: 0000000000000080 RSI: 0000000000000080 RDI: 0000000000000080
[    0.484534] RBP: ffffc90040cafe10 R08: 0000000000000001 R09: 0000000000000000
[    0.484539] R10: 0000000000007ff0 R11: 0000000000000000 R12: 000000000001e2c3
[    0.484543] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.484551] FS:  0000000000000000(0000) GS:ffff8801d6400000(0000) knlGS:0000000000000000
[    0.484556] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.484560] CR2: 000000000001e2cb CR3: 0000000001a07000 CR4: 0000000000002660
[    0.484565] Stack:
[    0.484568]  0000000000000000 000000000000006a ffffffff81014e60 ffffc90040cafe70
[    0.484576]  ffffffff8107b1c9 ffffc90040cafe78 ffffffff00000001 ffffffff81a410d0
[    0.484583]  ffff8801d640da40 00000000815ca2aa ffff8801d640da40 0000000000000000
[    0.484590] Call Trace:
[    0.484596]  [<ffffffff81014e60>] ? rapl_cpu_offline+0xa0/0xa0
[    0.484603]  [<ffffffff8107b1c9>] cpuhp_invoke_callback+0x89/0x3f0
[    0.484608]  [<ffffffff8107b7e6>] cpuhp_thread_fun+0x46/0xf0
[    0.484613]  [<ffffffff8109e6aa>] smpboot_thread_fn+0x10a/0x160
[    0.484618]  [<ffffffff8109e5a0>] ? sort_range+0x30/0x30
[    0.484623]  [<ffffffff8109a019>] kthread+0xd9/0xf0
[    0.484627]  [<ffffffff81099f40>] ? kthread_park+0x60/0x60
[    0.484634]  [<ffffffff815ca995>] ret_from_fork+0x25/0x30
[    0.484638] Code: b7 8e 02 4c 8b a4 ca 10 01 00 00 48 c7 c2 00 a1 00 00 48 01 c2 e8 4e 39 30 00 3b 05 ec 0e b1 00 7c 0e f0 4c 0f ab 2d 0d b7 8e 02 <45> 89 6c 24 08 5b 31 c0 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 
[    0.484680] RIP  [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484685]  RSP <ffffc90040cafdf8>
[    0.484689] CR2: 000000000001e2cb
[    0.484696] ---[ end trace 366b19d11bca2521 ]---
[    0.484700] Kernel panic - not syncing: Fatal exception
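
For comparison with the original report, here is a small sketch that pulls the faulting symbol and offset out of the `RIP` lines of both traces. It suggests the 4.9.0 crash hits the same instruction (rapl_cpu_online+0x63) as the 4.8.4 one, and that CR2 here is exactly R12 + 8, which fits a store through a bad pointer in R12. The regex and the `fault_site` helper are illustrative assumptions, not part of any kernel tooling.

```python
import re

# RIP lines copied from the two traces in this issue.
TRACE_4_8_4 = "RIP  [<ffffffff81014dc3>] rapl_cpu_online+0x63/0x80"
TRACE_4_9_0 = "RIP  [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80"

RIP_RE = re.compile(
    r"RIP\s+\[<([0-9a-f]+)>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def fault_site(line: str) -> tuple[str, int]:
    """Return (symbol, offset) parsed from an oops 'RIP' line."""
    m = RIP_RE.search(line)
    if m is None:
        raise ValueError(f"not a RIP line: {line!r}")
    return m.group(2), int(m.group(3), 16)


# Same symbol and offset in both kernels?
same_site = fault_site(TRACE_4_8_4) == fault_site(TRACE_4_9_0)

# Register values from the 4.9.0 trace.
r12, cr2 = 0x1E2C3, 0x1E2CB

print(same_site, hex(cr2 - r12))
```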

[    0.037168] VPMU disabled by hypervisor.
[    0.037182] Performance Events: unsupported p6 CPU model 62 no PMU driver, software events only.
[    0.037717] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.037723] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.037862] installing Xen timer for CPU 1
[    0.037888] SMP alternatives: switching to SMP code
[    0.002000] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: ffff CPUID: 22
[    0.002000] [Firmware Bug]: CPU1: Using firmware package id 2047 instead of 0
[    0.002000] smpboot: APIC(ffff) Converting physical 2047 to logical package 0
[    0.059039] cpu 1 spinlock event irq 13
[    0.060007] x86: Booted up 1 node, 2 CPUs
[    0.060615] devtmpfs: initialized
[    0.060615] x86/mm: Memory block size: 128MB
[    0.062101] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.062115] pinctrl core: initialized pinctrl subsystem
[    0.062171] NET: Registered protocol family 16
[    0.062187] xen:grant_table: Grant tables using version 1 layout
[    0.062199] Grant table initialized
[    0.063130] dca service started, version 1.12.1
[    0.063432] PCI: setting up Xen PCI frontend stub
[    0.067049] ACPI: Interpreter disabled.
[    0.067049] xen:balloon: Initialising balloon driver
[    0.071017] xen_balloon: Initialising balloon driver
[    0.071051] random: fast init done
[    0.071051] vgaarb: loaded
[    0.071051] dmi: Firmware registration failed.
[    0.071051] PCI: System does not support PCI
[    0.071051] PCI: System does not support PCI
[    0.071294] clocksource: Switched to clocksource xen
[    0.077804] VFS: Disk quotas dquot_6.6.0
[    0.077825] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.077846] hugetlbfs: disabling because there are no supported hugepage sizes
[    0.077877] pnp: PnP ACPI: disabled
[    0.080568] NET: Registered protocol family 2
[    0.080781] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    0.080961] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.081135] TCP: Hash tables configured (established 65536 bind 65536)
[    0.081198] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    0.081237] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    0.081298] NET: Registered protocol family 1
[    0.484465] BUG: unable to handle kernel paging request at 000000000001e2cb
[    0.484480] IP: [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484491] PGD 0 [    0.484493] 
[    0.484497] Oops: 0002 [#1] SMP
[    0.484501] Modules linked in:
[    0.484507] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.9.0-coreos #1
[    0.484511] task: ffff8801d55e8000 task.stack: ffffc90040cac000
[    0.484515] RIP: e030:[<ffffffff81014ec3>]  [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484523] RSP: e02b:ffffc90040cafdf8  EFLAGS: 00010216
[    0.484526] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 0000000000000000
[    0.484530] RDX: 0000000000000080 RSI: 0000000000000080 RDI: 0000000000000080
[    0.484534] RBP: ffffc90040cafe10 R08: 0000000000000001 R09: 0000000000000000
[    0.484539] R10: 0000000000007ff0 R11: 0000000000000000 R12: 000000000001e2c3
[    0.484543] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.484551] FS:  0000000000000000(0000) GS:ffff8801d6400000(0000) knlGS:0000000000000000
[    0.484556] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.484560] CR2: 000000000001e2cb CR3: 0000000001a07000 CR4: 0000000000002660
[    0.484565] Stack:
[    0.484568]  0000000000000000 000000000000006a ffffffff81014e60 ffffc90040cafe70
[    0.484576]  ffffffff8107b1c9 ffffc90040cafe78 ffffffff00000001 ffffffff81a410d0
[    0.484583]  ffff8801d640da40 00000000815ca2aa ffff8801d640da40 0000000000000000
[    0.484590] Call Trace:
[    0.484596]  [<ffffffff81014e60>] ? rapl_cpu_offline+0xa0/0xa0
[    0.484603]  [<ffffffff8107b1c9>] cpuhp_invoke_callback+0x89/0x3f0
[    0.484608]  [<ffffffff8107b7e6>] cpuhp_thread_fun+0x46/0xf0
[    0.484613]  [<ffffffff8109e6aa>] smpboot_thread_fn+0x10a/0x160
[    0.484618]  [<ffffffff8109e5a0>] ? sort_range+0x30/0x30
[    0.484623]  [<ffffffff8109a019>] kthread+0xd9/0xf0
[    0.484627]  [<ffffffff81099f40>] ? kthread_park+0x60/0x60
[    0.484634]  [<ffffffff815ca995>] ret_from_fork+0x25/0x30
[    0.484638] Code: b7 8e 02 4c 8b a4 ca 10 01 00 00 48 c7 c2 00 a1 00 00 48 01 c2 e8 4e 39 30 00 3b 05 ec 0e b1 00 7c 0e f0 4c 0f ab 2d 0d b7 8e 02 <45> 89 6c 24 08 5b 31 c0 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 
[    0.484680] RIP  [<ffffffff81014ec3>] rapl_cpu_online+0x63/0x80
[    0.484685]  RSP <ffffc90040cafdf8>
[    0.484689] CR2: 000000000001e2cb
[    0.484696] ---[ end trace 366b19d11bca2521 ]---
[    0.484700] Kernel panic - not syncing: Fatal exception
@krobertson

krobertson commented Dec 23, 2016

I have been experiencing this on vSphere as well with 1235.2.0.

@crawford crawford referenced this issue in coreos/coreos-overlay Dec 30, 2016

Merged

sys-kernel/coreos-*: revert to 4.7.3 #2337

@crawford crawford removed their assignment Jan 5, 2017

@crawford crawford modified the milestones: Container Linux Alpha 1313.0.0, CoreOS Alpha 1300.0.0 Jan 25, 2017

@crawford crawford modified the milestones: Container Linux Alpha 1313.0.0, Container Linux Alpha 1326.0.0 Feb 2, 2017

@bgilbert

Member

bgilbert commented Feb 14, 2017

Fixed by coreos/linux#40.

@bgilbert bgilbert closed this Feb 14, 2017
