Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node VMs (1.9.6) "NotReady" after restart: update-grub corrupts /boot/grub/grub.cfg #412

Closed
svollath opened this issue Jun 6, 2018 · 6 comments

Comments

@svollath
Copy link

svollath commented Jun 6, 2018

When deploying a 3 node (kubernetes 1.9.6) cluster with "az aks create ..." we need to modify and reboot every kubernetes node for enabling "swapaccount=1".
When doing that every single node will enter "NotReady" state - my suspision is that VMs got stuck at GRUB bootloader not booting.

I'm able to work around that, with just modifying /boot/grub/grub.cfg and restarting the VMs. So it's likely there's an issue within "update-grub".

Here's what we do to modify and restart each VM (worked ~4 weeks before):

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=\"console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300\"/GRUB_CMDLINE_LINUX_DEFAULT=\"console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1\"/g' /etc/default/grub.d/50-cloudimg-settings.cfg"

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo update-grub"

> az vm restart -g <RGNAME> -n <VMNAME>

Here's the resulting /boot/grub/grub.cfg:

opio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1018-azure ...'
                linux   /boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1018-azure (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1018-azure-recovery-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1018-azure ...'
                linux   /boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro recovery nomodeset
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1016-azure' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1016-azure-advanced-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                gfxmode $linux_gfx_mode
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1016-azure ...'
                linux   /boot/vmlinuz-4.13.0-1016-azure root=UUID=dfb884db-c6ad-4c7a-8075-98cf640a2892 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.13.0-1016-azure
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1016-azure (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1016-azure-recovery-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1016-azure ...'
                linux   /boot/vmlinuz-4.13.0-1016-azure root=UUID=dfb884db-c6ad-4c7a-8075-98cf640a2892 ro recovery nomodeset
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.13.0-1016-azure
        }
}

### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_xen ###
### END /etc/grub.d/20_linux_xen ###

### BEGIN /etc/grub.d/30_os-prober ###
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/30_uefi-firmware ###
### END /etc/grub.d/30_uefi-firmware ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

### BEGIN /etc/grub.d/41_custom ###
if [ -f  ${config_directory}/custom.cfg ]; then
  source ${config_directory}/custom.cfg
elif [ -z \"${config_directory}\" -a -f  $prefix/custom.cfg ]; then
  source $prefix/custom.cfg;
fi
### END /etc/grub.d/41_custom ###
@tomconte
Copy link
Member

tomconte commented Jun 6, 2018

Below is the boot log from serial console after having modified the 50-cloudimg-settings.cfg file and running update-grub as described by Sebastian.

In the end it seems the eth0 network interface is not found, which means the VM is unreachable, services don't start, etc.

[FAILED] Failed to start Raise network interfaces.

Full log:

[    0.000000] random: get_random_bytes called from start_kernel+0x42/0x507 with crng_init
=0
[    0.000000] Linux version 4.13.0-1018-azure (buildd@lcy01-amd64-014) (gcc version 5.4.0
 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)) #21-Ubuntu SMP Thu May 17 13:58:38 UTC 2018 (Ub
untu 4.13.0-1018.21-azure 4.13.16)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro
console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'sta
ndard' format.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffeffff] usable
[    0.000000] BIOS-e820: [mem 0x000000001fff0000-0x000000001fffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000001ffff000-0x000000001fffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000029fffffff] usable
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] random: fast init done
[    0.000000] SMBIOS 2.3 present.
[    0.000000] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06
/02/2017
[    0.000000] Hypervisor detected: Microsoft Hyper-V
[    0.000000] Hyper-V: features 0x2e7f, hints 0xc2c
[    0.000000] Hyper-V Host Build:14393-10.0-0-0.230
[    0.000000] Hyper-V: LAPIC Timer Frequency: 0xc3500
[    0.000000] tsc: Marking TSC unstable due to running on Hyper-V
[    0.000000] Hyper-V: Using ext hypercall for remote TLB flush
[    0.000000] e820: last_pfn = 0x2a0000 max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
Memory KASLR using RDRAND RDTSC...
[    0.000000] e820: last_pfn = 0x1fff0 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [ffff94ca000ff7
80]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F5BF0 000014 (v00 ACPIAM)
[    0.000000] ACPI: RSDT 0x000000001FFF0000 000040 (v01 VRTUAL MICROSFT 06001702 MSFT 000
00097)
[    0.000000] ACPI: FACP 0x000000001FFF0200 000081 (v02 VRTUAL MICROSFT 06001702 MSFT 000
00097)
[    0.000000] ACPI: DSDT 0x000000001FFF1D24 003CBE (v01 MSFTVM MSFTVM02 00000002 INTL 020
02026)
[    0.000000] ACPI: FACS 0x000000001FFFF000 000040
[    0.000000] ACPI: WAET 0x000000001FFF1A80 000028 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: SLIC 0x000000001FFF1AC0 000176 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: OEM0 0x000000001FFF1CC0 000064 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: SRAT 0x000000001FFF0800 000140 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000000] ACPI: APIC 0x000000001FFF0300 000452 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: OEMB 0x000000001FFFF040 000064 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x1fffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x29fffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x2a0200000-0xfdfffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0xffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x10000200000-0x1ffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x20000200000-0x3ffffffffff] hotplug
[    0.000000] NUMA: Node 0 [mem 0x00000000-0x1fffffff] + [mem 0x100000000-0x29fffffff] -> [mem 0x00000000-0x29fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x29ffd5000-0x29fffffff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000029fffffff]
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000001ffeffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000029fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000029fffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0x1fff0000-0x1fffefff]
[    0.000000] PM: Registered nosave memory: [mem 0x1ffff000-0x1fffffff]
[    0.000000] PM: Registered nosave memory: [mem 0x20000000-0xffffffff]
[    0.000000] e820: [mem 0x20000000-0xffffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:1
[    0.000000] percpu: Embedded 44 pages/cpu @ffff94cc96600000 s141848 r8192 d30184 u262144
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1806201
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
[    0.000000] log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] log_buf_len total cpu_extra contributions: 520192 bytes
[    0.000000] log_buf_len min size: 262144 bytes
[    0.000000] log_buf_len: 1048576 bytes
[    0.000000] early log buf free: 252048(96%)
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 7109508K/7339576K available (12300K kernel code, 2317K rwdata, 3588K rodata, 2204K init, 2340K bss, 230068K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=128, Nodes=1
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] ftrace: allocating 34707 entries in 136 pages
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=128.
[    0.000000]  Tasks RCU enabled.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
[    0.000000] NR_IRQS: 524544, nr_irqs: 1448, preallocated irqs: 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty1] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] bootconsole [earlyser0] disabled
[    0.000000] bootconsole [earlyser0] disabled
[    0.000000] tsc: Detected 2294.684 MHz processor
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency..4589.36 BogoMIPS (lpj=9178736)
[    0.008027] pid_max: default: 131072 minimum: 1024
[    0.012073] ACPI: Core revision 20170531
[    0.021454] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    0.024178] Security Framework initialized
[    0.028028] Yama: becoming mindful.
[    0.032194] AppArmor: AppArmor initialized
[    0.037732] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.040828] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.044100] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.048058] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.052494] CPU: Physical Processor ID: 0
[    0.060025] CPU: Processor Core ID: 0
[    0.064098] FEATURE SPEC_CTRL Not Present
[    0.068037] mce: CPU supports 1 MCE banks
[    0.072129] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.076032] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.080025] Spectre V2 mitigation: Mitigation: Full generic retpoline
[    0.084025] Spectre V2 mitigation: Speculation control IBPB not-supported IBRS not-supported
[    0.084026] Speculative Store Bypass: Vulnerable
[    0.092261] Freeing SMP alternatives memory: 32K
[    0.100981] smpboot: Max logical packages: 64
[    0.104127] Switched APIC routing to physical flat.
[    0.108053] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[    0.143285] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.144153] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (family: 0x6, model: 0x4f, stepping: 0x1)
[    0.148187] Performance Events: unsupported p6 CPU model 79 no PMU driver, software events only.
[    0.152079] Hierarchical SRCU implementation.
[    0.161528] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.168006] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.172018] smp: Bringing up secondary CPUs ...
[    0.176138] x86: Booting SMP configuration:
[    0.180003] .... node  #0, CPUs:          #1
[    0.180591] smp: Brought up 1 node, 2 CPUs
[    0.188004] smpboot: Total of 2 processors activated (9178.73 BogoMIPS)
[    0.198691] devtmpfs: initialized
[    0.200060] x86/mm: Memory block size: 128MB
[    0.208469] evm: security.selinux
[    0.212002] evm: security.SMACK64
[    0.216001] evm: security.SMACK64EXEC
[    0.220006] evm: security.SMACK64TRANSMUTE
[    0.224002] evm: security.SMACK64MMAP
[    0.228004] evm: security.ima
[    0.232002] evm: security.capability
[    0.236137] PM: Registering ACPI NVS region [mem 0x1ffff000-0x1fffffff] (4096 bytes)
[    0.240137] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.244095] futex hash table entries: 32768 (order: 9, 2097152 bytes)
[    0.248741] pinctrl core: initialized pinctrl subsystem
[    0.273897] RTC time: 14:43:21, date: 06/06/18
[    0.276244] NET: Registered protocol family 16
[    0.280230] cpuidle: using governor ladder
[    0.284015] cpuidle: using governor menu
[    0.288006] PCCT header not found.
[    0.292097] ACPI: bus type PCI registered
[    0.296002] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.300635] PCI: Using configuration type 1 for base access
[    0.305115] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.308004] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.312217] ACPI: Added _OSI(Module Device)
[    0.316007] ACPI: Added _OSI(Processor Device)
[    0.320003] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.324008] ACPI: Added _OSI(Processor Aggregator Device)
[    0.335168] ACPI: Interpreter enabled
[    0.336018] ACPI: (supports S0 S5)
[    0.340002] ACPI: Using IOAPIC for interrupt routing
[    0.344028] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.348194] ACPI: Enabled 1 GPEs in block 00 to 0F
[    0.374344] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.376010] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.380009] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.384012] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.392154] PCI host bridge to bus 0000:00
[    0.396005] pci_bus 0000:00: root bus resource [mem 0xfe0000000-0xfffffffff window]
[    0.400007] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.404006] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.408008] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.412007] pci_bus 0000:00: root bus resource [mem 0x20000000-0xfffbffff window]
[    0.416010] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.430060] pci 0000:00:07.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.432006] pci 0000:00:07.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.436058] pci 0000:00:07.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.440008] pci 0000:00:07.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.444568] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[    0.444568] * this clock source is slow. Consider trying other clock sources
[    0.453638] pci 0000:00:07.3: quirk: [io  0x0400-0x043f] claimed by PIIX4 ACPI
[    0.477626] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 9 10 *11 12 14 15)
[    0.480250] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.484272] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.488329] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.492502] SCSI subsystem initialized
[    0.496099] pci 0000:00:08.0: vgaarb: setting as boot VGA device
[    0.500000] pci 0000:00:08.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.500007] pci 0000:00:08.0: vgaarb: bridge control possible
[    0.504001] vgaarb: loaded
[    0.508114] EDAC MC: Ver: 3.0.0
[    0.512544] hv_vmbus: Vmbus version:4.0
[    0.516598] PCI: Using ACPI for IRQ routing
[    0.520769] NetLabel: Initializing
[    0.524004] NetLabel:  domain hash size = 128
[    0.528009] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.532025] NetLabel:  unlabeled traffic allowed by default
[    0.536115] clocksource: Switched to clocksource hyperv_clocksource_tsc_page
[    0.588480] VFS: Disk quotas dquot_6.6.0
[    0.611378] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.648303] AppArmor: AppArmor Filesystem Enabled
[    0.675846] pnp: PnP ACPI init
[    0.697297] system 00:06: [io  0x01e0-0x01ef] has been reserved
[    0.731394] system 00:06: [io  0x0160-0x016f] has been reserved
[    0.766462] system 00:06: [io  0x0278-0x027f] has been reserved
[    0.800494] system 00:06: [io  0x0378-0x037f] has been reserved
[    0.835379] system 00:06: [io  0x0678-0x067f] has been reserved
[    0.869626] system 00:06: [io  0x0778-0x077f] has been reserved
[    0.899377] system 00:06: [io  0x04d0-0x04d1] has been reserved
[    0.933407] system 00:07: [io  0x0400-0x043f] has been reserved
[    0.967892] system 00:07: [io  0x0370-0x0371] has been reserved
[    1.002574] system 00:07: [io  0x0440-0x044f] has been reserved
[    1.037272] system 00:07: [mem 0xfec00000-0xfec00fff] could not be reserved
[    1.077732] system 00:07: [mem 0xfee00000-0xfee00fff] has been reserved
[    1.118139] system 00:08: [mem 0x00000000-0x0009ffff] could not be reserved
[    1.159370] system 00:08: [mem 0x000c0000-0x000dffff] could not be reserved
[    1.199200] system 00:08: [mem 0x000e0000-0x000fffff] could not be reserved
[    1.239406] system 00:08: [mem 0x00100000-0x1fffffff] could not be reserved
[    1.281371] system 00:08: [mem 0xfffc0000-0xffffffff] has been reserved
[    1.319632] pnp: PnP ACPI: found 9 devices
[    1.349720] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.401047] NET: Registered protocol family 2
[    1.427030] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    1.468735] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.508119] TCP: Hash tables configured (established 65536 bind 65536)
[    1.546623] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    1.581707] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    1.619218] NET: Registered protocol family 1
[    1.644064] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.676813] pci 0000:00:08.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    1.721288] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.755352] software IO TLB [mem 0x1bff0000-0x1fff0000] (64MB) mapped at [ffff94ca1bff0000-ffff94ca1ffeffff]
[    1.808312] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x21139776891, max_idle_ns: 440795250948 ns
[    1.816033] Scanning for low memory corruption every 60 seconds
[    1.818737] audit: initializing netlink subsys (disabled)
[    1.819189] Initialise system trusted keyrings
[    1.819197] Key type blacklist registered
[    1.819212] audit: type=2000 audit(1528296200.818:1): state=initialized audit_enabled=0 res=1
[    1.819224] workingset: timestamp_bits=36 max_order=21 bucket_order=0
[    1.821634] zbud: loaded
[    1.823113] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    1.823534] fuse init (API version 7.26)
[    1.826952] Key type asymmetric registered
[    1.826953] Asymmetric key parser 'x509' registered
[    1.827110] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    1.827189] io scheduler noop registered (default)
[    2.253191] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    2.294826] ACPI: Power Button [PWRF]
[    2.316588] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    2.445156] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    2.579185] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[    2.623177] Linux agpgart interface v0.103
[    2.649070] loop: module loaded
[    2.667836] hv_vmbus: registering driver hv_storvsc
[    2.696202] scsi host0: storvsc_host_t
[    2.721798] scsi 0:0:0:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI:5
[    2.771507] scsi host1: storvsc_host_t
[    2.798824] scsi 1:0:1:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI:5
[    2.848324] scsi host2: storvsc_host_t
[    2.877604] scsi host3: storvsc_host_t
[    2.905681] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    2.913625] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
[    2.913627] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    2.913666] sd 0:0:0:0: [sda] Write Protect is off
[    2.913832] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.915351]  sda: sda1
[    2.916614] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.124672] sd 1:0:1:0: Attached scsi generic sg1 type 0
[    3.154492] sd 1:0:1:0: [sdb] 29360128 512-byte logical blocks: (15.0 GB/14.0 GiB)
[    3.154794] ata_piix 0000:00:07.1: Hyper-V Virtual Machine detected, ATA device ignoreset
[    3.199239] scsi host4: ata_piix
[    3.199325] scsi host5: ata_piix
[    3.199437] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[    3.199438] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[    3.199539] libphy: Fixed MDIO Bus: probed
[    3.199540] tun: Universal TUN/TAP device driver, 1.6
[    3.249865] PPP generic driver version 2.4.2
[    3.249917] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
[    3.275939] serio: i8042 KBD port at 0x60,0x64 irq 1
[    3.276059] serio: i8042 AUX port at 0x60,0x64 irq 12
[    3.276148] mousedev: PS/2 mouse device common for all mice
[    3.276917] rtc_cmos 00:00: RTC can wake from S4
[    3.299681] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    3.301667] rtc_cmos 00:00: alarms up to one month, 114 bytes nvram
[    3.301746] device-mapper: uevent: version 1.0.3
[    3.301952] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[    3.302653] NET: Registered protocol family 10
[    3.329222] Segment Routing with IPv6
[    3.329235] NET: Registered protocol family 17
[    3.329243] Key type dns_resolver registered
[    3.871524] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    3.873136] ata2.00: ATAPI: Virtual CD, , max MWDMA2
[    3.877374] ata2.00: configured for MWDMA2
[    3.991979] sd 1:0:1:0: [sdb] 4096-byte physical blocks
[    4.020389] RAS: Correctable Errors collector initialized.
[    4.021087] sd 1:0:1:0: [sdb] Write Protect is off
[    4.081183] registered taskstats version 1
[    4.081407] sd 1:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPOand FUA
[    4.157232] Loading compiled-in X.509 certificates
[    4.186891] Loaded X.509 cert 'Build time autogenerated kernel key: 7273dfcd51ce17c3a0acc6c926ef1c75fa43a7eb'
[    4.245176] zswap: loaded using pool lzo/zbud
[    4.273162] Key type big_key registered
[    4.300113] Key type trusted registered
[    4.324545]  sdb: sdb1
[    4.340055] Key type encrypted registered
[    4.364670] AppArmor: AppArmor sha1 policy hashing enabled
[    4.398302] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    4.435807] evm: HMAC attrs: 0x1
[    4.457283]   Magic number: 6:618:737
[    4.480269] acpi LNXCPU:66: hash matches
[    4.504411] sd 1:0:1:0: [sdb] Attached SCSI disk
[    4.532445] rtc_cmos 00:00: setting system clock to 2018-06-06 14:43:27 UTC (1528296207)
[    4.532588] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    4.532588] EDD information not available.
[    4.647109] Waiting 300 sec before mounting root device...
[    4.698924] scsi 5:0:0:0: CD-ROM            Msft     Virtual CD/ROM   1.0  PQ: 0 ANSI:5
[    4.762577] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[    4.796251] cdrom: Uniform CD-ROM driver Revision: 3.20
[    4.828656] sr 5:0:0:0: Attached scsi generic sg2 type 5
[   65.149404] random: crng init done

It stays stuck there for a long time and then starts again:

[  242.652053] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  242.692515]       Not tainted 4.13.0-1018-azure #21-Ubuntu
[  242.726189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.773035] swapper/0       D    0     1      0 0x80000000
[  242.804670] Call Trace:
[  242.819731]  __schedule+0x3d6/0x8b0
[  242.840169]  schedule+0x36/0x80
[  242.858942]  schedule_timeout+0x17a/0x360
[  242.882503]  ? call_timer_fn+0x140/0x140
[  242.905375]  ? printk+0x52/0x6e
[  242.924433]  msleep+0x2e/0x40
[  242.942405]  ? msleep+0x2e/0x40
[  242.961225]  prepare_namespace+0x2f/0x194
[  242.984905]  kernel_init_freeable+0x212/0x23b
[  243.010324]  ? rest_init+0xc0/0xc0
[  243.030406]  kernel_init+0xe/0x100
[  243.050270]  ret_from_fork+0x35/0x40
[  316.380120] md: Waiting for all devices to be available before autodetect
[  316.420720] md: If you don't use raid, use raid=noautodetect
[  316.455898] md: Autodetecting RAID arrays.
[  316.480434] md: autorun ...
[  316.497985] md: ... autorun DONE.
[  316.518345] EXT4-fs (sda1): couldn't mount as ext3 due to feature incompatibilities
[  316.561965] EXT4-fs (sda1): couldn't mount as ext2 due to feature incompatibilities
[  316.607491] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[  316.650540] VFS: Mounted root (ext4 filesystem) readonly on device 8:1.
[  316.688668] devtmpfs: mounted
[  316.709313] Freeing unused kernel memory: 2204K
[  316.736945] Write protecting the kernel read-only data: 18432k
[  316.771968] Freeing unused kernel memory: 2024K
[  316.800604] Freeing unused kernel memory: 508K
[  316.828909] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[  316.866924] x86/mm: Checking user space page tables
[  316.896878] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[  317.011150] systemd[1]: Failed to insert module 'autofs4': No such file or directory
[  317.071828] systemd[1]: systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA+APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
[  317.176700] systemd[1]: Detected virtualization microsoft.
[  317.209778] systemd[1]: Detected architecture x86-64.

Welcome to Ubuntu 16.04.4 LTS!

[  317.257767] systemd[1]: Set hostname to <aks-nodepool1-39245783-0>.
[  317.535817] systemd[1]: Created slice System Slice.
[  OK  ] Created slice System Slice.
[  317.589463] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
[  317.654823] systemd[1]: Reached target User and Group Name Lookups.
[  OK  ] Reached target User and Group Name Lookups.
[  317.718808] systemd[1]: Starting of Arbitrary Executable File Formats File System Automount Point not supported.
[UNSUPP] Starting of Arbitrary Executable Fi...m Automount Point not supported.
[  317.812413] systemd[1]: Reached target Encrypted Volumes.
[  OK  ] Reached target Encrypted Volumes.
[  317.866436] systemd[1]: Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on Journal Socket.
         Starting Set console keymap...
         Mounting Huge Pages File System...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on Syslog Socket.
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Trigger resolvconf update for networkd DNS.
         Mounting Debug File System...
         Starting Create Static Device Nodes in /dev...
         Starting Uncomplicated firewall...
[  OK  ] Listening on udev Control Socket.[  318.214815] EXT4-fs (sda1): re-mounted. Opts: discard

         Starting Load Kernel Modules...
         Starting Nameserver information manager...
[  OK  ] Created slice system-systemd\x2dfsck.slice.
         Mounting POSIX Message Queue File System...
[  OK  ] Listening on LVM2 metadata daemon socket.
         Starting Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Reached target Slices.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on LVM2 poll daemon socket.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
         Starting Journal Service...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started Set console keymap.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Started Uncomplicated firewall.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
[  OK  ] Started Journal Service.
[  OK  ] Started Nameserver information manager.
[  OK  ] Started LVM2 metadata daemon.
         Starting Apply Kernel Variables...
         Mounting Configuration File System...
         Mounting FUSE Control File System...
         Starting udev Kernel Device Manager...
         Starting Load/Save Random Seed...
         Starting udev Coldplug all Devices...
         Starting Initial cloud-init job (pre-networking)...
         Starting Flush Journal to Persistent Storage...
[  OK  ] Mounted Configuration File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Monitoring of LVM2 mirrors,...ng dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Set console font and keymap...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting LSB: ebtables ruleset management...
         Starting LSB: AppArmor initialization...
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  321.610606] cloud-init[223]: Cloud-init v. 18.2 running 'init-local' at Wed, 06 Jun 2018 14:48:43 +0000. Up 320.80 seconds.
[  OK  [  342.565578] ] Started Create Volatile Files and Directories.cloud-init[223]: 2018-06-06 14:48:44,583 - stages.py[WARNING]: Failed to rename devices: [nic not present] Cannot rename mac=00:0d:3a:2a:15:be to eth0, not available.

[  OK  ] Started Initial cloud-init job (pre-networking).
[  OK  ] Started LSB: ebtables ruleset management.
[  OK  ] Started LSB: AppArmor initialization.
[  OK  ] Found device /dev/ttyS0.
[  OK  ] Found device Virtual_Disk 1.
         Starting File System Check on /dev/disk/cloud/azure_resource-part1...
[  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[  OK  ] Reached target Network (Pre).
         Starting Raise network interfaces...
         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Time Synchronization...
[  OK  ] Started Network Time Synchronization.
[  OK  ] Started Set console font and keymap.
[  OK  ] Started File System Check on /dev/disk/cloud/azure_resource-part1.
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
[  OK  ] Reached target Network.
         Starting Initial cloud-init job (metadata service crawler)...
[  371.248038] cloud-init[704]: Cloud-init v. 18.2 running 'init' at Wed, 06 Jun 2018 14:49:34 +0000. Up 371.04 seconds.
[  371.296939] cloud-init[704]: ci-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
[  371.335293] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.377439] cloud-init[704]: ci-info: | Device |  Up  |  Address  |    Mask   | Scope | Hw-Address |
[  371.417594] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.455284] cloud-init[704]: ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |   .   |     .      |
[  371.497938] cloud-init[704]: ci-info: |   lo   | True |  ::1/128  |     .     |  host |     .      |
[  371.535779] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.574200] cloud-init[704]: 2018-06-06 14:49:34,221 - stages.py[WARNING]: Failed to rename devices: [nic not present] Cannot rename mac=00:0d:3a:2a:15:be to eth0, not available.
[  OK  ] Started Initial cloud-init job (metadata service crawler).
[  OK  ] Reached target Network is Online.
         Starting iSCSI initiator daemon (iscsid)...
[  OK  ] Reached target System Initialization.
         Starting Docker Socket for the API.
         Starting Socket activation for snappy daemon.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Daily apt download activities.
[  OK  ] Listening on ACPID Listen Socket.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Reached target Timers.
[  OK  ] Started ACPI Events Check.
[  OK  ] Reached target Paths.
[  OK  ] Listening on D-Bus System Message Bus Socket.
         Starting LXD - unix socket.
[  OK  ] Listening on UUID daemon activation socket.
[  OK  ] Reached target Cloud-config availability.
         Mounting /mnt...
[  OK  ] Listening on Docker Socket for the API.
[  OK  ] Listening on Socket activation for snappy daemon.
[  OK  ] Listening on LXD - unix socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Hyper-V VSS Protocol Daemon.
         Starting Snappy daemon...
[  OK  ] Started Regular background program processing daemon.
[  OK  ] Started Deferred execution scheduler.
         Starting OMI CIM Server...
[  OK  ] Started Name Service Cache Daemon.
         Starting LSB: Record successful boot for GRUB...
         Starting LSB: MD monitoring daemon...
         Starting Accounts Service...
[  OK  ] Started ACPI event daemon.
         Starting System Logging Service...
         Starting /etc/rc.local Compatibility...[  373.008043]
         Starting Login Service...
         Starting Docker Application Container Engine...
rc.local[928]:          Starting OpenBSD Secure Shell server.../etc/rc.local: 13: /etc/rc.local: cannot create /sys/bus/vmbus/drivers/hv_util/unbind: Directory nonexistent

[  OK  ] Started Hyper-V File Copy Protocol Daemon.
         Starting LXD - container startup/shutdown...
[  OK  ] Started Azure Linux Agent.
[  OK  ] Started Azure Linux Diagnostic Extension.
[  OK  ] Started Unattended Upgrades Shutdown.
         Starting Kubelet...
[  OK  ] Started FUSE filesystem for LXC.
         Starting Apply the settings specified in cloud-config...
[  OK  ] Started Hyper-V KVP Protocol Daemon.
[  OK  ] Mounted /mnt.
[  OK  ] Started System Logging Service.
[  OK  ] Started OpenBSD Secure Shell server.
[  OK  ] Started Snappy daemon.
[FAILED] Failed to start iSCSI initiator daemon (iscsid).
See 'systemctl status iscsid.service' for details.
[FAILED] Failed to start /etc/rc.local Compatibility.
See 'systemctl status rc-local.service' for details.
[  OK  ] Started OMI CIM Server.
[  OK  ] Started Login Service.
[  OK  ] Started LXD - container startup/shutdown.
         Starting Authenticate and Authorize Users to Run Privileged Tasks...
         Starting Login to default iSCSI targets...
[FAILED] Failed to start Login to default iSCSI targets.
See 'systemctl status open-iscsi.service' for details.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting LSB: automatic crash report generation...
         Starting LSB: daemon to balance interrupts for SMP systems...
         Starting LSB: Set the CPU Frequency Scaling governor to "ondemand"...
         Starting Permit User Sessions...
[  OK  ] Started LSB: Set the CPU Frequency Scaling governor to "ondemand".
[  OK  ] Started Permit User Sessions.
[  OK  ] Started LSB: MD monitoring daemon.
[  OK  ] Started LSB: Record successful boot for GRUB.
[  OK  ] Started LSB: daemon to balance interrupts for SMP systems.
         Starting Terminate Plymouth Boot Screen...
         Starting Hold until boot process finishes up...
[  OK  ] Started Terminate Plymouth Boot Screen.
[  OK  ] Started Hold until boot process finishes up.
[  OK  ] Started Kubelet.
[  OK  ] Started LSB: automatic crash report generation.
[  OK  ] Started Authenticate and Authorize Users to Run Privileged Tasks.
[  OK  ] Started Accounts Service.
[  OK  ] Started Serial Getty on ttyS0.
         Starting Set console scheme...
[  OK  ] Started Getty on tty1.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Set console scheme.
2018/06/06 14:49:38.968673 INFO Azure Linux Agent Version:2.2.20
2018/06/06 14:49:38.994008 INFO OS: ubuntu 16.04
2018/06/06 14:49:39.019525 INFO Python: 3.5.2
2018/06/06 14:49:39.039976 INFO Run daemon
2018/06/06 14:49:39.063405 INFO Clean protocol
2018/06/06 14:49:39.091408 INFO Provisioning already completed, skipping.
2018/06/06 14:49:39.125127 INFO RDMA capabilities are not enabled, skipping
2018/06/06 14:49:39.170998 INFO Determined Agent WALinuxAgent-2.2.25 to be the latest agent
[  376.366044] cloud-init[998]: Cloud-init v. 18.2 running 'modules:config' at Wed, 06 Jun 2018 14:49:38 +0000. Up 375.38 seconds.
[  OK  ] Started Apply the settings specified in cloud-config.
         Starting Write warning to Azure ephemeral disk...
[  OK  ] Started Write warning to Azure ephemeral disk.
[  OK  ] Closed Syslog Socket.
         Stopping System Logging Service...
2018/06/06 14:49:39.825355 INFO Agent WALinuxAgent-2.2.25 is running as the goal state agent
2018/06/06 14:49:39.877901 INFO Detect protocol endpoints
2018/06/06 14:49:39.904565 INFO Clean protocol
2018/06/06 14:49:39.924020 INFO WireServer endpoint is not found. Rerun dhcp handler
2018/06/06 14:49:39.956053 INFO Test for route to 168.63.129.16
2018/06/06 14:49:39.993075 WARNING No route exists to 168.63.129.16
2018/06/06 14:49:40.019687 INFO Checking for dhcp lease cache
2018/06/06 14:49:40.046583 INFO looking for leases in path [/var/lib/dhcp/dhclient.*.leases]
2018/06/06 14:49:40.083278 INFO dhcp entry:168.63.129.16, 245:True, expired:True
2018/06/06 14:49:40.111679 INFO cached endpoint not found
2018/06/06 14:49:40.132472 INFO Cache exists [False]
2018/06/06 14:49:40.152060 INFO Send dhcp request
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[DEPEND] Dependency failed for kubectl and kubelet extraction.
2018/06/06 14:49:40.329898 INFO Examine /proc/net/route for primary interface
2018/06/06 14:49:40.359284 WARNING Could not determine primary interface, please ensure /proc/net/route is correct
2018/06/06 14:49:40.398490 WARNING Contents of /proc/net/route:
Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU       Window  IRTT

2018/06/06 14:49:40.465284 WARNING Primary interface examination will retry silently
[  OK  ] Started Kubernetes systemd probe.
[  OK  ] Stopped System Logging Service.
         Starting Docker Application Container Engine...
[  OK  ] Stopped System Logging Service.
         Starting System Logging Service...
[  OK  ] Started System Logging Service.
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Docker Application Container Engine...

Ubuntu 16.04.4 LTS aks-nodepool1-39245783-0 ttyS0

aks-nodepool1-39245783-0 login: [  382.958632] cloud-init[3467]: Cloud-init v. 18.2 running 'modules:final' at Wed, 06 Jun 2018 14:49:45 +0000. Up 382.66 seconds.
[  382.976037] cloud-init[3467]: Cloud-init v. 18.2 finished at Wed, 06 Jun 2018 14:49:45+0000. Datasource DataSourceAzure [seed=/dev/sr0].  Up 382.95 seconds

Ubuntu 16.04.4 LTS aks-nodepool1-39245783-0 ttyS0

@tomconte
Copy link
Member

tomconte commented Jun 6, 2018

I tried upgrading the kernel:

apt-get update
apt-get upgrade -f linux-azure

Then it re-boots OK. Not sure why :-| Will try the steps on another node.

@tomconte
Copy link
Member

I just tried again with a new AKS cluster in eastus and did not see the problem anymore. I think the issue might have been due to some changes in the underlying Ubuntu image.

@svollath
Copy link
Author

I also recreated an AKS cluster yesterday, and try to reproduce it. In fact it turned out again that esp. running update-grub isn't stable.
I ended up with Node-0 being NotReady after restart, while it worked for Node-1 - the output of the resulting /boot/grub/grub.cfg was different for the two nodes, which is really strange:

First lines of grub.cfg on Node-0 (getting stuck):

opio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  84ce5b56-30b1-4479-a864-7296bb549cec
                else
                  search --no-floppy --fs-uuid --set=root 84ce5b56-30b1-4479-a864-7296bb549cec
                fi
                echo    'Loading Linux 4.15.0-1013-azure ...'
                linux   /boot/vmlinuz-4.15.0-1013-azure root=/dev/sda1 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
        }

vs. first lines of grub.cfg on Node-1 (getting Ready):

ci0,msdos1  84ce5b56-30b1-4479-a864-7296bb549cec
                else
                  search --no-floppy --fs-uuid --set=root 84ce5b56-30b1-4479-a864-7296bb549cec
                fi
                echo    'Loading Linux 4.15.0-1013-azure ...'
                linux   /boot/vmlinuz-4.15.0-1013-azure root=UUID=84ce5b56-30b1-4479-a864-7296bb549cec ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.15.0-1013-azure
        }

However, both files seem to be malformed (I read them with az vm run-command invoke -g $MCRGNAME -n aks-nodepool1-65006794-1 --command-id RunShellScript --scripts "sudo cat /boot/grub/grub.cfg").

@svollath
Copy link
Author

When I tried the initial described commands yesterday (that was with kubernetes 1.9.9) they worked for me.
I would like to change our documentation back to the safe solution of modifying "/etc/default/grub.d/50-cloudimg-settings.cfg" and run "update-grub" - however I'd somehow prefer a confirmation that images are considered fixed (from xy on) before.

Btw. here's a more safe way to add "swapaccount=1":

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo sed -i -r 's|^(GRUB_CMDLINE_LINUX_DEFAULT=)\"(.*.)\"|\1\"\2 swapaccount=1\"|' /etc/default/grub.d/50-cloudimg-settings.cfg"

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo update-grub"

> az vm restart -g <RGNAME> -n <VMNAME>

@jnoller
Copy link
Contributor

jnoller commented Apr 3, 2019

Closing as stale.

@jnoller jnoller closed this as completed Apr 3, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants