This repository has been archived by the owner. It is now read-only.

CoreOS first boot after disk install fails: no "/dev/disk/by-diskuuid" created #1398

Closed
felixkrohn opened this issue Jun 13, 2016 · 5 comments

Comments


@felixkrohn felixkrohn commented Jun 13, 2016

Issue Report

  • After installing CoreOS on bare metal with the coreos-install script, the first boot drops into a rescue shell after waiting 90 seconds for /dev/disk/by-diskuuid/00000000-0000-0000-0000-000000000001.
  • This happens during the initrd stage, before switch_root to the on-disk system.
  • I verified that the partition table's GUID is correct.
  • /usr/lib64/udev/rules.d/60-persistent-storage.rules is set up to create /dev/disk/by-{id,label,partlabel,partuuid,uuid}, but strangely by-diskuuid is missing, which might explain why the disk is not found.
  • The message "initrd-cleanup.service: Main process exited, code=exited, status=4/NOPERMISSION" is shown, but I believe it is a consequence of the earlier error.
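
One way to run the GUID check mentioned above is to read the "Disk identifier (GUID)" line that sgdisk -p prints and compare it with the name the initrd waits for. This is a minimal sketch, not necessarily how the check was done in this report. The helper name disk_guid is hypothetical, and lowercasing the value is only an assumption made to match the form of the path shown above.

```shell
# Hypothetical helper: read `sgdisk -p` output on stdin and print the
# GPT disk GUID in lowercase (lowercasing is an assumption, see above).
disk_guid() {
    sed -n 's/^Disk identifier (GUID): //p' | tr 'A-Z' 'a-z'
}

# Usage on a real system (requires sgdisk and root):
#   sgdisk -p /dev/sda | disk_guid
# The result can then be compared with the name expected under
# /dev/disk/by-diskuuid/.
```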

Screenshots

screenshot 1: waiting for /dev/disk/by-
screenshot 2: boot error
screenshot 3: /dev/disk contents

Bug

CoreOS Version

installed version: CoreOS stable 1010.5.0

Environment

What hardware/cloud provider/hypervisor is being used to run CoreOS?

Bare metal: SuperMicro X9SRi board, 2x 3TB hard disks (first disk has been wiped before installation, second disk is free of any data, no partition tables or other metadata present)
I could reproduce this bug on several identical servers, in about a dozen installation attempts on each.

Member

@crawford crawford commented Jun 20, 2016

The initramfs has an alternate set of udev rules. 90-disk-uuid.rules is responsible for creating that symlink.

Can you provide the output of udevadm info --name sda?

Author

@felixkrohn felixkrohn commented Jun 21, 2016

Thanks for the pointer. It looks as if some ZFS metadata survived the extensive disk wiping. I will clear it and then try installing CoreOS again. Interestingly, Debian Jessie's "udevadm info" doesn't see these labels.

P: /devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
N: sda
S: disk/by-id/ata-HGST_HUS724030ALA640_PN2231P8HBGDBR
S: disk/by-id/wwn-0x5000cca22cd34da3
E: DEVLINKS=/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2231P8HBGDBR /dev/disk/by-id/wwn-0x5000cca22cd34da3
E: DEVNAME=/dev/sda
E: DEVPATH=/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_APM=1
E: ID_ATA_FEATURE_SET_APM_ENABLED=0
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_PUIS=1
E: ID_ATA_FEATURE_SET_PUIS_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=510
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=7200
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_FS_LABEL=zroot
E: ID_FS_LABEL_ENC=zroot
E: ID_FS_TYPE=zfs_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=18207401274896884806
E: ID_FS_UUID_ENC=18207401274896884806
E: ID_FS_UUID_SUB=3721134718891417786
E: ID_FS_UUID_SUB_ENC=3721134718891417786
E: ID_FS_VERSION=5000
E: ID_MODEL=HGST_HUS724030ALA640
E: ID_MODEL_ENC=HGST\x20HUS724030ALA640\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=MF8OAA70
E: ID_SERIAL=HGST_HUS724030ALA640_PN2231P8HBGDBR
E: ID_SERIAL_SHORT=PN2231P8HBGDBR
E: ID_TYPE=disk
E: ID_WWN=0x5000cca22cd34da3
E: ID_WWN_WITH_EXTENSION=0x5000cca22cd34da3
E: MAJOR=8
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=3851978
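
The leftover signature is visible in the ID_FS_* properties of the whole-disk device above (ID_FS_TYPE=zfs_member). As a minimal sketch, assuming the "E: KEY=VALUE" output format shown here, a check like the following could flag such a disk before installation. The helper name fs_signature is hypothetical.

```shell
# Hypothetical helper: read `udevadm info` output on stdin and print the
# value of ID_FS_TYPE, if any. Empty output means no filesystem or RAID
# signature was reported for the device.
fs_signature() {
    sed -n 's/^E: ID_FS_TYPE=//p'
}

# Usage on a real system (requires udevadm):
#   udevadm info --name sda | fs_signature
```

For the output above this would print zfs_member, while a cleanly wiped disk would be expected to print nothing.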

Author

@felixkrohn felixkrohn commented Jun 21, 2016

Bingo. After clearing the ZFS metadata from the disk, the fresh CoreOS installation now boots fine on the first attempt. I'll add an additional "dd" wipe of the start and end of each disk to our setup, as "sgdisk -Z" and "wipefs" don't seem to do the job.
Thanks @crawford!

Member

@crawford crawford commented Jun 21, 2016

Nice. We should do this as part of coreos-install (/cc @dm0-). That installation script needs to be a bit more nuclear.

Author

@felixkrohn felixkrohn commented Jun 22, 2016

Quick recipe that reproducibly eliminates such metadata for me: (/cc @dm0-)

dd if=/dev/zero of="${DISK}" bs=512k count=1
dd if=/dev/zero of="${DISK}" seek=$(($(blockdev --getsz "${DISK}") - 512)) 2>/dev/null # runs into the end of the disk, so it stops with a (suppressed) "No space left on device" message and $?=1
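
The recipe above could also be wrapped in a small function that exits cleanly, for example when used inside an install script. This is only a sketch under stated assumptions: the function name wipe_ends is hypothetical, the sizes (512 KiB at the start, 512 sectors at the end) are taken from the recipe, and count=512 and conv=notrunc are additions so that dd stops exactly at the end of the device and does not truncate a regular file used for testing. It destroys data on the target.

```shell
# Hypothetical helper: zero the first 512 KiB and the last 512 sectors
# (512 bytes each) of TARGET. TOTAL_SECTORS is the size of TARGET in
# 512-byte sectors, e.g. from `blockdev --getsz` for a block device.
wipe_ends() {
    target=$1
    total_sectors=$2

    # Start of the disk: one 512 KiB block of zeros.
    dd if=/dev/zero of="$target" bs=512k count=1 conv=notrunc || return 1

    # End of the disk: exactly 512 sectors, so dd finishes without
    # running into "No space left on device".
    dd if=/dev/zero of="$target" bs=512 \
        seek=$((total_sectors - 512)) count=512 conv=notrunc || return 1
}

# Usage on a real disk (destroys data, requires root):
#   wipe_ends "${DISK}" "$(blockdev --getsz "${DISK}")"
```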
