CoreOS first boot after disk install fails: no "/dev/disk/by-diskuuid" created #1398

Closed
felixkrohn opened this Issue Jun 13, 2016 · 5 comments

felixkrohn commented Jun 13, 2016

Issue Report

  • When installing CoreOS on bare metal with the coreos-install script, the first boot drops to a rescue shell after waiting 90 seconds for /dev/disk/by-diskuuid/00000000-0000-0000-0000-000000000001.
  • This happens during the initrd run, before switch_root to the on-disk system.
  • I verified that the partition table's GUID is correct.
  • /usr/lib64/udev/rules.d/60-persistent-storage.rules is set up to create /dev/disk/by-{id,label,partlabel,partuuid,uuid}, but strangely by-diskuuid seems to be missing, which might explain why the disk is not found.
  • The message "initrd-cleanup.service: Main process exited, code=exited, status=4/NOPERMISSION" is shown, but in my opinion it is related to the earlier error.
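
For anyone landing in the same rescue shell, the missing link can be confirmed with a small check along these lines. This is only a sketch: the helper names `has_disk_link` and `list_link_dirs` are made up for illustration, and the optional second argument exists only so the check can be pointed at a directory other than /dev/disk.

```shell
# Hypothetical helper: succeed if the by-diskuuid link for UUID exists
# under ROOT (defaults to /dev/disk).
has_disk_link() {
    uuid="$1"
    root="${2:-/dev/disk}"
    [ -e "$root/by-diskuuid/$uuid" ]
}

# Hypothetical helper: list the by-* directories that exist under ROOT.
list_link_dirs() {
    root="${1:-/dev/disk}"
    for dir in "$root"/by-*; do
        if [ -d "$dir" ]; then
            echo "${dir##*/}"
        fi
    done
}

# Example use from the rescue shell:
#   has_disk_link 00000000-0000-0000-0000-000000000001 || list_link_dirs
```

If the first call fails and by-diskuuid is absent from the listing, the initramfs udev rules never created the link, which matches the symptom described above.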

Screenshots

screenshot 1: waiting for /dev/disk/by-
screenshot 2: boot error
screenshot 3: /dev/disk contents

Bug

CoreOS Version

installed version: CoreOS stable 1010.5.0

Environment

What hardware/cloud provider/hypervisor is being used to run CoreOS?

Bare metal: SuperMicro X9SRi board, 2x 3TB hard disks (first disk has been wiped before installation, second disk is free of any data, no partition tables or other metadata present)
I could reproduce this bug on several identical servers, in about a dozen installation attempts on each.

@crawford

Member

crawford commented Jun 20, 2016

The initramfs has an alternate set of udev rules. 90-disk-uuid.rules is responsible for creating that symlink.

Can you provide the output of udevadm info --name sda?

@felixkrohn

felixkrohn commented Jun 21, 2016

Thanks for this pointer. It looks as if some ZFS metadata has withstood the extensive disk wiping. I will clear it and then try to install CoreOS again. Interestingly, Debian Jessie's "udevadm info" doesn't see these labels.

P: /devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
N: sda
S: disk/by-id/ata-HGST_HUS724030ALA640_PN2231P8HBGDBR
S: disk/by-id/wwn-0x5000cca22cd34da3
E: DEVLINKS=/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2231P8HBGDBR /dev/disk/by-id/wwn-0x5000cca22cd34da3
E: DEVNAME=/dev/sda
E: DEVPATH=/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_APM=1
E: ID_ATA_FEATURE_SET_APM_ENABLED=0
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_PUIS=1
E: ID_ATA_FEATURE_SET_PUIS_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=510
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=7200
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_FS_LABEL=zroot
E: ID_FS_LABEL_ENC=zroot
E: ID_FS_TYPE=zfs_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=18207401274896884806
E: ID_FS_UUID_ENC=18207401274896884806
E: ID_FS_UUID_SUB=3721134718891417786
E: ID_FS_UUID_SUB_ENC=3721134718891417786
E: ID_FS_VERSION=5000
E: ID_MODEL=HGST_HUS724030ALA640
E: ID_MODEL_ENC=HGST\x20HUS724030ALA640\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=MF8OAA70
E: ID_SERIAL=HGST_HUS724030ALA640_PN2231P8HBGDBR
E: ID_SERIAL_SHORT=PN2231P8HBGDBR
E: ID_TYPE=disk
E: ID_WWN=0x5000cca22cd34da3
E: ID_WWN_WITH_EXTENSION=0x5000cca22cd34da3
E: MAJOR=8
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=3851978
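
The output above reports ID_FS_TYPE=zfs_member on the whole disk and contains no ID_PART_TABLE_* properties, which udev normally sets when it recognizes a partition table. A rough way to pull out just those facts is sketched below. The helper name `summarize_probe` is hypothetical, and the sketch only reads the text format shown above from stdin.

```shell
# Hypothetical helper: read `udevadm info` output on stdin and summarize
# what udev detected on the device.
summarize_probe() {
    awk '
        /^E: DEVTYPE=/       { sub(/^E: DEVTYPE=/, "");    devtype = $0 }
        /^E: ID_FS_TYPE=/    { sub(/^E: ID_FS_TYPE=/, ""); fstype = $0 }
        /^E: ID_PART_TABLE_/ { parttable = 1 }
        END {
            printf "devtype=%s fs=%s parttable=%s\n",
                   devtype,
                   (fstype == "" ? "none" : fstype),
                   (parttable ? "yes" : "no")
        }
    '
}

# Example use:
#   udevadm info --name sda | summarize_probe
```

A whole disk that shows a filesystem type but no partition table is a hint that a stale signature is shadowing the freshly written GPT.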

@felixkrohn

felixkrohn commented Jun 21, 2016

Bingo. After clearing the ZFS metadata from the disk, the fresh CoreOS installation now boots fine on the first attempt. I'll add an additional "dd" wipe of the start and end of each disk to our setup, as "sgdisk -Z" and "wipefs" don't seem to do the job.
Thanks @crawford!

@crawford

Member

crawford commented Jun 21, 2016

Nice. We should do this as part of coreos-install (/cc @dm0-). That installation script needs to be a bit more nuclear.

@felixkrohn

felixkrohn commented Jun 22, 2016

Quick recipe of what works reproducibly for me to eliminate such metadata: (/cc @dm0- )

dd if=/dev/zero of=${DISK} bs=512k count=1
dd if=/dev/zero of=${DISK} seek=$(($(blockdev --getsz ${DISK}) - 512)) 2>/dev/null # will hit the disk's end with a "No space left on device" message and $?=1
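
A variant of the recipe above with explicit counts might look like the following sketch. It is not what coreos-install does. The function name `wipe_ends` and its arguments are hypothetical. Note that `blockdev --getsz` reports 512-byte sectors, so subtracting 512 in the recipe covers the final 256 KiB. The sketch clears 512 KiB at each end instead, to match the half-MiB figure in the fix commit quoted further down. The size is passed in bytes (for a block device, for example from `blockdev --getsize64`) and is assumed to be a multiple of 512.

```shell
# Hypothetical helper: zero the first and last 512 KiB of TARGET.
# Arguments: TARGET (device or image file), SIZE_BYTES (total size).
wipe_ends() {
    target="$1"
    size_bytes="$2"
    wipe_sectors=1024                     # 1024 * 512 bytes = 512 KiB
    total_sectors=$((size_bytes / 512))

    if [ "$total_sectors" -lt $((2 * wipe_sectors)) ]; then
        echo "wipe_ends: $target is too small" >&2
        return 1
    fi

    # Start of the disk. conv=notrunc keeps dd from truncating an image
    # file; it has no effect on a block device.
    dd if=/dev/zero of="$target" bs=512 count="$wipe_sectors" \
        conv=notrunc 2>/dev/null || return 1

    # End of the disk. The explicit count stops dd exactly at the last
    # sector, so it exits 0 instead of hitting "No space left on device".
    dd if=/dev/zero of="$target" bs=512 \
        seek=$((total_sectors - wipe_sectors)) count="$wipe_sectors" \
        conv=notrunc 2>/dev/null
}

# Example use (destructive, run only against the intended disk):
#   wipe_ends "${DISK}" "$(blockdev --getsize64 "${DISK}")"
```

Because the second dd never runs past the device, its exit status can be checked in a setup script without special-casing the expected failure.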

@crawford crawford added this to the CoreOS 1109.0.0 milestone Jul 5, 2016

@dm0- dm0- self-assigned this Jul 6, 2016

dm0- added a commit to dm0-/init that referenced this issue Jul 7, 2016

coreos-install: wipe labels at the end of the disk
ZFS uses the last half-MiB for redundant labels, which confuses
udev on the first boot after install.

This fixes coreos/bugs#1398.
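
To make the "last half-MiB" figure concrete, the arithmetic can be sketched as follows. This assumes the commonly documented ZFS layout of four 256 KiB labels, two at the start and two at the end of the device. That layout is not spelled out in this thread, the function name `zfs_label_offsets` is hypothetical, and the sketch ignores any alignment rounding ZFS may apply to the device size.

```shell
# Hypothetical helper: print the assumed byte offsets of the four ZFS
# labels for a device of SIZE_BYTES (assumption: 256 KiB per label,
# two at the front, two at the back, no alignment rounding).
zfs_label_offsets() {
    size_bytes="$1"
    label=$((256 * 1024))
    echo "L0 0"
    echo "L1 $label"
    echo "L2 $((size_bytes - 2 * label))"
    echo "L3 $((size_bytes - label))"
}

# Example use:
#   zfs_label_offsets "$(blockdev --getsize64 /dev/sda)"
```

Under that assumption, wiping only the start of a disk leaves the two trailing labels in place, which is consistent with the signature surviving the earlier wipes.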

dm0- added a commit to dm0-/init that referenced this issue Jul 8, 2016

coreos-install: wipe labels at the end of the disk
ZFS uses the last half-MiB for redundant labels, which confuses
udev on the first boot after install.

This fixes coreos/bugs#1398.