btrfs root drives fail on OCI #2287

Closed
arithx opened this Issue Dec 13, 2017 · 2 comments

arithx commented Dec 13, 2017

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1618.0.0
VERSION_ID=1618.0.0
COREOS_BOARD="amd64-usr"

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?

OCI

Expected Behavior

Machine boots when the root drive is btrfs

Actual Behavior

The iSCSI root device seems to go down unexpectedly and never recovers.

Reproduction Steps

  1. Boot a machine on OCI with a config similar to:
{
  "ignition": {
    "version": "2.0.0"
  },
  "storage": {
    "filesystems": [
      {
        "mount": {
          "device": "/dev/disk/by-label/ROOT",
          "format": "btrfs",
          "create": {
            "force": true,
            "options": [
              "--label=ROOT",
              "--uuid=9aa5237a-ab6b-458b-a7e8-f25e2baef1a3"
            ]
          }
        }
      }
    ]
  }
}

Other Information

 [^[[0;32m  OK  ^[[0m] Reached target Load user-provided cloud configs.^M
          Starting Garbage Collection for rkt...^M
          Starting Configure Oracle OCI Root Disk...^M
          Starting Oracle OCI Firewall Rules...^M
          Starting CoreOS Metadata Agent (SSH Keys)...^M
          Starting Generate sshd host keys...^M
          Starting Update Engine...^M
          Starting Install an ssh key from /proc/cmdline...^M
 [^[[0;32m  OK  ^[[0m] Started Login Service.^M
 [^[[0;32m  OK  ^[[0m] Started Install an ssh key from /proc/cmdline.^M
 [^[[0;32m  OK  ^[[0m] Started Generate /run/coreos/motd.^M
 [^[[0;32m  OK  ^[[0m] Started Configure Oracle OCI Root Disk.^M
 [^[[0;32m  OK  ^[[0m] Created slice system-sshd.slice.^M
 [^[[0;1;31mFAILED^[[0m] Failed to start Update Engine.^M
 See 'systemctl status update-engine.service' for details.^M
 [^[[0;32m  OK  ^[[0m] Started Cluster reboot manager.^M
 [^[[0;32m  OK  ^[[0m] Stopped Update Engine.^M
          Starting Update Engine...^M
 [  132.063028]  session1: session recovery timed out after 120 secs^M
 [  132.063650] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.064146] EXT4-fs error (device dm-0): ext4_find_entry:1432: inode #131156: comm ssh-keygen: reading directory      lblock 0^M
 [  132.065349] Buffer I/O error on dev dm-0, logical block 0, lost sync page write^M
 [  132.072154] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.072907] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.081675] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.082446] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.087151] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.091733] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.092442] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.093142] EXT4-fs error (device dm-0): ext4_find_entry:1432: inode #69582: comm modprobe: reading directory lblock  0^M
 [  132.094651] EXT4-fs (dm-0): previous I/O error to superblock detected^M
 [  132.095562] Buffer I/O error on dev dm-0, logical block 0, lost sync page write^M
 [^[[0;1;31mFAILED^[[  132.096569] sd 2:0:0:1: rejecting I/O to offline device^M
 [0m] Failed to start Garbage Collection for rkt.^M
 See 'systemctl status rkt-gc.ser[  132.098230] sd 2:0:0:1: rejecting I/O to offline device^M
 vice' for detail[  132.098918] sd 2:0:0:1: rejecting I/O to offline device^M
 s.^M
 [  132.100398] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.101487] extend-filesystems[849]: /usr/lib/coreos/extend-filesystems: line 34: /usr/bin/cgpt: Input/output[  132.  102417] sd 2:0:0:1: rejecting I/O to offline device^M
  error^M
 [  132.103679] EXT4-fs error (device dm-0): ext4_find_entry:1432: inode #65130: comm (cksmithd): reading directory       lblock 0^M
 [  132.105214] EXT4-fs (dm-0): previous I/O error to superblock detected^M
 [^[[0;1;31mFAILED^[[0m[  132.106629] Buffer I/O error on dev dm-0, logical block 0, lost sync page write^M
 ] Failed to start Extend Filesystems.^M
 See 'systemctl status extend-filesystems.service' for details.^M
 [  132.111128] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.115834] EXT4-fs error (device dm-0): ext4_find_entry:1432: inode #69582: comm modprobe: reading directory lblock  0^M
 [  132.117057] EXT4-fs (dm-0): previous I/O error to superblock detected^M
 [  132.117800] Buffer I/O error on dev dm-0, logical block 0, lost sync page write^M
 [  132.118783] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.119554] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.120315] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.121498] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.122260] JBD2: Error -5 detected when updating journal superblock for sda6-8.^M
 [  132.123274] Aborting journal on device sda6-8.^M
 [  132.123874] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.125155] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.125781] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.126761] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.127734] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.128710] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.129704] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0^M
 [  132.130714] sd 2:0:0:1: rejecting I/O to offline device^M
 [  132.131305] BTRFS error (device sda9): bdev /dev/sda9 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0^M
arithx commented Dec 13, 2017

cc @bgilbert for more information

Member

bgilbert commented Dec 14, 2017

Masking iscsid.service from the Ignition config makes the problem go away, so it has to do with the session recovery iscsid performs at startup.
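
For readers who want to try the same workaround, here is a minimal sketch of what masking the unit from Ignition could look like. It is not the exact config used above, which was not posted. It assumes the `systemd.units` section and its `mask` field from the Ignition 2.0.0 config spec, and would be merged with the `storage` section from the reproduction steps:

{
  "ignition": {
    "version": "2.0.0"
  },
  "systemd": {
    "units": [
      {
        "name": "iscsid.service",
        "mask": true
      }
    ]
  }
}

Masking links the unit to /dev/null so it cannot be started at boot. This only sidesteps the session recovery step; it does not explain why recovery times out when the root filesystem is btrfs.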

@bgilbert bgilbert closed this Apr 26, 2018