New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AppVM sometimes hangs at start #289

Closed
marmarek opened this Issue Mar 8, 2015 · 7 comments

Comments

Projects
None yet
2 participants
@marmarek
Member

marmarek commented Mar 8, 2015

Reported by joanna on 17 Jul 2011 23:41 UTC
I get this in the VM's dmesg (accessed via xl console):

[    0.498887]  xvdb: unknown partition table
Waiting for /dev/xvda* devices...
Qubes: Doing COW setup for AppVM...
Qubes: done.
[    0.568209] udev: starting version 153
[    0.568286] udevd (75): /proc/75/oom_adj is deprecated, please use /proc/75/oom_score_adj instead.
[    0.568957] dracut: rd_NO_MDIMSM: no MD RAID for imsm/isw raids
[    3.747707] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[    3.781017] dracut: Remounting /dev/mapper/dmroot with -o noatime,ro
[    3.785924] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[    3.789573] dracut: Mounted root filesystem /dev/mapper/dmroot
Waiting for /dev/xvdd device...
[    3.819315] kjournald starting.  Commit interval 15 seconds
[    3.819349] EXT3-fs (xvdd): mounted filesystem with ordered data mode
[    3.842438] dracut: Loading SELinux policy
[    3.889656] dracut: /sbin/load_policy: Can't load policy: No such device
[    4.045869] dracut: Switching root
[    4.181967] readahead-collector: starting
[    4.223966] usbcore: registered new interface driver usbfs
[    4.224023] usbcore: registered new interface driver hub
[    4.224189] usbcore: registered new device driver usb
                Welcome to Fedora 
Starting udev: [    4.385344] udev[starting version 161
[    4.502469](237]:) netfront: Initialising virtual ethernet driver.
}

(I closed the console, and then reconnect after some time and saw this:)
{
udevadm settle - timeout of 180 seconds reached, the event queue contains:
  /sys/devices/virtual/net/lo (551)
  /sys/devices/virtual/tty/console (553)
  /sys/devices/virtual/tty/hvc0 (554)
  /sys/devices/virtual/tty/tty0 (557)
  /sys/devices/xen/vif-0/net/eth0 (641)
  /sys/devices/xen/vif-0/net/eth0/queues/rx-0 (642)
  /sys/devices/xen/vif-0/net/eth0/queues/tx-0 (643)
[FAILED]

}



}

Migrated-From: https://wiki.qubes-os.org/ticket/289
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by joanna on 18 Jul 2011 10:15 UTC
Seems like the repeatable thing is:

Waiting for /dev/xvdd device...
[    3.938104] kjournald starting.  Commit interval 15 seconds
[    3.938134] EXT3-fs (xvdd): mounted filesystem with ordered data mode
[    3.941180] dracut: Loading SELinux policy
[    3.950803] dracut: /sbin/load_policy: Can't load policy: No such device
[    4.104712] dracut: Switching root

I got it already twice since yesterday! (increasing severity level of this ticket to critical)

In the xl-appvm.log there is only:

Waiting for domain vault (domid 9) to die [6392](pid)

And in the xen-hotplug.log:

stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23\n/dev/loop26': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop18': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop18': No such file or directory

(Although not sure if this all has been generated due to this VM's start -- although very likely because of the timestamp).

Member

marmarek commented Mar 8, 2015

Comment by joanna on 18 Jul 2011 10:15 UTC
Seems like the repeatable thing is:

Waiting for /dev/xvdd device...
[    3.938104] kjournald starting.  Commit interval 15 seconds
[    3.938134] EXT3-fs (xvdd): mounted filesystem with ordered data mode
[    3.941180] dracut: Loading SELinux policy
[    3.950803] dracut: /sbin/load_policy: Can't load policy: No such device
[    4.104712] dracut: Switching root

I got it already twice since yesterday! (increasing severity level of this ticket to critical)

In the xl-appvm.log there is only:

Waiting for domain vault (domid 9) to die [6392](pid)

And in the xen-hotplug.log:

stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23\n/dev/loop26': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop18': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop18': No such file or directory

(Although not sure if this all has been generated due to this VM's start -- although very likely because of the timestamp).

@marmarek marmarek added P: critical and removed P: major labels Mar 8, 2015

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by rafal on 19 Jul 2011 14:17 UTC
Could you add
PS4=+"$$"+
to /etc/xen/scripts/block, and
PS4=-"$$"-
to /etc/xen/scripts/block-snapshot, just before HOTPLUG_STORE=xxx line ? And resend xen-hotplug.log.

Member

marmarek commented Mar 8, 2015

Comment by rafal on 19 Jul 2011 14:17 UTC
Could you add
PS4=+"$$"+
to /etc/xen/scripts/block, and
PS4=-"$$"-
to /etc/xen/scripts/block-snapshot, just before HOTPLUG_STORE=xxx line ? And resend xen-hotplug.log.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by rafal on 19 Jul 2011 14:50 UTC
And of course add "set -x" just below PS4= line.

Member

marmarek commented Mar 8, 2015

Comment by rafal on 19 Jul 2011 14:50 UTC
And of course add "set -x" just below PS4= line.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by rafal on 21 Jul 2011 11:25 UTC
Managed to reproduce at least part of it after 134 VM starts.
There seems to be 2 separate issues:

  1. stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12'
    In my repro case, I do not see it in the xen-hotplug.log. It is worrying, so I would suggest leaving "set -x" in hotplug scripts during testing, to reproduce it. It is not advisable to keep it permanently, due to performance hit, especially for dispvm case.
  2. hanged udev
    I get the exact same warning from udev-settle. After "xl sysrq domainname t", some processes in D state are seen, including udev, but not only. Their backtrace always include attempt to access console; they are hung grabbing console mutexes.
    This looks like a kernel bug. The culprit seems to be xencons_close() function, which wants to grab tty_mutex while tty_lock is held; and core tty functions, e.g. tty_open(), do it in the other order (tty_mutex first, then tty_lock) - there is a possible deadlock race then.
    The solution is to patch xencons_close() function to release tty_lock first. Being implemented/tested now.
Member

marmarek commented Mar 8, 2015

Comment by rafal on 21 Jul 2011 11:25 UTC
Managed to reproduce at least part of it after 134 VM starts.
There seems to be 2 separate issues:

  1. stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12'
    In my repro case, I do not see it in the xen-hotplug.log. It is worrying, so I would suggest leaving "set -x" in hotplug scripts during testing, to reproduce it. It is not advisable to keep it permanently, due to performance hit, especially for dispvm case.
  2. hanged udev
    I get the exact same warning from udev-settle. After "xl sysrq domainname t", some processes in D state are seen, including udev, but not only. Their backtrace always include attempt to access console; they are hung grabbing console mutexes.
    This looks like a kernel bug. The culprit seems to be xencons_close() function, which wants to grab tty_mutex while tty_lock is held; and core tty functions, e.g. tty_open(), do it in the other order (tty_mutex first, then tty_lock) - there is a possible deadlock race then.
    The solution is to patch xencons_close() function to release tty_lock first. Being implemented/tested now.
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by rafal on 21 Jul 2011 13:48 UTC
With the patched kernel, I do not see the above xencons_close() problem for 250 vm starts. Considering resolved. Patch included at
http://git.qubes-os.org/?p=rafal/kernel-dom0.git;a=commit;h=05fa87eadb4687f95c3f6452c95d6c9dcc1dbdec

Unfortunately I see now a different crash, #306.

Member

marmarek commented Mar 8, 2015

Comment by rafal on 21 Jul 2011 13:48 UTC
With the patched kernel, I do not see the above xencons_close() problem for 250 vm starts. Considering resolved. Patch included at
http://git.qubes-os.org/?p=rafal/kernel-dom0.git;a=commit;h=05fa87eadb4687f95c3f6452c95d6c9dcc1dbdec

Unfortunately I see now a different crash, #306.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Comment by marmarek on 24 Jul 2011 23:58 UTC
xen-hotplug problem (finding existing device connected with given file) fixed here:
http://git.qubes-os.org/gitweb/?p=marmarek/xen.git;a=commit;h=898fee18c1de8cc13db7c6143310f92449dc5a1d

For 350 VM starts #306 problem didn't occurred.

Member

marmarek commented Mar 8, 2015

Comment by marmarek on 24 Jul 2011 23:58 UTC
xen-hotplug problem (finding existing device connected with given file) fixed here:
http://git.qubes-os.org/gitweb/?p=marmarek/xen.git;a=commit;h=898fee18c1de8cc13db7c6143310f92449dc5a1d

For 350 VM starts #306 problem didn't occurred.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Mar 8, 2015

Member

Modified by rafal on 25 Jul 2011 07:41 UTC

Member

marmarek commented Mar 8, 2015

Modified by rafal on 25 Jul 2011 07:41 UTC

@marmarek marmarek closed this Mar 8, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment