Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upAppVM sometimes hangs at start #289
Comments
marmarek
assigned
rootkovska
Mar 8, 2015
marmarek
added this to the Release 1 Beta 2 milestone
Mar 8, 2015
marmarek
added
bug
C: core
P: major
labels
Mar 8, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by joanna on 18 Jul 2011 10:15 UTC
Seems like the repeatable thing is:
Waiting for /dev/xvdd device...
[ 3.938104] kjournald starting. Commit interval 15 seconds
[ 3.938134] EXT3-fs (xvdd): mounted filesystem with ordered data mode
[ 3.941180] dracut: Loading SELinux policy
[ 3.950803] dracut: /sbin/load_policy: Can't load policy: No such device
[ 4.104712] dracut: Switching root
I got it already twice since yesterday! (increasing severity level of this ticket to critical)
In the xl-appvm.log there is only:
Waiting for domain vault (domid 9) to die [6392](pid)
And in the xen-hotplug.log:
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop22': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23': No such file or directory
stat: cannot stat `/dev/loop3\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop17\n/dev/loop23\n/dev/loop26': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop18': No such file or directory
loop: can't delete device /dev/loop0: Device or resource busy
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15': No such file or directory
stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12\n/dev/loop15\n/dev/loop18': No such file or directory
(Although not sure if this all has been generated due to this VM's start -- although very likely because of the timestamp).
|
Comment by joanna on 18 Jul 2011 10:15 UTC
I got it already twice since yesterday! (increasing severity level of this ticket to critical) In the xl-appvm.log there is only:
And in the xen-hotplug.log:
(Although not sure if this all has been generated due to this VM's start -- although very likely because of the timestamp). |
marmarek
added
P: critical
and removed
P: major
labels
Mar 8, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by rafal on 19 Jul 2011 14:17 UTC
Could you add
PS4=+"$$"+
to /etc/xen/scripts/block, and
PS4=-"$$"-
to /etc/xen/scripts/block-snapshot, just before HOTPLUG_STORE=xxx line ? And resend xen-hotplug.log.
|
Comment by rafal on 19 Jul 2011 14:17 UTC |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by rafal on 19 Jul 2011 14:50 UTC
And of course add "set -x" just below PS4= line.
|
Comment by rafal on 19 Jul 2011 14:50 UTC |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by rafal on 21 Jul 2011 11:25 UTC
Managed to reproduce at least part of it after 134 VM starts.
There seems to be 2 separate issues:
- stat: cannot stat `/dev/loop4\n/dev/loop7\n/dev/loop10\n/dev/loop12'
In my repro case, I do not see it in the xen-hotplug.log. It is worrying, so I would suggest leaving "set -x" in hotplug scripts during testing, to reproduce it. It is not advisable to keep it permanently, due to performance hit, especially for dispvm case. - hanged udev
I get the exact same warning from udev-settle. After "xl sysrq domainname t", some processes in D state are seen, including udev, but not only. Their backtrace always include attempt to access console; they are hung grabbing console mutexes.
This looks like a kernel bug. The culprit seems to be xencons_close() function, which wants to grab tty_mutex while tty_lock is held; and core tty functions, e.g. tty_open(), do it in the other order (tty_mutex first, then tty_lock) - there is a possible deadlock race then.
The solution is to patch xencons_close() function to release tty_lock first. Being implemented/tested now.
|
Comment by rafal on 21 Jul 2011 11:25 UTC
|
marmarek
unassigned
rootkovska
Mar 8, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by rafal on 21 Jul 2011 13:48 UTC
With the patched kernel, I do not see the above xencons_close() problem for 250 vm starts. Considering resolved. Patch included at
http://git.qubes-os.org/?p=rafal/kernel-dom0.git;a=commit;h=05fa87eadb4687f95c3f6452c95d6c9dcc1dbdec
Unfortunately I see now a different crash, #306.
|
Comment by rafal on 21 Jul 2011 13:48 UTC Unfortunately I see now a different crash, #306. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Mar 8, 2015
Member
Comment by marmarek on 24 Jul 2011 23:58 UTC
xen-hotplug problem (finding existing device connected with given file) fixed here:
http://git.qubes-os.org/gitweb/?p=marmarek/xen.git;a=commit;h=898fee18c1de8cc13db7c6143310f92449dc5a1d
For 350 VM starts #306 problem didn't occurred.
|
Comment by marmarek on 24 Jul 2011 23:58 UTC For 350 VM starts #306 problem didn't occurred. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Modified by rafal on 25 Jul 2011 07:41 UTC |
marmarek commentedMar 8, 2015
Reported by joanna on 17 Jul 2011 23:41 UTC
I get this in the VM's dmesg (accessed via xl console):