qa: use hard_reset to reboot kclient #28825
The teuthology Is that okay for your purposes?
Hmm, I think we should probably defer the wait if possible. I'll fix that.
@djgalloway please have another look
Seeing an unexpected error where the old mount point is busy: /ceph/teuthology-archive/pdonnell-2019-07-11_22:56:09-kcephfs-wip-pdonnell-testing-20190711.203149-distro-basic-smithi/4112066/teuthology.log My instinct is that the reset didn't actually happen somehow, so I'm adding a call to
@djgalloway do you see what happened?
From /ceph/teuthology-archive/pdonnell-2019-07-15_17:05:25-kcephfs-master-distro-basic-smithi/4121449/teuthology.log: hard reset doesn't appear to work...
The job didn't wait for the machine to die.
But you can see a connection failure a couple minutes later in the job:
Which makes me think the machine was on its way to booting back up when teuthology tried to clean up some artifacts there. I'm not sure where you'd need to add a
/ceph/teuthology-archive/pdonnell-2019-07-26_06:38:30-kcephfs-wip-pdonnell-testing-20190726.021409-distro-basic-smithi/4152120/teuthology.log /ceph/teuthology-archive/pdonnell-2019-07-26_06:38:30-kcephfs-wip-pdonnell-testing-20190726.021409-distro-basic-smithi/4151970/teuthology.log Puzzling failure there; it looks like the machine just never came back up.
Another: /ceph/teuthology-archive/pdonnell-2019-07-26_06:38:30-kcephfs-wip-pdonnell-testing-20190726.021409-distro-basic-smithi/4151951/teuthology.log
Looking at the teuthology.log, it appears a keystroke got sent that disrupted the automatic GRUB menu countdown. The system got reset and started to boot from the HDD but sat at the GRUB menu. Maybe try scrapping these lines? (ceph/qa/tasks/cephfs/kernel_mount.py, lines 191 to 196 in 54e6163)
power_off may allow the mounts to gracefully unmount. We don't want this if the
kclient is stuck or we desire the client to "disappear" and come back.

Fixes: http://tracker.ceph.com/issues/37681
Depends-on: ceph/teuthology#1296
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
After sending the reboot command, we need to wait briefly for the machine to
actually go down so that the kernel client doesn't voluntarily give up its Fb
cap.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
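To illustrate the kind of wait discussed in this thread, here is a minimal sketch of polling for a host to go down and then come back after a hard reset. It is not teuthology's API and not the code in this PR: `wait_for_reset` and its `is_up` probe are hypothetical names, and `is_up` stands in for whatever reachability check is available (a ping, an SSH connection attempt, and so on).

```python
import time


def wait_for_reset(is_up, timeout=300.0, interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until the host is seen down and then seen up again.

    is_up is a hypothetical zero-argument callable that returns True
    while the host is reachable. Returns the elapsed time in seconds,
    or raises TimeoutError if either phase does not finish in time.
    """
    start = clock()
    seen_down = False
    while clock() - start < timeout:
        up = is_up()
        if not seen_down:
            # Phase 1: the reset has been requested; wait for the host to die.
            if not up:
                seen_down = True
        elif up:
            # Phase 2: the host went down earlier and is reachable again.
            return clock() - start
        sleep(interval)
    phase = "come back up" if seen_down else "go down"
    raise TimeoutError("host did not %s within %ss" % (phase, timeout))
```

Waiting for the "down" observation first is what guards against the failure described above, where the job carried on before the machine had died. One caveat of this sketch: a reset that completes entirely between two polls would be missed, so the interval should be short relative to the machine's boot time.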
I moved that to an except block for debugging. Really appreciate your help, @djgalloway!
* refs/pull/28825/head:
	qa: wait for kernel client death
	qa: use hard_reset to reboot kclient

Reviewed-by: David Galloway <dgallowa@redhat.com>