kola: rawhide: 20220823: hung reboot after simulated disk failure in boot mirror tests #1282
Comments
Nice work digging into this!
Added a kernel fast-track over in coreos/fedora-coreos-config#1926
This one might be more complicated. I wasn't able to reproduce locally with |
OK. More information. I think this is a kernel regression that still isn't fixed in latest upstream kernel git tree. The reason I thought I did a kernel bisect and believe the offending kernel commit is
On a related note, I think we should stop testing debug kernels in rawhide. I've spent a decent amount of time in recent months either chasing down issues that only occur on debug kernels or getting false positive or negative test results because of some behavior that happens on debug kernels and doesn't happen on nodebug kernels.
Awesome work bisecting!
They are failing because of a kernel regression and we are actively working on getting a fix upstream and backported. Let's snooze these tests for now. See coreos/fedora-coreos-tracker#1282
With the introduction of Opened a pin request here: coreos/fedora-coreos-config#1936 |
OK. Now that rawhide is on a We'll snooze these tests for rawhide too while we wait for a fix for https://bugzilla.redhat.com/show_bug.cgi?id=2121791 |
The revert for the offending kernel commit landed upstream in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c66a326b5ab784cddd72de07ac5b6210e9e1b06
That commit made it into |
The revert of the problematic commit that caused the hung reboots in our `coreos.boot-mirror` tests landed upstream in the 6.0 development cycle and was backported to 5.19.12. This means the Fedora kernel maintainer has dropped his manual revert of the commit, and also that rawhide should no longer see the issue (since it's already on 6.0 rc kernels). Closes coreos/fedora-coreos-tracker#1282
This problem never made it to our production streams.
The `coreos.boot-mirror.luks` and `coreos.boot-mirror` tests are intermittently failing in `rawhide` (see build#72). They appear to be getting hung up during reboot after detaching the primary disk (simulating a disk failure):
I think I have tracked this down to some issue in the `6.0.0-0.rc2.19.fc38` kernel. Advancing the kernel to the newer version that was just built today seems to resolve the issue.
Here are some logs for the failure:
console.txt
journal.txt