rawhide: 6.2 kernel fails to boot on AWS Xen instances #1371

Closed
dustymabe opened this issue Jan 6, 2023 · 5 comments

Comments

@dustymabe
Member

dustymabe commented Jan 6, 2023

Our AWS Kola tests that run on the i3.large instance type started failing with 38.20230105.91.0 (kernel-6.2.0-0.rc2.18.fc38).

The last known good kernel is kernel-6.1.0-65.fc38 (note that we don't test debug kernels, so none of the kernels in between have been tested). I have verified that the kernel is the culprit by reverting only that package and allowing all others to be updated.

The boot log looks something like:

[    4.523515] udevadm[458]: systemd-udev-settle.service is deprecated. Please fix multipathd-configure.service not to pull it in.
[    4.904874] Invalid max_queues (4), will use default max: 1.
[    4.911101] ena 0000:00:03.0: ENA device version: 0.10
[    4.913730] ena 0000:00:03.0: ENA controller version: 0.0.1 implementation version 1
[    4.933060] blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
[    4.939967] ena 0000:00:03.0: LLQ is not supported Fallback to host mode policy.
[    4.940309] ena 0000:00:03.0 (unnamed net_device) (uninitialized): Failed to enable MSI-X. irq_cnt -524
[    4.940311] ena 0000:00:03.0: Can not reserve msix vectors
[    4.940313] ena 0000:00:03.0: Failed to enable and set the admin interrupts
[    4.941460] ena: probe of 0000:00:03.0 failed with error -28

A more complete log is at i-0b236cef570dfad9c.log.txt
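When triaging a console log like the one above, it can help to pull out the failed driver probes and translate the numeric error codes. The following is a minimal sketch under stated assumptions, not tooling used in this issue: the helper names `errno_name` and `find_probe_failures` are hypothetical, and the regular expression only targets the `probe of ... failed with error` line format shown in the log. Error 28 is ENOSPC; 524 is ENOTSUPP, a kernel-internal code that Python's `errno` module does not include, so it is mapped by hand.

```python
import errno
import re

# Kernel-internal error codes that are not exposed to userspace and are
# therefore missing from Python's errno module. ENOTSUPP is defined as 524
# in the kernel's include/linux/errno.h.
KERNEL_INTERNAL = {524: "ENOTSUPP"}

# Matches lines such as:
#   "ena: probe of 0000:00:03.0 failed with error -28"
PROBE_FAILED = re.compile(
    r"(?P<driver>\S+): probe of (?P<device>\S+) failed with error (?P<code>-?\d+)"
)


def errno_name(code):
    """Return a symbolic name for a (possibly negative) kernel error code."""
    code = abs(code)
    return KERNEL_INTERNAL.get(code) or errno.errorcode.get(code, "unknown")


def find_probe_failures(log_text):
    """Yield (driver, device, code, name) for each failed driver probe."""
    for line in log_text.splitlines():
        match = PROBE_FAILED.search(line)
        if match:
            code = int(match.group("code"))
            yield (
                match.group("driver"),
                match.group("device"),
                code,
                errno_name(code),
            )


# Sample line copied from the boot log in this report.
sample = "[    4.941460] ena: probe of 0000:00:03.0 failed with error -28"
for driver, device, code, name in find_probe_failures(sample):
    print(f"{driver} {device}: {code} ({name})")
```

Read this way, the log shows the ENA network driver failing to reserve MSI-X vectors (`irq_cnt -524`, ENOTSUPP) and the probe then failing with -28 (ENOSPC), which leaves the instance without its network device.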

@dustymabe dustymabe changed the title rawhide: 6.2 kernel fails to boot on Xen instances rawhide: 6.2 kernel fails to boot on AWS Xen instances Jan 6, 2023
@dustymabe
Member Author

For now, since this test doesn't block builds, I'm thinking we shouldn't revert the kernel in rawhide. That way we keep getting testing on new kernel updates, and if a future update fixes the problem we'll know sooner.

@davdunc
Contributor

davdunc commented Jan 9, 2023

Investigating through an internal ticket at AWS.

@davdunc
Contributor

davdunc commented Jan 16, 2023

David Woodhouse pointed to his upstream kernel patch: https://lore.kernel.org/all/4bffa69a949bfdc92c4a18e5a1c3cbb3b94a0d32.camel@infradead.org/. This is likely the fix for the issue.

@dustymabe
Member Author

Looks like a few patches landed upstream to address this. They are in v6.2-rc6, which should hit rawhide soon.

@dustymabe
Member Author

dustymabe commented Feb 4, 2023

This landed in our rawhide stream in 38.20230204.91.0 (kernel-6.2.0-0.rc6.20230202git9f266ccaa2f5.46.fc38) and the AWS Xen tests passed!

[2023-02-04T02:34:55.944Z] --- PASS: non-exclusive-test-bucket-0 (200.51s)
[2023-02-04T02:34:55.944Z]     --- PASS: non-exclusive-test-bucket-0/ext.config.platforms.aws.assert-xen (2.07s)
