Linux boot hangs on arm64 Chromebooks running kernels with kselftest cfg #234

Open
hardboprobot opened this issue Aug 14, 2023 · 0 comments

Multiple arm64 Chromebooks are failing to boot, causing tests to fail. The failure shows up in the simple baseline.login test and has been seen on at least kevin, jacuzzi and asurada.

Examples with kernel v6.5-rc5-387-g4c75bf7e4a0e5:

The test result seems to depend on the kernel configuration used. Specifically, it fails with the kselftest config fragment and passes otherwise. For instance, the same kernel version passes on kevin when configured with defconfig+arm64-chromebook+videodec: https://linux.kernelci.org/test/case/id/64d6a79d4e3186aded35b230/ (https://lava.collabora.dev/scheduler/job/11267308)

After repeated testing to narrow down the problem, I found that for a fixed kernel version, configuration variant and device, the test may either pass or fail depending on the KernelCI-core version used to build the kernel image. Here are some results on kevin using the same kernel version (v6.2-rc7) and config variant (defconfig+arm64-chromebook+kselftest) in all cases:

Kernel image and modules built with kernelci-core tag kernelci-20230206.0:

Kernel image and modules built with kernelci-core tag kernelci-20230213.0:

The only noticeable difference in the logs is that the one that passes shows this:

<3>[    8.377861] IP-Config: Failed to open gretap0
<3>[    8.383330] IP-Config: Failed to open erspan0

which is missing in the one that fails. My theory is that both runs fail to configure the network interfaces via DHCP, but in the passing run the network autoconfiguration times out sooner and the boot continues, while in the failing run it keeps retrying for longer, which causes the LAVA job to time out and fail. However, the only differences in the kernel config from the version that passes to the one that fails are these:

 DRM_MSM m -> y
 NFS_V3_ACL n -> y
 QCOM_OCMEM m -> y
+NFS_ACL_SUPPORT y

I don't see how these changes would make any difference in this case.
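For anyone who wants to reproduce the config comparison on other build pairs, here is a minimal Python sketch that diffs two kernel `.config` files and prints the delta in the same notation as the list above. The notation resembles what the kernel tree's `scripts/diffconfig` prints, but whether that tool produced the list in this issue is an assumption. The function names `parse_config` and `diff_configs` are hypothetical, and treating "is not set" as `n` is a simplification.

```python
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines in a kernel .config.
_SET = re.compile(r"^CONFIG_(\w+)=(.*)$")
_UNSET = re.compile(r"^# CONFIG_(\w+) is not set$")


def parse_config(text):
    """Return {symbol: value}; symbols marked 'is not set' map to 'n'."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            symbols[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            symbols[match.group(1)] = "n"
    return symbols


def diff_configs(old_text, new_text):
    """List removed (-), changed (old -> new) and added (+) symbols."""
    old, new = parse_config(old_text), parse_config(new_text)
    lines = []
    for sym in sorted(old.keys() - new.keys()):
        lines.append(f"-{sym} {old[sym]}")
    for sym in sorted(old.keys() & new.keys()):
        if old[sym] != new[sym]:
            lines.append(f" {sym} {old[sym]} -> {new[sym]}")
    for sym in sorted(new.keys() - old.keys()):
        lines.append(f"+{sym} {new[sym]}")
    return lines


if __name__ == "__main__":
    # Made-up fragments that mirror the four symbols listed in this issue.
    passing = (
        "CONFIG_DRM_MSM=m\n"
        "# CONFIG_NFS_V3_ACL is not set\n"
        "CONFIG_QCOM_OCMEM=m\n"
    )
    failing = (
        "CONFIG_DRM_MSM=y\n"
        "CONFIG_NFS_V3_ACL=y\n"
        "CONFIG_QCOM_OCMEM=y\n"
        "CONFIG_NFS_ACL_SUPPORT=y\n"
    )
    print("\n".join(diff_configs(passing, failing)))
```

The sample fragments in the `__main__` section are illustrative only, not the real configs from either build. In practice the two inputs would be the `.config` files published with each kernel build.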

@hardboprobot hardboprobot added this to Discovered in Kernel Bugs Aug 14, 2023