INIT_STACK_ALL_ZERO - Framework Laptop system freezes (kernel oops?) on boot #1626
Comments
Not sure if I am just very confused, but... the dmesg output doesn't fit the config you posted. Now, using Arch's system clang 13.0.1 and |
I have Just to confirm, changing |
Hmmm, I cannot reproduce this in QEMU:
That could be due to a difference in hardware, meaning I do not load a driver that is problematic, although I would expect to see problems sooner, as the modules should be loaded during the initrd stage. |
Sorry about that... Grabbed the wrong config. I've updated the original report with the correct one.
I installed the (INIT_STACK_ALL_ZERO) kernel on my desktop (AMD Ryzen 7 1700) and an older desktop I have sitting around (AMD FX8350), and both boot without issue. Perhaps the issue is specific to the hardware configuration of Framework Laptops? That seems a bit more likely to me than it being Tigerlake specific... |
The next step would be to find the object file affected by this. As a first step, you can add subdir-ccflags-y += -ftrivial-auto-var-init=uninitialized to various directories' Makefiles. Then, once we find which directory, we can pare this down further to just CFLAGS_<object file name> += -ftrivial-auto-var-init=uninitialized to isolate the translation unit. Then finally, we can sprinkle This will probably take multiple builds+boots, @0x647262. |
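The directory-then-object narrowing is essentially a binary search, with one kernel build and boot per probe. Below is a minimal sketch in C. It assumes exactly one object file is responsible and that opting it out with -ftrivial-auto-var-init=uninitialized makes the system boot. find_culprit, boots_ok_fn and the object names a.o through h.o are hypothetical, and the predicate is simulated rather than an actual build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: "build with -ftrivial-auto-var-init=uninitialized
 * applied only to these objects, boot, and report whether the freeze is
 * gone". Each call stands in for a full build + boot cycle. */
typedef bool (*boots_ok_fn)(const char *const *objs, size_t n);

/* Binary search for a single offending object. Assumes exactly one object
 * triggers the freeze and that opting that object out is enough to fix it. */
static size_t find_culprit(const char *const *objs, size_t n,
                           boots_ok_fn boots_ok)
{
    assert(n > 0);
    size_t lo = 0, hi = n;          /* invariant: culprit is in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        /* Opt out only the lower half of the remaining candidates. */
        if (boots_ok(objs + lo, mid - lo))
            hi = mid;               /* freeze gone: culprit in lower half */
        else
            lo = mid;               /* still freezes: culprit in upper half */
    }
    return lo;
}

/* Simulation for illustration only: pretend "f.o" is the bad object. */
static const char *const demo_objs[] = {
    "a.o", "b.o", "c.o", "d.o", "e.o", "f.o", "g.o", "h.o",
};
static int demo_boots = 0;          /* counts simulated build + boot cycles */

static bool demo_boots_ok(const char *const *objs, size_t n)
{
    demo_boots++;
    for (size_t i = 0; i < n; i++)
        if (strcmp(objs[i], "f.o") == 0)
            return true;
    return false;
}
```

With eight candidates this takes three simulated build+boot cycles (log2 of 8), which is why halving the list each round saves many reboots compared with testing objects one at a time.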
@0x647262 If you want help doing that with Arch's PKGBUILD, let me know, as I have done it before. |
After making those changes to a subdir's Makefile, what specifically will I be looking for during compilation / on boot?
No problem! It might not be the fastest turnaround, but I'm happy to work through this issue. @nathanchance Your help would be appreciated, since my PKGBUILD skills are a tad rusty. |
I'll do a write up on this process tomorrow then! |
Sorry, forgot to answer this. You are specifically looking for the issue to be resolved, as that is basically turning off |
Turns out I had some time today :) https://nathanchance.dev/posts/package-standalone-linux-kernel-with-abs/ @0x647262 that is how to actually generate the |
Only a day late: https://nathanchance.dev/posts/bisect-compiler-flag-problem-linux-kernel/ Honestly, I am not super happy with how I described the process but I could not come up with anything that I liked better, it is better than nothing, and I am willing to update it based on any questions that might come up during the process! |
So I've narrowed the bug down to a single directory:
To the top of the directory's Makefile, the issue is resolved. However, when I add individual cflags for all of the resulting object files, the module compiles, but the kernel panic is reintroduced. I'm a bit stumped on this one, because it seems like If someone could help point me back in the right direction here I'd greatly appreciate it! 😃 |
Awesome!
Ah In this particular case, it is because of the way that
then work off of For me, that produces:
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d2b18f03a33c..fe0c5720ebe5 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -355,3 +355,25 @@ quiet_cmd_hdrtest = HDRTEST $(patsubst %.hdrtest,%.h,$@)
$(obj)/%.hdrtest: $(src)/%.h FORCE
$(call if_changed_dep,hdrtest)
+CFLAGS_gvt/aperture_gm.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/cfg_space.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/cmd_parser.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/debugfs.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/display.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/dmabuf.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/edid.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/execlist.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/fb_decoder.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/firmware.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/gtt.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/handlers.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/interrupt.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/kvmgt.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/mmio_context.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/mmio.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/opregion.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/page_track.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/sched_policy.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/scheduler.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/trace_points.o := -ftrivial-auto-var-init=uninitialized
+CFLAGS_gvt/vgpu.o := -ftrivial-auto-var-init=uninitialized
and it works:
|
TIL! That's perfect, I'll poke at this some more tonight! |
Had some time this weekend to continue working on this. It seems that when
WORKING | subdir-cflags-firmware.o.cmd.txt
NONWORKING | file-cflags-firmware.o.cmd.txt
Is there a way to manually place the
EDIT: A diff for quick reference:
|
You mean "firmware" not "filename", yes? It seems firmware.o needs "uninitialized" after the -I args?? |
I used "filename" since it happens with all of the files in the i915/gvt directory.
That's correct. I tried modifying the |
The
That's what I'd expect. The last instance of |
I agree with what Nick said. All that means is that |
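The ordering question comes down to which -ftrivial-auto-var-init= value the compiler honors when the flag appears more than once; as noted above, the last instance wins. The sketch below models that rule by scanning a compiler command line, such as the one recorded in a .cmd file. effective_auto_var_init is a hypothetical helper, not part of Clang or Kbuild.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the "last occurrence wins" rule for one -f option.
 * Returns the value of the last -ftrivial-auto-var-init=<value> in argv,
 * or NULL if the flag never appears (the compiler default then applies). */
static const char *effective_auto_var_init(const char *const *argv,
                                           size_t argc)
{
    static const char prefix[] = "-ftrivial-auto-var-init=";
    const char *value = NULL;

    assert(argc == 0 || argv != NULL);
    for (size_t i = 0; i < argc; i++) {
        /* Later matches overwrite earlier ones. */
        if (strncmp(argv[i], prefix, sizeof(prefix) - 1) == 0)
            value = argv[i] + sizeof(prefix) - 1;
    }
    return value;
}
```

Running both the working and the non-working command lines through this shows which value is actually in effect for a given object, independent of where Kbuild happened to place each copy of the flag.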
Right. I neglected to explain that this is occurring with all the files. I tried bisecting the Here's the patch I'm using on the i915 directory's Makefile: |
So |
The only difference is the order in which the flag is added:
That's what spurred my initial question regarding "manually overriding" the flag's placement. As far as I can tell, that's the only difference between the two methods. |
That is incredibly odd... Perhaps it might be better to try just removing
instead of |
So after a bit of finessing I've narrowed down the file to:
The |
I don't know a good way to do this, but if it helps, here's a diff with them all marked uninit, and you can just carve out portions of it. I generated this with a Coccinelle script:
|
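Marking a local "uninit" presumably means attaching Clang's uninitialized attribute, which exempts that one variable from -ftrivial-auto-var-init while the rest of the file stays zero-initialized. Below is a small sketch of what such a marked variable looks like. copy_name is a made-up function, and compilers that do not know the attribute (for example, older GCC releases) may warn that it is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_len - 1 bytes of src (and never
 * more than 15) into dst, always NUL-terminating. */
static size_t copy_name(char *dst, size_t dst_len, const char *src)
{
    /* Opted out of automatic initialization: under =zero this buffer would
     * otherwise be zero-filled. Every byte read below is written first. */
    char buf[16] __attribute__((uninitialized));
    size_t n = strlen(src);

    assert(dst_len > 0);
    if (n >= sizeof(buf))
        n = sizeof(buf) - 1;
    memcpy(buf, src, n);
    buf[n] = '\0';

    if (n >= dst_len)
        n = dst_len - 1;
    memcpy(dst, buf, n);
    dst[n] = '\0';
    return n;
}
```

Carving out portions of the marked-up diff then amounts to keeping the attribute on some variables and dropping it from others until the freeze appears or disappears.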
You should be able to take that monolithic patch, apply it, commit it, use |
Had some time to continue hacking on things again today and I've narrowed the issue down to the following line in
Using the patch:
Following the code a bit further I found the definition of the
Although I'm not certain whether or not this is the correct next step, I tried applying the following patch:
In an attempt to further narrow down the issue, but due to my limited knowledge of C, was met with the following warnings during compilation (truncated for brevity):
Which indicate to me that So I guess I'm ready for some more help on this one! (and a huge thanks to everyone who has helped me along with this issue so far! 😃) |
Aha! So this is why my Intel test box needs this patch to work properly. I never thought about it enough to realize this is the same problem. If the first if statement is not taken, |
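The bug class described here is a local variable that is only assigned on some branches. This is exactly where INIT_STACK_ALL_ZERO changes behavior: with -ftrivial-auto-var-init=zero the unassigned path always reads 0, while without it the value is whatever happened to be on the stack. The following is a hypothetical, simplified illustration; enum platform and lookup_fixed are invented and are not the gvt code or the upstream patch. The buggy shape is kept in a comment because executing it would read an uninitialized variable, which is undefined behavior.

```c
#include <assert.h>

/* Hypothetical platforms; not the real i915/gvt device list. */
enum platform { PLATFORM_A, PLATFORM_B, PLATFORM_C };

/*
 * Buggy shape (not compiled):
 *
 *     int ret;
 *     if (p == PLATFORM_A)
 *         ret = 1;
 *     else if (p == PLATFORM_B)
 *         ret = 2;
 *     return ret;  // PLATFORM_C: stack garbage, or 0 under =zero
 */

/* Fixed shape: every path assigns ret before it is returned. */
static int lookup_fixed(enum platform p)
{
    int ret = -1;  /* explicit default for platforms not handled below */

    if (p == PLATFORM_A)
        ret = 1;
    else if (p == PLATFORM_B)
        ret = 2;
    return ret;
}
```

Which default is correct depends on the real code. The point is that once every path assigns the variable, the result no longer depends on the auto-var-init setting.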
Oh, neat! Here's to hoping a fix gets mainlined! 😃 |
Patch pinged: https://lore.kernel.org/YwPoCqvQ02kUl9tP@dev-arch.thelio-3990X/ (might take a bit to show up) |
Patch has been applied for 6.1: https://cgit.freedesktop.org/drm/drm-intel/commit/?id=c247cd03898c4c43c3bce6d4014730403bc13032 |
Looks like this snuck into 5.19.8 😃 https://lwn.net/ml/linux-kernel/1662629579112246@kroah.com/ If it's not an issue I'll close this out! |
Ah nice, I did not realize it had been applied to a fixes branch as well! https://git.kernel.org/linus/458ec0c8f35963626ccd51c3d50b752de5f1b9d4 Thank you again for the report and the time spent helping us get to the bottom of this! |
If this is not the correct place to report this bug, would someone be able to point me to the correct mailing list it should be reported to? I'm not 100% sure if this is Clang specific, since GCC 11.2.0 (as far as I am aware) does not trigger CC_HAS_AUTO_VAR_INIT_ZERO when building a kernel config.
Hardware:
Framework Laptop (BIOS: 3.07)
OS:
Arch Linux
System Information:
Kernel Command Line:
Issue:
System boots from initrd completely fine, but freezes after switching root. When booted with the kernel command line option oops=panic, the device's caps lock LED flashes after the freeze, indicating a kernel panic(?).
Files:
init-stack-all-zero-config.txt
EDIT: Corrected config init-stack-all-zero-config.txt
dmesg.log