MBP14,1 not bootable on kernel 4.17 #62

Open
peterychuang opened this Issue Jun 6, 2018 · 62 comments

@peterychuang
Contributor

peterychuang commented Jun 6, 2018

I'm not sure if I am alone here, but I haven't been able to boot my 14,1 with the mainline kernel since a few release candidates ago. I thought it would be somehow fixed by the time the stable kernel was out, but alas, both the latest Fedora rawhide image and the Arch Linux kernel from the staging repository aren't bootable as of today. My machine is stuck at a blank screen at boot, and there are no error messages whatsoever, so there isn't any clue here.

@ClashTheBunny

Contributor

ClashTheBunny commented Jun 6, 2018

@peterychuang

Contributor Author

peterychuang commented Jun 6, 2018

The failure seems to happen pretty early in the boot process. I can't even get to the point of decrypting the hard drive, so I don't think X or Wayland has anything to do with it. I've also tried removing quiet, and probably adding nomodeset too, but I don't see anything. Basically, the only thing I can see is the backlight, from the moment I turn on the machine.

@chadberg

chadberg commented Jun 6, 2018

Same for me with a 14,1 unit. I use refind, and selecting a 4.17 kernel now goes nowhere. I have had to revert back to using a 4.16 kernel. I've turned off graphics mode so it doesn't go to the blank 'backlight only' screen, but it also doesn't proceed any further. Even with debugging turned all the way up in the kernel parameters being passed I don't see anything. It's as if the handoff isn't completing for some reason.

@Dunedan

Owner

Dunedan commented Jun 6, 2018

In Debian recently a similar bug showed up, related to the fix for CVE-2018-1108: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897572

I think in the end fontconfig was the culprit, but in any case it has been fixed in Debian in the meantime.
Until the bug was fixed, a workaround for me was to simply swipe a finger over the touchpad for a while to collect enough entropy to continue the boot process. That of course only works if applespi is already loaded.

I'm not sure if you're encountering the same bug, but it's at least worth a try.

@peterychuang

Contributor Author

peterychuang commented Jun 8, 2018

I don't think that's the same bug. In any case, I tried your workaround, and it didn't work for me.

@Dunedan

Owner

Dunedan commented Jun 9, 2018

I just confirmed that my MacBookPro13,2 still works fine with mainline 4.17 kernel. I'm using grub and no refind.

@risen

Contributor

risen commented Jun 11, 2018

Same issue on my MacBookPro14,1 (booting from systemd-boot). Even without the "quiet" option, it shows just a blank screen instantly.

@risen

Contributor

risen commented Jun 11, 2018

Anybody tried to git bisect? Was there any rc that was still working, or another known good commit?

@Strafos

Strafos commented Jun 20, 2018

Same issue on my MBP14.1 with Arch Linux - stuck on boot with nothing showing up.
I downgraded to the 4.16.4 linux kernel and it fixed the issue for me.

@chadberg

chadberg commented Jun 20, 2018

I've tested back to rc1 and can't boot. Going to do some further testing.

@bemeurer

bemeurer commented Jul 5, 2018

I can confirm the same issue on my MacBook Pro 14,1. Neither Arch nor Gentoo works with Linux 4.17; I'm waiting for 4.18 and hoping that fixes it.

@chadberg

chadberg commented Jul 5, 2018

From what I've seen in the 4.18 mainline kernel so far it hasn't been fixed.

@melentye

melentye commented Jul 8, 2018

Not sure if helpful at all, but MacBookPro14,3 works fine with 4.17

@Dunedan

Owner

Dunedan commented Jul 9, 2018

I just read on a German blog about somebody who also had problems booting the 4.17 kernel. The culprit for him was the kernel compression done with xz. After changing it to gzip, it worked again for him. I don't know how your kernels are compressed, but it could be worth a shot.
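
To see which compression a given kernel build uses, one option is to look at the CONFIG_KERNEL_* choice in its config file. The following is a minimal sketch; the helper name kernel_compression is made up for illustration, and the config location (often /boot/config-<version>, sometimes /proc/config.gz) is an assumption that varies by distribution.

```shell
#!/bin/sh
# Print the compression method selected in a kernel config file.
# kernel_compression is a hypothetical helper, not part of any tool.
# The list of methods covers the x86 choices known for kernels of this era.
kernel_compression() {
    config_file="$1"
    # The selected method appears as CONFIG_KERNEL_<METHOD>=y,
    # for example CONFIG_KERNEL_XZ=y or CONFIG_KERNEL_GZIP=y.
    grep -E '^CONFIG_KERNEL_(GZIP|BZIP2|LZMA|XZ|LZO|LZ4)=y$' "$config_file" |
        sed -e 's/^CONFIG_KERNEL_//' -e 's/=y$//'
}
```

Usage, assuming the distribution installs the config next to the kernel: kernel_compression "/boot/config-$(uname -r)"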

@chadberg

chadberg commented Jul 9, 2018

Mine are compressed with gzip, so I don't believe that to be the issue.

@rwuwon

rwuwon commented Jul 10, 2018

Fedora 28 / MBP 13,1: I'm still having no luck with booting kernel-4.17.3-200.fc28 - same behaviour as kernel-4.17.2-200.fc28 - I haven't looked at the xz thing (I don't really know how or whether it's applicable to updates via Fedora DNF).

Grub2, no quiet and no splash entries. Booting off external SSD through USB 3.0.

Interestingly, the status light on my cheap M.2-USB 3.0 enclosure stops blinking entirely about 7 seconds after I hit return on the 4.17 grub selection - could this be a clue?

@risen

Contributor

risen commented Jul 16, 2018

I've been bisecting to figure out where it went wrong. First bad commit is supposedly this one:

torvalds/linux@1ea4fe8

commit 1ea4fe84973854a7302e4d1c479f10ae25a93e4a
Merge: ef61f8a340fd 4440977be134
Author: Ingo Molnar <mingo@kernel.org>
Date:   Mon Feb 26 08:45:20 2018 +0100

    Merge branch 'x86/boot' into x86/mm, to unify branches
    
    Both x86/mm and x86/boot contain 5-level paging related patches,
    unify them to have a single tree to work against.
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

Of course this is a merge; both branches seem to be "good", but the merge seems to be "bad".
So both torvalds/linux@ef61f8a and torvalds/linux@4440977 seem to be good.

I still want to double check this, and see if I can revert some of these.

@bemeurer

bemeurer commented Aug 13, 2018

@risen Any updates on this? Does 4.17 and/or 4.18 work for you now?

@chadberg

chadberg commented Aug 13, 2018

@bemeurer As of 4.18-rc8 this was still an issue for me.

@bemeurer

bemeurer commented Aug 19, 2018

@chadberg, any news now that 4.18 is stable?

@chadberg

chadberg commented Aug 25, 2018

@bemeurer It is not resolved as of the latest in the 4.18 line.

@bemeurer

bemeurer commented Aug 26, 2018

@chadberg Sigh, what a shame. Thank you for keeping me updated, I've been dreaming about getting Linux on this machine since the day I got it; the wait continues.

@roadrunner2

Contributor

roadrunner2 commented Aug 26, 2018

Somebody here (i.e. with a machine exhibiting this issue) needs to further bisect the problem. I suggest taking one of the branch parents and rebasing/cherry-picking the commits from the other branch on top of it, so that you get a nice linear history which you can bisect properly. E.g. something like (taking the commits @risen reported as good/bad):

git checkout -b debug ef61f8a340fd
git cherry-pick ef61f8a340fd..4440977be1
git bisect start HEAD ef61f8a340fd

There appear to be only a couple commits that need to be checked this way.
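
The linearization idea above can be tried out safely on a throwaway repository first. The sketch below is a toy model only: the branch names, file names, and the "bad" condition (two files existing at the same time) are invented stand-ins, and on the real kernel the automated check would be replaced by building and boot-testing each step by hand.

```shell
#!/bin/sh
# Toy model of a merge that is "bad" although both parents are "good".
# One side is replayed on top of the other with cherry-pick, which gives a
# linear history that git bisect can walk. All names here are hypothetical.
demo_linear_bisect() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.name demo
    git config user.email demo@example.invalid

    commit() { git add -A && git commit -q -m "$1"; }

    echo base > base.txt
    commit "base"
    base=$(git rev-parse HEAD)

    # First side: two commits that are harmless on their own.
    git checkout -q -b boot
    echo 1 > boot1.txt
    commit "boot: step 1"
    echo 2 > trampoline.txt
    commit "boot: step 2"

    # Second side: one commit that is also harmless on its own.
    git checkout -q -b mm "$base"
    echo 1 > paging.txt
    commit "mm: step 1"
    mm=$(git rev-parse HEAD)

    # Linear history: start at one parent, replay the other side on top.
    git checkout -q -b debug "$mm"
    git cherry-pick "$mm..boot" > /dev/null

    # "Bad" in this toy means both files exist at once.
    git bisect start HEAD "$mm" > /dev/null
    git bisect run sh -c '! { [ -e trampoline.txt ] && [ -e paging.txt ]; }' > /dev/null

    # refs/bisect/bad now names the first bad commit of the linear history.
    git show -s --format=%s refs/bisect/bad
)
```

Calling demo_linear_bisect prints the subject of the replayed commit that first combines the two changes, which is the kind of answer a bisect over the merge itself cannot give.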

@risen

Contributor

risen commented Aug 29, 2018

@roadrunner2 I haven't had time to continue digging into this, but from what I remember there was some issue with torvalds/linux@1ea4fe8 failing on one compile run, and working on another… so I wanted to verify if that one is indeed the first bad commit.

It'd be nice if someone with some time on their hands could do a bisection themselves and see if that commit truly is where the issue starts happening. Even if my bisect log is not completely accurate, I should've narrowed it down quite a bit.

git bisect bad 29dcea88779c856c7dc92040a0c01233263101d4
git bisect bad 29dcea88779c856c7dc92040a0c01233263101d4
git bisect bad 29dcea88779c856c7dc92040a0c01233263101d4
git bisect good 0adb32858b0bddf4ada5f364a84ed60b196dbcda
git bisect bad 97b1255cb27c551d7c3c5c496d787da40772da99
git bisect bad bb2407a7219760926760f0448fddf00d625e5aec
git bisect good 1c7095d2836baafd84e596dd34ba1a1293a4faa9
git bisect bad 2fcd2b306aa80771e053275ed74b2dfe7e3d1434
git bisect good 1159e09476536250c2a0173d4298d15114df7a89
git bisect good 8747a29173c6eb6f4b3e8d3b3bcabc0fa132678a
git bisect good e68b4bad71e8739d79f3c9580c719aa70c42fb96
git bisect bad 51c7eeba7975c1d2a02eefd00ece6de25176f5f3
git bisect good 672c0ae09b33a11d8f31fc61526632e96301164c
git bisect bad 0a1756bd2897951c03c1cb671bdfd40729ac2177
git bisect bad 3548e131ec6a82208f36e68d31947b0fe244c7a7
git bisect bad 1ea4fe84973854a7302e4d1c479f10ae25a93e4a
git bisect good 4440977be1347d43503f381716e4918413b5a6f0
git bisect good ef61f8a340fd6d49df6b367785743febc47320c1
# first bad commit: [1ea4fe84973854a7302e4d1c479f10ae25a93e4a] Merge branch 'x86/boot' into x86/mm, to unify branches
@christophgysin

Contributor

christophgysin commented Oct 7, 2018

I have the same issue on my MBP 13,1

@aranega

aranega commented Oct 24, 2018

@gdaddar I just saw in #29 that you are using an MBP 14,1 and that you can boot with a 4.18.10 kernel. Does this kernel finally just work?

@chadberg

chadberg commented Oct 24, 2018

It doesn't work on my MBP. Neither does the 4.19 mainline kernel....

@bemeurer

bemeurer commented Oct 25, 2018

I just attempted to boot an Arch Linux liveusb, which is on kernel 4.18.9, and it failed. I will try Ubuntu 18.10 and report back.

@bemeurer

bemeurer commented Oct 25, 2018

I am commenting this from the Ubuntu 18.10 LiveUSB!

Everything seems to work, except for the mousepad, keyboard, and bluetooth. I will diff the configs for Arch & Ubuntu and see if I can pinpoint where the issue is. Does anyone know whether ubuntu sources are patched? That could also be an interesting place to look.

@bemeurer

bemeurer commented Oct 25, 2018

@roadrunner2

Contributor

roadrunner2 commented Oct 25, 2018

Does anyone know whether ubuntu sources are patched? That could also be an interesting place to look.

Yes, they are patched. Info about their git trees is on their wiki, from which one can see that the git repo for 18.10 is at git://kernel.ubuntu.com/ubuntu/ubuntu-cosmic.git .

@bemeurer

bemeurer commented Oct 25, 2018

@roadrunner2 One main difference I've found in the kernel configs is that the Ubuntu config has almost all framebuffer hardware drivers enabled, while the Arch config has them mostly disabled. Could that be what is keeping the kernel from appearing to boot? Cf. L5828<->L6130. (I'm using Meld to diff the files)

@roadrunner2

Contributor

roadrunner2 commented Oct 26, 2018

@bemeurer I don't think so: AFAIK it uses the efifb (CONFIG_FB_EFI).

Maybe one thing to do is try compiling the Arch kernel with the Ubuntu config (or the result thereof after running "make oldconfig" on it) to see if the difference is due to the config or any patches.
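
For comparing two kernel configs without a GUI tool, a small filter that prints only the settings that differ can help. This is a rough sketch: config_diff is a made-up helper name, and the kernel tree also ships its own scripts/diffconfig, which is probably the better tool when a source tree is at hand.

```shell
#!/bin/sh
# Print settings that appear in only one of two kernel config files.
# Lines only in the first file are prefixed "< ", lines only in the second "> ".
# config_diff is a hypothetical helper written for this illustration.
config_diff() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    pattern='^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)'
    # Keep real settings only, dropping section comments and blank lines.
    LC_ALL=C grep -E "$pattern" "$1" | LC_ALL=C sort > "$a_sorted"
    LC_ALL=C grep -E "$pattern" "$2" | LC_ALL=C sort > "$b_sorted"
    # comm -23 keeps lines unique to the first file, comm -13 to the second.
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | sed 's/^/< /'
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | sed 's/^/> /'
    rm -f "$a_sorted" "$b_sorted"
}
```

Usage with placeholder paths: config_diff arch.config ubuntu.config | grep FB_ narrows the output to the framebuffer options discussed above.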

@nickbooties

nickbooties commented Oct 26, 2018

Everything seems to work, except for the mousepad, keyboard, and bluetooth.

@bemeurer Hey any chance you can confirm if audio is working?

@bemeurer

bemeurer commented Oct 31, 2018

@nickbooties I'm sorry, I just exchanged my MacBook for a Lenovo P1, I got tired of not having Linux support. Hope this eventually gets fixed. Good luck!

@cyrusmg

cyrusmg commented Nov 11, 2018

MacbookPro13,1 + ArchLinux + linux 4.20-rc1 + 4.18.0-10-generic config (from an answer above) + make oldconfig (with default values) + git revert e03fd3f300f6184c1264186a4c815e93bf658abb (to fix black screen mentioned in another issue).

No luck, screen is still black.

@ClashTheBunny

Contributor

ClashTheBunny commented Nov 12, 2018

As a data point, I had to turn off graphical start by setting multi-user.target:
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt amdgpu.dc=0 systemd.unit=multi-user.target"
and starting a display manager in rc.local:
(sleep 2; /bin/systemctl restart lightdm.service) &

@bockjoo

bockjoo commented Nov 16, 2018

It looks like the kernel became unbootable exactly at 4.17; it was still bootable up to 4.16.18.
After checking the differences, I was able to hack linux-4.17 ( https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.tar.gz ) and downgrade it to make it look like linux-4.16.18 ( https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.18.tar.gz ) by replacing only 3 files ( head_64.S , pgtable_64.c , misc.c ).

Here's the hacking recipe:
/bin/cp linux-4.16.18/.config linux-4.17/
cp linux-4.17/arch/x86/boot/compressed/head_64.S linux-4.17/arch/x86/boot/compressed/head_64.S.original
cp linux-4.17/arch/x86/boot/compressed/pgtable_64.c linux-4.17/arch/x86/boot/compressed/pgtable_64.c.original
cp linux-4.17/arch/x86/boot/compressed/misc.c linux-4.17/arch/x86/boot/compressed/misc.c.original
cp linux-4.16.18/arch/x86/boot/compressed/head_64.S linux-4.17/arch/x86/boot/compressed/head_64.S # because that's the first difference in the start code. See Fig. 3 in https://www.ibm.com/developerworks/library/l-linuxboot/
cp linux-4.16.18/arch/x86/boot/compressed/pgtable_64.c linux-4.17/arch/x86/boot/compressed/pgtable_64.c # to get rid of trampoline_32bit_src
cp linux-4.16.18/arch/x86/boot/compressed/misc.c linux-4.17/arch/x86/boot/compressed/misc.c # to get rid of trampoline_32bit
cd linux-4.17
make menuconfig # save and exit without changing anything
make -j4 && make modules_install && make install && systemctl reboot # stop at the first failing step instead of rebooting regardless

I can boot the mbp14,1 using this hacked kernel (vmlinuz-4.17-with-4.16.18-head_64). But I don't know how to read the assembler code (yet).
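
The backup-and-copy steps of the recipe can also be condensed into a loop. This is only a sketch of the same idea: swap_boot_files is a hypothetical helper name, and the two arguments are assumed to be unpacked source trees laid out like the linux-4.16.18 and linux-4.17 directories above.

```shell
#!/bin/sh
# Replace the three early-boot files in NEW_TREE with the versions from
# OLD_TREE, keeping a ".original" backup of each file that is overwritten.
# swap_boot_files is a hypothetical helper, not an existing tool.
swap_boot_files() {
    old_tree="$1"
    new_tree="$2"
    dir=arch/x86/boot/compressed
    for f in head_64.S pgtable_64.c misc.c; do
        # Back up first so the newer file can be restored later.
        cp "$new_tree/$dir/$f" "$new_tree/$dir/$f.original" || return 1
        cp "$old_tree/$dir/$f" "$new_tree/$dir/$f" || return 1
    done
}
```

Usage: swap_boot_files linux-4.16.18 linux-4.17, followed by the same configure and build steps as in the recipe.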

@fzdarsky

fzdarsky commented Nov 16, 2018

Same issue for me with F28 on MacBook Pro 14,1 (13", 2017, 2 Thunderbolt 3 ports).

I ran a git bisect between kernel v4.16.15 and v4.17.3 (vanilla kernel with the kernel config from F28's v4.16.15 kernel), and the first bad commit is torvalds/linux@3548e131:

# first bad commit: [3548e131ec6a82208f36e68d31947b0fe244c7a7] x86/boot/compressed/64: Find a place for 32-bit trampoline

So this confirms @bockjoo's observation.

@christophgysin

Contributor

christophgysin commented Nov 23, 2018

Has anyone managed to create a patch that reverts this on a recent kernel?

@christophgysin

Contributor

christophgysin commented Nov 29, 2018

Thanks to @bockjoo, I managed to boot 4.19.5 by reverting the trampoline code back to v4.16.18, using:

$ git checkout v4.16.18 arch/x86/boot/compressed/{misc.c,head_64.S,pgtable_64.c}

Here's the patch:
https://github.com/christophgysin/linux/commit/8a8d97f90ddf5661ebceeef405e92e12a4aecf31.patch

@quasd

quasd commented Nov 29, 2018

Can confirm the patch above works on Arch. I can finally boot a new kernel. The tty for entering the encryption password is all scrambled, but it boots if I enter the password blindly!

@cyrusmg

cyrusmg commented Nov 29, 2018

I too can confirm this works on MacbookPro13,1 with Arch latest and kernel v4.20-rc1. Even LVM encryption password input screen works here. Bluetooth also works on this kernel (with set_device_wakeup commented out).

Kernel + patches:
https://github.com/cyrusmg/linux/tree/torvalds_v4.20-rc1

Edit: It seems the patch is not needed to get bluetooth working on v4.20 for MacbookPro13,1. I have posted to the other issue to make sure somebody else can confirm.

@chadberg

chadberg commented Nov 29, 2018

Confirmed. With the trampoline regression reverted on 4.20-rc1, using Arch Linux, bluetooth works with no additional patching needed. I'd experienced it working under an earlier release candidate before the regression was introduced as well, so I'm actually not surprised. Very, very, very pleased to have 4.20-rc1 booting and bluetooth back.

@quasd

quasd commented Dec 18, 2018

Any idea whether this is being worked on in mainline?

  • meaning: any idea whether this revert, or a proper trampoline fix, will ever show up in mainline?
@bockjoo

bockjoo commented Dec 19, 2018

For my own education, I have been looking at head_64.S, but I am still trying to understand what's going on. As my previous post shows, I had roughly zero knowledge of assembly. Now I am able to read some sections of the assembly code. The issue may be in pgtable_64.c instead, though; I don't know.

Also, I contacted a few expert developers ( these guys: kirill, glx, and hpa ), but they did not respond to my questions.

I could not find an issue tracker on the Linux GitHub repository. It seems the only way to get a fix into mainline would be to create a pull request, which cannot be done until the issue is fully understood.
I am wondering whether anybody other than the very small group of people in this issue would care, though.

@Dunedan

Owner

Dunedan commented Dec 28, 2018

What's the status of this? Is somebody working on getting that fixed upstream?

@xtachx

xtachx commented Jan 6, 2019

Unfortunately no - I cannot even find a bug report in the mainline kernel bugzilla for this. I will try to file one.

@xtachx

xtachx commented Jan 20, 2019

Ok I am really sorry it took me a while to report this - life happens and I was quite busy. However the bug has been filed. Please add your observations and anything else that may help the devs help us. Please participate and CC yourself in the bugzilla page, especially if you have a MacBook Pro 14,1 and can reproduce the bug / test changes and solutions posted.

https://bugzilla.kernel.org/show_bug.cgi?id=202351

@xtachx

xtachx commented Feb 19, 2019

The patch has been submitted: https://lkml.org/lkml/2019/2/19/108

Huge thanks to everyone, especially Kirill for helping us, and Bockjoo, who would test the patches before some of us on the west coast even woke up :P

This issue can now be closed - the patch will appear in the upstream kernel.

@Dunedan

Owner

Dunedan commented Mar 9, 2019

Really great work. 👍
In the meantime Linus merged the patch into master, so it'll be part of Linux 5.1. I'll leave this issue open until then and close it once I've added the relevant changes of 5.1 to the README.

Is anybody here motivated to tackle the Bluetooth issue (#29) with the non-TouchBar models as well and get that fixed upstream? That should be even easier than getting this bug fixed, as there are already patches available to get it working.

@risen

Contributor

risen commented Mar 12, 2019

It's merged in Linux stable update v5.0.1, so I guess this issue can be closed now…
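
A quick way to tell whether an installed kernel is new enough is to compare its version string against 5.0.1. The helper below is a sketch based only on the versions named in this thread (5.0.1 stable, 5.1 mainline); its name is invented, and a negative answer is not conclusive, since a distribution may have backported the fix to an older series.

```shell
#!/bin/sh
# Succeed if a kernel release string is 5.0.1 or newer.
# kernel_has_trampoline_fix is a hypothetical helper; the 5.0.1 threshold
# comes from this thread, and backports to older series are not detected.
kernel_has_trampoline_fix() {
    ver=${1%%-*}                 # "5.0.2-arch1-1-ARCH" becomes "5.0.2"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    case "$rest" in
        *.*) patch=${rest#*.}; patch=${patch%%.*} ;;
        *)   patch=0 ;;          # two-part versions such as "5.1"
    esac
    [ "$major" -gt 5 ] && return 0
    [ "$major" -eq 5 ] && [ "$minor" -gt 0 ] && return 0
    [ "$major" -eq 5 ] && [ "$minor" -eq 0 ] && [ "$patch" -ge 1 ] && return 0
    return 1
}
```

Usage: kernel_has_trampoline_fix "$(uname -r)" && echo "5.0.1 or newer"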

@aranega

aranega commented Mar 18, 2019

The fix is part of the Arch Linux kernel 5.0.2. The kernel boots on my side (MBP 14,1) with no problem, but the X server refuses to start. I don't know whether there are options that should be passed to the kernel to avoid that?

@xtachx

xtachx commented Mar 18, 2019

@aranega Can you post the log files for Xorg so we can have a look? It may be a different issue in which case you should open a new thread so people can work on it with you.

@aranega

aranega commented Mar 18, 2019

@xtachx Actually, no Xorg log file is even created. The startx command hangs at "waiting for X server to begin accepting connections" and then fails.

@christophgysin

Contributor

christophgysin commented Mar 18, 2019

I can confirm the same issue with archlinux on kernel 5.x. I haven't had time to investigate any further yet.

@Dunedan

Owner

Dunedan commented Mar 18, 2019

Please open a separate issue if you have problems not related to the unbootable kernel discussed in this issue.
