tps6598x_powerup failed for /arm-io/i2c0/hpmBusManager/hpm1 #101
Does it only happen on first boot, or also on chainloads? How often does it happen? What power supply are you using? |
I just did 25 boots, long-pressing the power-button between each, and got 6 failures. In four of the failure cases, no ttyACM* interface showed up; in the other two, it did, and five subsequent chainloads were successful. Chainloads were also always successful after an initial success. I don't have numbers yet, but issuing the "reboot" command to m1n1 sometimes results in a failure after an initial success. I've not seen it produce successes after an initial failure, but that may be trying to see a pattern where there is none... I'm using the original Apple power supply, directly connected to the MacBook with the original Apple cable. All appear undamaged, and I have not noticed problems with them except when using m1n1. I have basic electrical equipment but no second comparable Type-C power supply. |
I just did 25 boots, long-pressing the power-button between each, and got 6 failures. In four of the failure cases, no ttyACM* interface showed up; in the other two, it did, and five subsequent chainloads were successful.
My gut says "race condition".
|
I'll try to repro later. Just for completeness, what is the highest macOS/stub version that has ever been installed on that machine? (i.e. what is your SFR version?) |
Sorry for dismissing this earlier. I've never seen this on my machine, had never heard about it, and just assumed it happened because some previous payload had put the TPS chip into a strange state. These delays in the tps code are already very suspicious though: https://github.com/AsahiLinux/m1n1/blob/main/src/tps6598x.c#L19 |
I also wonder if there is some magic bit to make the i2c hardware listen to clock stretching, which is a requirement for these chips. IIRC there's a VDM mode to put that I²C bus on debug pins; if I can repro I'll try to see what's going on at the hardware level there. |
@svenpeter42 I'm the one who has to apologize, because I did do things like that without fully understanding the consequences, just not in this specific case. @marcan I'm afraid I don't know how to find out, or at least I don't. I've since upgraded, and it's the same problem. New version numbers as soon as I get back into macOS. System Firmware Version: 6723.140.2 |
Pretty sure this is fixed by d20a794. Please reopen if you see this issue again. |
I get very similar behavior on my end, and it also shows up inconsistently. In my case it times out when talking over I2C to hpm0. I saw it when either the USB connection to Linux or the power supply was connected during boot. |
I see this behaviour on an m1n1 built yesterday, directly after boot. I am running an M1 Max (t6001, j314c), where m1n1 reports the same issue but from addr 56 on hpm0. My other machine also has a lot of issues with the m1n1 gadget from time to time.
|
Argh, this issue just won't go away... Does the problem persist if you chainload m1n1 (e.g. via another, working port), or does it only happen directly after boot? |
I haven't tried yet whether it happens after a chainload. I only noticed it while debugging and testing my kernel build, because it happened every so often without my touching anything in the setup. |
I just started noticing this one after installing a new stub (12.1) and m1n1 (8887493). Note that the install didn't go smoothly and required a MacBookAir. macOS 12.2 was installed earlier today, but it had the public beta 12.2 RC in a separate partition for over a week. Ports config:
On the screen (manually copied - there might be typos):
At this point, /dev/ttyACM0 doesn't exist, so chainloading fails. I guess hpm1 is the front port. If I swap the two cables (data at the back, power/ethernet at the front), I get:
I haven't seen either of those errors in chainloads, but I didn't try that many times. Trying to narrow it down, I tried a few more things with the second cable config. With the changes cumulative, in the order they were done (with each tried up to 3 times):
Is there something that stays powered and configured as long as the power adapter is connected, and only resets once it gets disconnected? |
I spent a bit more time on this one today, trying to gather more data. Main observations:
The same sequence without the AV adapter is always successful. |
Welp, just ordered an AV adapter. Hope I can repro :) |
Fingers crossed. |
This may be worth trying:
|
Tested on top of fbc15bd, on a 12.0 stub. The procedure that was close to 100% reliable no longer is - but maybe it wasn't that reliable in the first place. In any case, I'm still seeing the bug easily. The new |
Just did one more experiment, based on pipcet's original hypothesis that a delay was too short. So:
When chainloading, I get:
with no significant variation. Upon cold boot/restart, I get something like
with only the 3rd one varying significantly, roughly between 480 and 710. I didn't try to understand why there are 4 calls in one case and 8 in the other. But... I can't reproduce the issue anymore, unless I change stub and go back to an official m1n1. |
So at least in some cases, poll32(1000) isn't long enough. Smallest I saw was 995887. In my tests, the 3rd one is the only one I saw below 999000. |
Repro'd fairly reliably with the AV adapter. |
Hopefully this will cover all corner cases of the Ace2s being slow. Issue: #101 Signed-off-by: Hector Martin <marcan@marcan.st>
1000 (1ms) does indeed sound low for the write timeout. I think that was copied and pasted from the read path, but in the read path we're just waiting for the STOP condition while in the write path we're waiting for data transfer from the FIFO. Either way, I've increased all low-level timeouts to 50ms. Hopefully that covers all possible clock-stretching delays introduced by the Ace2. This still does not quite explain the first case in this comment though... |
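To illustrate the timeout reasoning above, here is a minimal sketch of a polled wait with a deadline, run against a simulated clock. It is not m1n1's actual I2C code: `sim_dev`, `sim_read_status`, `poll_status`, `transfer_completes`, the 10 µs poll step, and the 4 ms busy time are all hypothetical and chosen only to show how a 1 ms budget can expire while a 50 ms budget does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device: its status register reads "done"
 * only once ready_at_us microseconds of simulated time have passed. */
struct sim_dev {
    uint64_t now_us;      /* simulated clock */
    uint64_t ready_at_us; /* time at which the transfer finishes */
};

#define STATUS_DONE 0x1u

static uint32_t sim_read_status(const struct sim_dev *dev)
{
    return dev->now_us >= dev->ready_at_us ? STATUS_DONE : 0;
}

/* Poll until (status & mask) == value or until timeout_us has elapsed.
 * Each unsuccessful poll advances the simulated clock by step_us.
 * Returns true on success, false on timeout. */
static bool poll_status(struct sim_dev *dev, uint32_t mask, uint32_t value,
                        uint64_t timeout_us, uint64_t step_us)
{
    assert(step_us > 0); /* a zero step would never reach the deadline */
    uint64_t deadline = dev->now_us + timeout_us;

    for (;;) {
        if ((sim_read_status(dev) & mask) == value)
            return true;
        if (dev->now_us >= deadline)
            return false;
        dev->now_us += step_us;
    }
}

/* Does a transfer that stays busy for busy_us finish within timeout_us? */
static bool transfer_completes(uint64_t busy_us, uint64_t timeout_us)
{
    struct sim_dev dev = { .now_us = 0, .ready_at_us = busy_us };
    return poll_status(&dev, STATUS_DONE, STATUS_DONE, timeout_us, 10);
}
```

With these assumed numbers, `transfer_completes(4000, 1000)` returns false (the 1 ms budget runs out while the device is still busy), and `transfer_completes(4000, 50000)` returns true.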
Thanks for the fix. I'll install it soon. |
(Split off from #97)
On my MacBook Pro, which has run other kernels and boot loaders but is currently loading vanilla m1n1, commit c2c6da3, directly from iBoot, I occasionally see messages like:
I've been unable to reproduce this at will. I've never seen the problem except when the power supply was connected to either one of the USB-C ports. However, it's always hpm1 that's mentioned in the error message.
In #72, @jannau mentions he had to add delays to the i2c code in order to avoid similar-sounding issues. Do you want me to try doing that?