Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tps6598x_powerup failed for /arm-io/i2c0/hpmBusManager/hpm1 #101

Open
pipcet opened this issue Aug 24, 2021 · 23 comments
Open

tps6598x_powerup failed for /arm-io/i2c0/hpmBusManager/hpm1 #101

pipcet opened this issue Aug 24, 2021 · 23 comments

Comments

@pipcet
Copy link
Contributor

pipcet commented Aug 24, 2021

(Split off from #97)

On my MacBook Pro, which has run other kernels and boot loaders but is currently loading vanilla m1n1, commit c2c6da3, directly from iBoot, I occasionally see messages like:

TTY> i2c: want to read 1 bytes from addr 63 but can only read 0
TTY> usb: tps6598x_powerup failed for /arm-io/i2c0/hpmBusManager/hpm1.
TTY> usb: failed to init hpm1

I've been unable to reproduce this at will. I've never seen the problem except when the power supply was connected to either one of the USB-C ports. However, it's always hpm1 that's mentioned in the error message.

In #72, @jannau mentions he had to add delays to the i2c code in order to avoid similar-sounding issues. Do you want me to try doing that?

@marcan
Copy link
Member

marcan commented Aug 24, 2021

Does it only happen on first boot, or also on chainloads? How often does it happen? What power supply are you using?

@pipcet
Copy link
Contributor Author

pipcet commented Aug 24, 2021

I just did 25 boots, long-pressing the power-button between each, and got 6 failures. In four of the failure cases, no ttyACM* interface showed up; in the other two, it did, and five subsequent chainloads were successful.

Chainloads were also always successful after an initial success.

I don't have numbers yet, but issuing the "reboot" command to m1n1 sometimes results in a failure after an initial success. I've not seen it produce successes after an initial failure, but that may be trying to see a pattern where there is none...

I'm using the original Apple power supply, directly connected to the MacBook with the original Apple cable. All appear undamaged, and I have not noticed problems with them except when using m1n1.

I have basic electrical equipment but no second comparable Type-C power supply.

@alyssarosenzweig
Copy link
Member

alyssarosenzweig commented Aug 24, 2021 via email

@marcan
Copy link
Member

marcan commented Aug 24, 2021

I'll try to repro later. Just for completeness, what is the highest macOS/stub version that has ever been installed on that machine? (i.e. what is your SFR version?)

@svenpeter42
Copy link
Member

sorry for dismissing this earlier. I've never seen this on my machine and also never heard about it and just assumed it happened due to some previous payload putting the TPS chip into a strange state.

These delays in the tps code are already very suspicious though: https://github.com/AsahiLinux/m1n1/blob/main/src/tps6598x.c#L19
My first guess is the i2c read/write code. Especially the write part just pushes bytes into a FIFO without any kind of full/empty check.

@marcan
Copy link
Member

marcan commented Aug 24, 2021

I also wonder if there is some magic bit to make the i2c hardware listen to clock stretching, which is a requirement for these chips.

IIRC there's a VDM mode to put that I²C bus on debug pins; if I can repro I'll try to see what's going on at the hardware level there.

@pipcet
Copy link
Contributor Author

pipcet commented Aug 24, 2021

@svenpeter42 I'm the one who has to apologize, because I did do things like that without fully understanding the consequences, just not in this specific case.

@marcan I'm afraid I don't know how to find out, or at least I don't
know for sure which if any of these numbers you want:

I've since upgraded, and it's the same problem. New version numbers as soon as I get back into macOS.

System Firmware Version: 6723.140.2
System Version: macOS 11.5.2 (20G95)
Kernel Version: Darwin 20.6.0

@marcan
Copy link
Member

marcan commented Oct 10, 2021

Pretty sure this is fixed by d20a794. Please reopen if you see this issue again.

@marcan marcan closed this as completed Oct 10, 2021
@radnvlad
Copy link

I get a very similar behavior on my end, also shows up inconsistently. In my case it timeouts when talking over I2C to the hpm0. I saw it when either the USB to linux was connected or the power supply was connected during booting.

@fionera
Copy link

fionera commented Dec 19, 2021

I do have this behaviour on a yesterday built m1n1 directly after boot. I am running a M1 Max t6001,j314c where m1n1 reports the same issue but from addr 56 on hpm0.

My other machine also has a lot of issues with the m1n1 gadget from time to time.

[ 1219.881839] usb usb1-port5: attempt power cycle
[ 1220.325772] usb 1-5: new high-speed USB device number 36 using xhci_hcd
[ 1225.126606] usb 1-5: Device not responding to setup address.
--- SNIP ---
[ 1359.912438] usb 1-5: USB disconnect, device number 41
[ 1368.944728] usb 1-5: new high-speed USB device number 42 using xhci_hcd
[ 1370.500882] usb 1-5: New USB device found, idVendor=1209, idProduct=316d, bcdDevice= 1.00
[ 1370.500888] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1370.500890] usb 1-5: Product: m1n1 uartproxy f4e11a7
[ 1370.500892] usb 1-5: Manufacturer: Asahi Linux
[ 1370.581935] usb 1-5: can't set config #1, error -71

@marcan
Copy link
Member

marcan commented Jan 1, 2022

Argh, this issue just won't go away...

Does the problem persist if you chainload m1n1 (e.g. via another, working port), or does it only happen directly after boot?

@marcan marcan reopened this Jan 1, 2022
@fionera
Copy link

fionera commented Jan 2, 2022

I didnt try yet if it happens after chainload, I only noticed it when debugging and testing my kernel build because it happend every so often without touching anything on the setup

@jmranger
Copy link
Contributor

jmranger commented Jan 28, 2022

I just started noticing this one after installing a new stub (12.1) and m1n1 (8887493). Note that the install didn't went smoothly and required a csrutil clear at one point - I'm assuming it's unrelated.

MacBookAir. MacOS 12.2 installed earlier today, but it had the public beta 12.2 RC in a separate partition for over a week.

Ports config:

  • rear port connected to Apple AV adapter (USB-C to USB-C power + HDMI + USB-3), with power connected to the official MB Air charger, HDMI empty and USB-3 connected to a TP-Link UE300 Ethernet adapter)
  • front port connected to my development computer via a USB-C to USB-3 cable .

On the screen (manually copied - there might be typos):

Initialization complete.
Checking for payloads...
Devicetree compatible value: apple,j313
Unknown payload at 0x803854000 (magic: 30837271)
No valid payload found 
i2c: want to read 1 bytes from addr 63 but can only read 0
usb: tps6598x_powerup failed for /arm-io/i2c0/hpmBusManager/hpm1
.
usb: failed to init hpm1
dart: dart /arm-io/dart-usb0 at 0x382f80000 is a t8020
USB0: initialized at 0x803fac070
dart: dart /arm-io/dart-usb1 at 0x502f80000 is a t8020
USB1: initialized at 0x803fac560
Running proxy...

At this point, /dev/ttyACM0 doesn't exist, so chainloading fails. I guess hpm1 is the front port.

If I swap the two cables (data at the back, power/ethernet at the front), I get:

Initialization complete.
Checking for payloads...
Devicetree compatible value: apple,j313
Unknown payload at 0x8032ac000 (magic: 30837271)
No valid payload found
i2c: timeout while waiting for PASEMI_STATUS_XFER_BUSY to clear
after write xfer
tps6598x: writing TPS_REG_INT_CLEAR1 failed, written:-1
usb: unable to disable IRQ masks for hpm1
dart: dart /arm-io/dart-usb0 at 0x382f80000 is a t8020
USB0: initialized at 0x803a04070
dart: dart /arm-io/dart-usb1 at 0x502f80000 is a t8020
USB1: initialized at 0x803a04560
Running proxy...

I haven't seen either of those errors in chainloads, but I didn't try that many times.
The "Unknown payload" is there all the time - separate issue, I guess. [Edit: I created #158 for this.]
All errors are on hpm1, never on hpm0. So far, I get them on 30-50% of my attempts.

Trying to narrow it down, I tried a few more things with the second cable config. With the changes cumulative, in the order they were done (with each tried up to 3 times):

  • booted recoveryOS and changed the startup disk to my old 12.0 stub + mini f4e11a7: bad
  • disconnect the ethernet cable from the UE300: bad
  • disconnect the UE300: bad
  • disconnect the AV adapter, and reconnect the power source directly: ok
  • reconnect the AV adapter between the Mac and the power source: ok
  • reconnect the UE300: ok
  • reconect the ethernet cable: bad

Is there something that stays powered and configured as long as the power adapter is connected, and only resets once it gets disconnected?

@jmranger
Copy link
Contributor

jmranger commented Feb 1, 2022

I spent a bit more time on this one today, trying to gather more data. Main observations:

  • I'm only getting the PASEMI_STATUS_XFER_BUSY version today. Maybe because I've installed 12.3 beta since my previous comment.
  • Hard to say precisely for an intermittent issue, but I'm seeing roughly the same behavior regardless whether I use the official Apple power supply or an official Raspberry Pi 4 one (5V, 3A).
  • With either power supply connected directly to the Mac and nothing else connected, I'm unable to reproduce the issue.
  • I do have a very close to 100% failure rate with the following sequence:
  1. Ensure that the Mac is fully charged and powered down
  2. Connect the power supply to the AV adapter (A2119)
  3. Connect the adapter to the Mac (either port). The connection will trigger a powerup of the Mac.

The same sequence without the AV adapter is always successful.

@marcan
Copy link
Member

marcan commented Feb 2, 2022

Welp, just ordered an AV adapter. Hope I can repro :)

@jmranger
Copy link
Contributor

jmranger commented Feb 2, 2022

Fingers crossed.
I should have also mentioned that this was with m1n1 fbc15bd, and that it doesn't seem to matter if it's started from a 12.0 or 12.1 stub.

@povik
Copy link
Member

povik commented Feb 2, 2022

This may be worth trying:

diff --git a/src/i2c.c b/src/i2c.c
index 4998868..abd7f5d 100644
--- a/src/i2c.c
+++ b/src/i2c.c
@@ -20,6 +20,7 @@
 #define PASEMI_STATUS_XFER_ENDED BIT(27)
 
 #define PASEMI_CONTROL          0x1c
+#define PASEMI_CONTROL_ENABLE   BIT(11)
 #define PASEMI_CONTROL_CLEAR_RX BIT(10)
 #define PASEMI_CONTROL_CLEAR_TX BIT(9)
 
@@ -48,6 +49,13 @@ i2c_dev_t *i2c_init(const char *adt_node)
         return NULL;
     }
 
+    if (pmgr_adt_reset(adt_node)) {
+        printf("i2c: Error resetting %s\n", adt_node);
+        return NULL;
+    }
+    set32(base + PASEMI_CONTROL, PASEMI_CONTROL_ENABLE);
+    set32(base + PASEMI_STATUS, 0x6);
+
     i2c_dev_t *dev = malloc(sizeof(*dev));
     if (!dev)
         return NULL;
diff --git a/src/pmgr.c b/src/pmgr.c
index 9538e32..75a383d 100644
--- a/src/pmgr.c
+++ b/src/pmgr.c
@@ -229,6 +229,27 @@ int pmgr_reset(const char *name)
     return 0;
 }
 
+int pmgr_adt_reset(const char *path)
+{
+    const struct pmgr_device *device;
+    const u32 *devices;
+    u32 n_devices;
+    int ret = 0;
+
+    if (pmgr_adt_find_devices(path, &devices, &n_devices) < 0)
+        return -1;
+
+    for (u32 i = 0; i < n_devices; ++i) {
+        if (pmgr_find_device(devices[i], &device))
+            return -1;
+
+        if (pmgr_reset(device->name))
+            ret = -1;
+    }
+
+    return ret;
+}
+
 int pmgr_init(void)
 {
     pmgr_offset = adt_path_offset_trace(adt, "/arm-io/pmgr", pmgr_path);
diff --git a/src/pmgr.h b/src/pmgr.h
index c410c65..8a69177 100644
--- a/src/pmgr.h
+++ b/src/pmgr.h
@@ -14,5 +14,6 @@ int pmgr_adt_power_enable(const char *path);
 int pmgr_adt_power_disable(const char *path);
 
 int pmgr_reset(const char *name);
+int pmgr_adt_reset(const char *path);
 
 #endif


@jmranger
Copy link
Contributor

jmranger commented Feb 4, 2022

Tested on top of fbc15bd, on a 12.0 stub. The procedure that was close to 100% reliable no longer is - but maybe it wasn't that reliable in the first place. In any case, I'm still seeing the bug easily. The new i2c: Error resetting %s\n doesn't appear.
In today's test, having the AV adapter connected still significantly increased the chances of seeing the bug. Having the "serial port" USB cable connected significantly decreased the odds. And I did saw it on hpm0 too.

@jmranger
Copy link
Contributor

jmranger commented Feb 5, 2022

Just did one more experiment, based on pipcet's original hypothesis that a delay was too short. So:

diff --git a/src/i2c.c b/src/i2c.c
index 4998868..a4030b2 100644
--- a/src/i2c.c
+++ b/src/i2c.c
@@ -103,6 +103,7 @@ static size_t i2c_xfer_read(i2c_dev_t *dev, u8 *bfr, size_t len)
 
 static int i2c_xfer_write(i2c_dev_t *dev, u8 addr, u32 start, u32 stop, const u8 *bfr, size_t len)
 {
+    int waittime;
     if (start)
         write32(dev->base + PASEMI_FIFO_TX, PASEMI_TX_FLAG_START | (addr << 1));
 
@@ -117,7 +118,11 @@ static int i2c_xfer_write(i2c_dev_t *dev, u8 addr, u32 start, u32 stop, const u8
     if (!stop)
         return 0;
 
-    if (poll32(dev->base + PASEMI_STATUS, PASEMI_STATUS_XFER_BUSY, 0, 1000)) {
+    waittime = wait32(dev->base + PASEMI_STATUS, PASEMI_STATUS_XFER_BUSY, 0, 1000000);
+    if (waittime){
+        printf("i2c: PASEMI_STATUS_XFER_BUSY write wait was %d\n", waittime);
+    }
+    else {
         printf(
             "i2c: timeout while waiting for PASEMI_STATUS_XFER_BUSY to clear after write xfer\n");
         return -1;
diff --git a/src/utils.h b/src/utils.h
index 37f3e3e..f030cf5 100644
--- a/src/utils.h
+++ b/src/utils.h
@@ -403,15 +403,21 @@ void spin_unlock(spinlock_t *lock);
         flush_and_reboot();                                                                        \
     } while (0)
 
-static inline int poll32(u64 addr, u32 mask, u32 target, u32 timeout)
+static inline int wait32(u64 addr, u32 mask, u32 target, u32 timeout)
 {
     while (--timeout > 0) {
         u32 value = read32(addr) & mask;
         if (value == target)
-            return 0;
+            return timeout;
         udelay(1);
     }
 
+    return 0;
+}
+static inline int poll32(u64 addr, u32 mask, u32 target, u32 timeout)
+{
+    if(wait32(addr, mask, target, timeout))
+        return 0;
     return -1;
 }

When chainloading, I get:

TTY> i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712
TTY> i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712
TTY> i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712
TTY> i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712

with no significant variation. Upon cold boot/restart, I get something like

i2c: PASEMI_STATUS_XFER_BUSY write wait was 999999
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999999
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999500
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999999
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999999
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712
i2c: PASEMI_STATUS_XFER_BUSY write wait was 999712

with only the 3rd one varying significantly, roughly between 480 and 710.

I didn't try to understand why there's 4 calls in a case and 8 in the other.

But... I can't reproduce the issue anymore, unless I change stub and go back to an official m1n1.

@jmranger
Copy link
Contributor

jmranger commented Feb 5, 2022

diff --git a/src/i2c.c b/src/i2c.c
index 4998868..958fb07 100644
--- a/src/i2c.c
+++ b/src/i2c.c
@@ -27,6 +27,28 @@ struct i2c_dev {
     uintptr_t base;
 };
 
+
+#define CACHESIZE 100
+u32 controlcache[CACHESIZE];
+u32 statuscache[CACHESIZE];
+int generalcache[CACHESIZE];
+int cacheindex = 0;
+static inline void addcache(u32 control, u32 status, int value)
+{
+    if(cacheindex < CACHESIZE) {
+        controlcache[cacheindex] = control;
+        statuscache[cacheindex] = status;
+        generalcache[cacheindex] = value;
+        ++cacheindex;
+    }
+}
+void printcache(void)
+{
+    printf("cachedump: %d\n", cacheindex);
+    for (int i=0; i<cacheindex; ++i)
+        printf("cachedump: %08x %08x %d\n", controlcache[i], statuscache[i], generalcache[i]);
+}
+
 i2c_dev_t *i2c_init(const char *adt_node)
 {
     int adt_path[8];
@@ -103,6 +125,9 @@ static size_t i2c_xfer_read(i2c_dev_t *dev, u8 *bfr, size_t len)
 
 static int i2c_xfer_write(i2c_dev_t *dev, u8 addr, u32 start, u32 stop, const u8 *bfr, size_t len)
 {
+    int waittime;
+    if(stop)
+        addcache(read32(dev->base + PASEMI_CONTROL), read32(dev->base + PASEMI_STATUS), -11);
     if (start)
         write32(dev->base + PASEMI_FIFO_TX, PASEMI_TX_FLAG_START | (addr << 1));
 
@@ -117,7 +142,9 @@ static int i2c_xfer_write(i2c_dev_t *dev, u8 addr, u32 start, u32 stop, const u8
     if (!stop)
         return 0;
 
-    if (poll32(dev->base + PASEMI_STATUS, PASEMI_STATUS_XFER_BUSY, 0, 1000)) {
+    waittime = wait32(dev->base + PASEMI_STATUS, PASEMI_STATUS_XFER_BUSY, 0, 1000000);
+    addcache(read32(dev->base + PASEMI_CONTROL), read32(dev->base + PASEMI_STATUS), waittime);
+    if (!waittime){
         printf(
             "i2c: timeout while waiting for PASEMI_STATUS_XFER_BUSY to clear after write xfer\n");
         return -1;
diff --git a/src/main.c b/src/main.c
index 5936ae4..81a756e 100644
--- a/src/main.c
+++ b/src/main.c
@@ -70,6 +70,7 @@ void run_actions(void)
     usb_init();
     usb_iodev_init();
 
+    printcache();
     printf("Running proxy...\n");
 
     uartproxy_run(NULL);
diff --git a/src/utils.h b/src/utils.h
index 37f3e3e..469e013 100644
--- a/src/utils.h
+++ b/src/utils.h
@@ -403,17 +403,24 @@ void spin_unlock(spinlock_t *lock);
         flush_and_reboot();                                                                        \
     } while (0)
 
-static inline int poll32(u64 addr, u32 mask, u32 target, u32 timeout)
+static inline int wait32(u64 addr, u32 mask, u32 target, u32 timeout)
 {
     while (--timeout > 0) {
         u32 value = read32(addr) & mask;
         if (value == target)
-            return 0;
+            return timeout;
         udelay(1);
     }
 
+    return 0;
+}
+static inline int poll32(u64 addr, u32 mask, u32 target, u32 timeout)
+{
+    if(wait32(addr, mask, target, timeout))
+        return 0;
     return -1;
 }
+void printcache(void);
 
 typedef u64(generic_func)(u64, u64, u64, u64);
TTY> Checking for payloads...
TTY> Devicetree compatible value: apple,j313
TTY> No more payloads at 0x8031f0000
TTY> No valid payload found
TTY> dart: dart /arm-io/dart-usb0 at 0x382f80000 is a t8020
TTY> USB0: initialized at 0x807954070
TTY> dart: dart /arm-io/dart-usb1 at 0x502f80000 is a t8020
TTY> USB1: initialized at 0x807954560
TTY> cachedump: 16
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 10010106 999999
TTY> cachedump: 00000904 10010106 -11
TTY> cachedump: 00000904 08010106 999808
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 996096
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999713
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 10010106 999999
TTY> cachedump: 00000904 10010106 -11
TTY> cachedump: 00000904 08010106 999807
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999713
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999694
TTY> Running proxy...
TTY> Checking for payloads...
TTY> Devicetree compatible value: apple,j313
TTY> No more payloads at 0x80394c000
TTY> No valid payload found
TTY> dart: dart /arm-io/dart-usb0 at 0x382f80000 is a t8020
TTY> USB0: initialized at 0x8080b0070
TTY> dart: dart /arm-io/dart-usb1 at 0x502f80000 is a t8020
TTY> USB1: initialized at 0x8080b0560
TTY> cachedump: 16
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 10010106 999999
TTY> cachedump: 00000904 10010106 -11
TTY> cachedump: 00000904 08010106 999806
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999712
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999396
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 10010106 999999
TTY> cachedump: 00000904 10010106 -11
TTY> cachedump: 00000904 08010106 999806
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999712
TTY> cachedump: 00000904 00010106 -11
TTY> cachedump: 00000904 08010106 999679
TTY> Running proxy...

So at least in some cases, poll32(1000) isn't long enough. Smallest I saw was 995887. In my tests, the 3rd one is the only one I saw below 999000.

@marcan
Copy link
Member

marcan commented Feb 6, 2022

Repro'd fairly reliably with the AV adapter.

marcan added a commit that referenced this issue Feb 6, 2022
Hopefully this will cover all corner cases of the Ace2s being slow.

Issue: #101
Signed-off-by: Hector Martin <marcan@marcan.st>
@marcan
Copy link
Member

marcan commented Feb 6, 2022

1000 (1ms) does indeed sound low for the write timeout. I think that was copied and pasted from the read path, but in the read path we're just waiting for the STOP condition while in the write path we're waiting for data transfer from the FIFO.

Either way, I've increased all low-level timeouts to 50ms. Hopefully that covers all possible clock-stretching delays introduced by the Ace2. This still does not quite explain the first case in this comment though...

@jmranger
Copy link
Contributor

Thanks for the fix. I'll install it soon.
Regarding the remaining i2c: want to read 1 bytes from addr 63 but can only read 0, I still see it once in a while (like now). No idea how to reproduce. I still mostly have the same hardware setup as last time, except that the UE300 has been replaced by a USB 2.0 thumbdrive.
If there's any debug code you'd like me to add to m1n1, let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants