USB Device getting stuck in .waitDma loop after rapid transfers #482

Closed
LukeBorowy opened this issue Mar 22, 2024 · 6 comments

Comments

@LukeBorowy

When sending and receiving a lot of transfers sequentially, usb_HandleEvents can freeze and never return. This can be demonstrated using the attached project, which is a slightly modified version of the link_library example that sends a burst of transfers. To test this program you will need two calculators and the appropriate cable. The freeze only ever happens on the "device" calculator, not the host.

link_library.zip

I added some debugging logs to usbdrvce and recompiled the toolchain to figure out where it was freezing. It is in the _ExecuteDma function, specifically getting stuck in the .waitDma loop. It does exit with an error if you exit on the host calculator, which is kind of weird to me since the device is the one frozen.

I discovered this bug when adding multiplayer support to my game. Normally, I am not sending nearly this much data. However, after a few minutes have passed (anywhere from 1 to 20), the freeze happens. I believe it has something to do with the exact timing of send and receive transfers finishing, and it just takes a while to get unlucky. It happens very quickly when I spam transfers as in this example, since a transfer is much more likely to land at the bad time. Occasionally I also get bad/corrupted data on a read instead of a freeze, but that is harder to reproduce.

Video of the issue: Note when the device stops blinking. Notably, the host seems to think that the transfer of "H" was complete, and that it was now sending "Q". However, the device has frozen before it even returns from reading "H".

IMG_1315.MOV

Hopefully this is just a coding error on my part, but as of now it seems to be in the library.

@acagliano

Do you know which status code you get? In my issue, which I thought I had fixed but apparently hadn't, after a lot of sequential transfers something suddenly happens (on either the device or the host, I'm not sure), and any subsequent transfers queued on the endpoint that is handling most of the traffic start returning error code 80 (10100000 binary) and failing immediately. For the record, that error is USB_TRANSFER_CANCELED | USB_TRANSFER_BUS_ERROR.

@LukeBorowy
Author

LukeBorowy commented Jul 7, 2024

I don’t have access to calculators to test now, but I’m pretty sure neither the host nor the device got errors when queuing a transfer. On the host it just looked like the transfer was still in progress, with no error at all. The only error (I think) occurred when it was unplugged, at which point the device trying to read (understandably) got 003 = USB_TRANSFER_STALLED | USB_TRANSFER_NO_DEVICE.

That’s what makes this so annoying. If the code got any sort of indication of an error when the issue actually happened, I could try to do something to reset the connection to make it respond again. However, the host thinks everything is fine and I can’t do anything on the device since it is frozen, so there’s no way to recover without physically disconnecting them. (I’m pretty sure that I checked for all the statuses on the host, but I can’t confirm that).

I noticed in your linked issue that it only happens with high traffic. In my case, it seems to freeze eventually even with low traffic, leading to my belief that it is something with the precise timings. High traffic just makes it more likely to hit at the “bad” time.
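
The status codes quoted in this thread are bitmasks, so a value such as 003 decomposes into individual flags. Below is a minimal sketch of decoding one, assuming hypothetical flag values (DEMO_STALLED, DEMO_NO_DEVICE) chosen only so that their OR equals 3 as reported above. The real definitions live in usbdrvce.h and should be checked there; demo_describe is not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for illustration only. They are picked so that
   DEMO_STALLED | DEMO_NO_DEVICE == 3, matching the value reported above.
   Consult usbdrvce.h for the real usb_transfer_status_t definitions. */
enum demo_status {
    DEMO_COMPLETED = 0,
    DEMO_STALLED   = 1 << 0,
    DEMO_NO_DEVICE = 1 << 1
};

/* Write a readable list of the flags set in status into buf and return buf. */
static const char *demo_describe(unsigned status, char *buf, size_t len) {
    static const struct { unsigned bit; const char *name; } flags[] = {
        { DEMO_STALLED,   "STALLED"   },
        { DEMO_NO_DEVICE, "NO_DEVICE" }
    };
    size_t i, used;

    if (len == 0) return buf;
    buf[0] = '\0';
    if (status == DEMO_COMPLETED) {
        snprintf(buf, len, "COMPLETED");
        return buf;
    }
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (status & flags[i].bit) {
            used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     used ? " | " : "", flags[i].name);
            status &= ~flags[i].bit;  /* clear the bit once it is named */
        }
    }
    if (status) {  /* any bits this sketch does not know about */
        used = strlen(buf);
        snprintf(buf + used, len - used, "%sunknown 0x%X",
                 used ? " | " : "", status);
    }
    return buf;
}
```

Calling demo_describe(3, buf, sizeof buf) from a transfer callback or a debug log fills buf with the names of both flags, which is easier to read in logs than the raw number.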

@acagliano

Thanks for the response. I did track down the source of my issue, and the two are in fact not related; yours is actually in usbdrvce. Mine was caused by my driver code not doing a step properly, though for a while it presented as the same issue.

@mateoconlechuga
Member

mateoconlechuga commented Oct 11, 2024

It looks like your main issue is calling link_Devices in the same loop where you are performing your reading/writing. You should call it only initially, to link the two devices, and then call link_Poll in the core loop. I'm also not sure if I'm running this correctly. If you have the time, a minimal reproducible example would be great, but based on my tests I can't see any freezing with the latest release of the toolchain and libraries. I'm going to close this issue for now, but if you still have issues after moving link_Devices, let me know and I can help you debug.

Update: I connected two calcs like your demo shows, but I used the supplied cable. Not sure about the cable setup you have there either.

@LukeBorowy
Author

Thanks for taking a look at this. However, I am 100% sure that it is not because of the link_Devices call. In the game code where I first encountered this problem, link_Devices is only called on one screen used when joining, and not in the main loop. The reason I kept it in the loop for this minimal example is that that is where it was located in the provided example file. I can place that call in an if that only triggers before a connection has been established, and it will still freeze.
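
The guarded call described above could look roughly like the sketch below. It uses hypothetical stand-ins (demo_link_devices, demo_poll, demo_link_state) instead of the real link_library functions, since their signatures are not shown in this thread; the stand-in discovery step is assumed to succeed on its second attempt purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical state for the sketch; not a link_library type. */
typedef struct {
    bool connected;
    unsigned link_calls;  /* how often discovery ran */
    unsigned poll_calls;  /* how often polling ran */
} demo_link_state;

/* Stand-in for the discovery step (link_Devices in this thread).
   Assumption: it reports success on the second attempt. */
static bool demo_link_devices(demo_link_state *s) {
    s->link_calls++;
    return s->link_calls >= 2;
}

/* Stand-in for the per-iteration polling step (link_Poll in this thread). */
static void demo_poll(demo_link_state *s) {
    s->poll_calls++;
}

/* One pass of the main loop: discover only until connected, then just poll. */
static void demo_loop_iteration(demo_link_state *s) {
    if (!s->connected) {
        s->connected = demo_link_devices(s);
        return;
    }
    demo_poll(s);
}
```

After five iterations starting from a zeroed demo_link_state, discovery has run twice and polling three times, so the discovery call no longer sits in the steady-state read/write path.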

Also, if that were the issue, you would expect it to freeze right away, but it freezes only rarely. What triggers the freeze consistently in this example is both calcs sending and receiving at a high rate. If you remove the section containing the comment // both calcs will spam each other, there won't be a problem.

I'm not very familiar with how the calc handles USB interrupts and flags, but could it be because sending and receiving are both manipulating the same "DMA finished" flag? For example, a read finishes and sets dma finished = 1, but then a send starts and sets it back to 0 before the code can notice and exit the waitDma loop? Maybe the order of events doesn't allow that, though; I'm not sure.

I'm just trying to think why it appears to happen randomly, but more frequently when there is high traffic.
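
To make the hypothesis above concrete, here is a toy model of it in plain C. It is only an illustration of the suspected ordering problem, not a description of how usbdrvce or the hardware actually behaves; the event names and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events that occur before the waiting code polls its flag. */
typedef enum { DEMO_READ_DONE, DEMO_SEND_START } demo_event;

/* Toy model A: reads and sends share a single "DMA finished" flag.
   Returns whether code waiting for the read would see it as finished. */
static bool demo_reader_wakes_shared(const demo_event *ev, size_t n) {
    bool dma_finished = false;
    size_t i;
    for (i = 0; i < n; i++) {
        if (ev[i] == DEMO_READ_DONE)  dma_finished = true;
        if (ev[i] == DEMO_SEND_START) dma_finished = false; /* clobbers a pending read completion */
    }
    return dma_finished;
}

/* Toy model B: each direction has its own flag, so starting a send
   cannot hide a read that already completed. */
static bool demo_reader_wakes_split(const demo_event *ev, size_t n) {
    bool read_finished = false;
    bool send_finished = false;
    size_t i;
    for (i = 0; i < n; i++) {
        if (ev[i] == DEMO_READ_DONE)  read_finished = true;
        if (ev[i] == DEMO_SEND_START) send_finished = false;
    }
    (void)send_finished;  /* tracked only to mirror model A */
    return read_finished;
}
```

With the ordering { DEMO_READ_DONE, DEMO_SEND_START }, model A reports that the reader never wakes, while model B reports that it does. With the opposite ordering both models wake. That would match a freeze that depends on timing and becomes more likely under heavy two-way traffic, but whether the real driver shares a flag this way is exactly the open question.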

The cable setup is just a USB OTG adapter to make it function like a USB Mini A end for the host.
I had tested this with the most recent version and the issue still occurred.

@mateoconlechuga
Member

Thanks for the reply, I'll try to build a minimal reproducible example that just spams the bus and see what happens.
