USB freeze/halt when using NCM adapter for high-throughput #490

Closed
acagliano opened this issue Jul 4, 2024 · 6 comments

@acagliano

acagliano commented Jul 4, 2024

Issue: While the stack is running and the calculator is processing TX, RX, and interrupt frames from an NCM device, it freezes/stalls without warning. In 100% of my reproduction attempts, the freeze occurs within usb_HandleEvents(), after the user-defined callback has run for event 13 (USB_DEVICE_INTERRUPT).
Code triggering: https://github.com/cagstech/lwip-ce/blob/master/src/drivers/usb-ethernet.c#L724

Possible dupe of #482

My ability to debug/disassemble is limited: I can only test this on hardware, so I cannot use breakpoints, step through a disassembly, etc. I have added some print statements to narrow down where the freeze occurs, which is how I found what I have so far. I'm trying to put together a minimal reproducible example, but the tricky part is that I'm not sure how much this issue depends on the IP stack's timing/throughput to trigger. I may modify usbdrvce directly to add more debug prints. Any other debugging suggestions are welcome.

I'm still treating this as an NCM bug for now and working on resolving it as such, but should a resolution for #482 resolve this as well, great.
cagstech/lwip-ce#5

@alessiodam

Strange because it only happens on your end. For me it works fine.

@acagliano
Author

> Strange because it only happens on your end. For me it works fine.

One working device and one failing device are not enough to tell whether this is a bug, an adapter quirk, or something else. So far I've gotten the same behavior from two NCM adapters. That's why I'd like to see if we can get others to try to reproduce it.

@acagliano
Author

Closing this as not a usbdrvce bug.
While not fixed entirely, increasing the memory allotted to the stack makes the freeze take longer to occur. That leads me to believe it's either some sort of memory leak in my code or the pbuf allocator, or that the stack simply needs more memory than it is being allotted.

@acagliano
Author

May not be resolved by NCM fixes, but I'll wait for a fix for #482.
I'll work on something like Expanse that is lower latency and see if it still occurs.

@acagliano acagliano reopened this Jul 7, 2024
@acagliano
Author

Reopening because I've identified what is happening, though not why. Definitely not a dupe of the other bug. I've attached a video of the issue. It appears to occur after a period of a LOT of data hitting the endpoint (in this case the RX endpoint). After a while of this, the RX transfer callback returns the status USB_TRANSFER_CANCELED | USB_TRANSFER_BUS_ERROR. I retry the transfer 3 times before the code assumes a stall and disables the device (to stop the memory leak that was previously occurring).

Unsure if this is a me issue or a usbdrvce issue. Can someone take a look, please?

demo_lwip_ncm_bug.mov
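
For readers following along, the retry-then-disable policy described in this comment could look roughly like the sketch below. This is not the lwip-ce code: the status bits, `rx_state_t`, and `rx_on_transfer_done` are hypothetical stand-ins so the example compiles without usbdrvce, and the bit values are placeholders rather than the real usbdrvce constants.

```c
#include <stdint.h>

/* Stand-in status bits for illustration only; NOT the real usbdrvce values. */
enum {
    XFER_OK        = 0,
    XFER_BUS_ERROR = 1 << 0,
    XFER_CANCELED  = 1 << 1
};

#define RX_MAX_RETRIES 3 /* the comment above mentions three retries */

typedef enum { RX_DELIVER, RX_RETRY, RX_DISABLE_DEVICE } rx_action_t;

typedef struct {
    uint8_t failures; /* consecutive failed RX transfers so far */
} rx_state_t;

/* Decide what to do when an RX transfer completes (hypothetical helper). */
rx_action_t rx_on_transfer_done(rx_state_t *st, unsigned status)
{
    if (status == XFER_OK) {
        st->failures = 0;          /* a good transfer resets the retry budget */
        return RX_DELIVER;
    }
    if (st->failures < RX_MAX_RETRIES) {
        st->failures++;            /* resubmit the same transfer */
        return RX_RETRY;
    }
    return RX_DISABLE_DEVICE;      /* retries exhausted: treat it as a stall */
}
```

A real transfer callback would resubmit the transfer on `RX_RETRY` and release its buffers before disabling the device on `RX_DISABLE_DEVICE`, so the failure path does not leak memory.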

@acagliano
Author

So the issue is resolved with help from @commandblockguy.
NCM requires the USB transfer to be at least NTB_MAX bytes, not ETHERNET_MTU.
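
A minimal sketch of that sizing check, assuming the maximum NTB size is taken from the `dwNtbInMaxSize` field of the 28-byte NTB Parameter Structure that an NCM device returns for GET_NTB_PARAMETERS. The helper names are hypothetical, and how lwip-ce actually derives NTB_MAX is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NTB_PARAMS_LEN         28u /* size of the NCM NTB Parameter Structure */
#define NTB_IN_MAX_SIZE_OFFSET 4u  /* dwNtbInMaxSize follows wLength and bmNtbFormatsSupported */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return dwNtbInMaxSize, or 0 if the response is missing or too short. */
uint32_t ncm_ntb_in_max_size(const uint8_t *params, size_t len)
{
    if (params == NULL || len < NTB_PARAMS_LEN)
        return 0;
    return read_le32(params + NTB_IN_MAX_SIZE_OFFSET);
}

/* True when an RX buffer can hold the largest NTB the device may send. */
bool rx_buffer_fits_ntb(size_t rx_buf_len, uint32_t ntb_in_max)
{
    return ntb_in_max != 0 && rx_buf_len >= ntb_in_max;
}
```

The point is that an NTB can carry several Ethernet frames plus headers, so an RX buffer sized for a single frame may be too small for what the device sends in one transfer.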
