Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serial crash when receiving incoming data faster than handled #293

Open
pludov opened this issue Apr 27, 2023 · 13 comments
Open

Serial crash when receiving incoming data faster than handled #293

pludov opened this issue Apr 27, 2023 · 13 comments
Labels
Bug Something isn't working

Comments

@pludov
Copy link

pludov commented Apr 27, 2023

Operating System

Linux

Arduino IDE version

Platform.io with

Board

pico

ArduinoCore version

earlephilhower#master

TinyUSB Library version

2.1.0

Sketch as ATTACHED TXT

The attached program just returns input or output '.' when no output is available, as fast as possible.
It limits the rate to 1000 characters per second, by using a delay in the loop.

serial-crash.zip

Using that program writing data from the host will quickly crash the pico (like by pasting 10 lines of codes again and again)
When crashed, the pico won't reboot for sketch upload.

Removing the delay in the loop seems to avoid crash of the program.

I've tried running on the second core, with the same results.

Compiled Log as ATTACHED TXT

No log available

What happened ?

The pico becomes unresponsive. Serial seems dead.

How to reproduce ?

Run the attached platform io program
Watch the serial monitor
Send bunch of data

The output will stop at some point, the pico won't even reboot.

Debug Log

No response

Screenshots

No response

@pludov pludov added the Bug Something isn't working label Apr 27, 2023
@pludov pludov changed the title Serial crash when incoming data buffers Serial crash when receiving incoming data faster than handled Apr 27, 2023
@pludov
Copy link
Author

pludov commented Apr 28, 2023

I could plug a debugger and got the following stack-trace on a crashed pico. It seems the ISR hangs when trying to acquire a lock that is already acquired by the interrupted code.

Maybe TinyUSB_Device_FlushCDC should acquire __usb_mutex from Adafruit_TinyUSB_rp2040.cpp, like it's done for TinyUSB_Device_Task ?

Stacktrace of the stuck pico :

    __best_effort_wfe_or_timeout_veneer@0x20000788 (Unknown Source:0)
    mutex_enter_block_until@0x2000041c (/mutex_enter_block_until.dbgasm:34)
    mutex_enter_timeout_ms@0x20000458 (/mutex_enter_timeout_ms.dbgasm:24)
    osal_mutex_lock@0x10008308 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/osal/osal_pico.h:89)
        mutex_hdl = 0x2000206c <_ubsd_mutexdef>
    tu_edpt_claim@0x10008308 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/tusb.c:121)
    usbd_edpt_claim@0x10007240 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/device/usbd.c:1216)
    _prep_out_transaction@0x10004a4e (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/class/cdc/cdc_device.c:93)
    cdcd_xfer_cb@0x10004ff2 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/class/cdc/cdc_device.c:453)
    tud_task_ext@0x10007858 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/device/usbd.c:556)
    tud_task@0x100048ee (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/device/usbd.h:55)
    usb_task_irq@0x100048ee (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/arduino/ports/rp2040/Adafruit_TinyUSB_rp2040.cpp:86)
    <signal handler called>@0xfffffff9 (Unknown Source:0)
    mutex_enter_block_until@0x20000406 (/mutex_enter_block_until.dbgasm:26)
    mutex_enter_timeout_ms@0x20000458 (/mutex_enter_timeout_ms.dbgasm:24)
    osal_mutex_lock@0x10008308 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/osal/osal_pico.h:89)
        mutex_hdl = 0x2000206c <_ubsd_mutexdef>
    tu_edpt_claim@0x10008308 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/tusb.c:121)
    usbd_edpt_claim@0x10007240 (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/device/usbd.c:1216)
    tud_cdc_n_write_flush@0x10004b8c (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/class/cdc/cdc_device.c:194)
    TinyUSB_Device_FlushCDC@0x10003f0e (/home/ludovic/.platformio/packages/framework-arduinopico/libraries/Adafruit_TinyUSB_Arduino/src/arduino/Adafruit_TinyUSB_API.cpp:54)
    yield@0x10008516 (/home/ludovic/.platformio/packages/framework-arduinopico/cores/rp2040/delay.cpp:52)
    __loop@0x10008796 (/home/ludovic/.platformio/packages/framework-arduinopico/cores/rp2040/main.cpp:66)

anrp-tri pushed a commit to anrp-tri/arduino-pico that referenced this issue Jun 12, 2023
@anrp-tri
Copy link

anrp-tri commented Jun 13, 2023

FYI I ran in to a similar looking issue, so I speculatively tried this on a similar (?) codebase, ^ and it appears to have significantly improved on the issue of serial getting stuck (note, it still appears to happen, but less often.)

@doctea
Copy link

doctea commented Jul 27, 2023

Hi, I'm also running into a similar problem, where the RP2040 appears to lock up/freeze when connected over USB (serial and MIDI device). If I put a debugger on it, I'm also seeing that it seems to get stuck forever in __best_effort_wfe_or_timeout. The time it takes before this happens seems to vary, but nonetheless happens a lot.

Serial isn't actually necessary for my app, as its mainly a MIDI thing, but nevertheless I find that if anything is sent on the serial connection but the host doesn't open the connection, the rp2040 will freeze; while it also seems to randomly freeze after a while anyway even with the host servicing the serial, but nothing being sent to it.

@anrp-tri, I've attempted to apply your patch hoping it would ameliorate the problem, but when trying to compile the latest TinyUSB version with your patch applied, I hit up against problems. Are there any changes necessary to make it work with the up-to-date TinyUSB lib?

@doctea
Copy link

doctea commented Aug 6, 2023

Well, I managed to get @anrp-tri 's patch to apply OK in the end, but it hasn't seemed to improve much for me.

I'm unclear if the problem I was running into is the same one reported here, but for me I was getting repeatedly and intermittently getting stuck in best_effort_wfe_or_timeout and finding the timeout set to OSAL_TIMEOUT_WAIT_FOREVER, so I guess I was stuck in a deadlock and never returning.

Applying the patch below seems to have worked around this issue for me -- no other side effects observed yet, but maybe this causes problems elsewhere?

index 95cb55a..7dccf4e 100644
--- a/src/tusb.c
+++ b/src/tusb.c
@@ -118,7 +118,7 @@ bool tu_edpt_claim(tu_edpt_state_t* ep_state, osal_mutex_t mutex)

   // pre-check to help reducing mutex lock
   TU_VERIFY((ep_state->busy == 0) && (ep_state->claimed == 0));
-  (void) osal_mutex_lock(mutex, OSAL_TIMEOUT_WAIT_FOREVER);
+  (void) osal_mutex_lock(mutex, OSAL_TIMEOUT_NORMAL); //OSAL_TIMEOUT_WAIT_FOREVER);^M

   // can only claim the endpoint if it is not busy and not claimed yet.
   bool const available = (ep_state->busy == 0) && (ep_state->claimed == 0);

@SeongGino
Copy link

Oh my god is this actually my issue the whole time?

I'm currently working on an open source lightgun project using the RP2040 as my primary target and reference, using TinyUSB. At a certain point, one of the feature requests I've gotten is implementing support for MAMEHOOKER, which is a PC-side program that, among other things, sends a constant stream of serial commands.

After some time, however, the same exact thing being reported here happens; the serial line dies, the board becomes unresponsive, whatever buttons that were depressed stay stuck on/off, and I have to physically pull and reinsert the USB connection to get it back up and running.

This only happens with serial, but not just with ingest - even with adding just a simple Serial.println("text"); in the main program loop, doing this and making the device send USB data at the same time (in this case, mouse/keyboard updates or presses) will cause it to crash randomly. It's completely unpredictable when, or even if this happens.

I've tried to limit how often either serial is read or the button signals are sent (at this point now, serial and USB data essentially takes turns), but no matter what I do the crashing will happen eventually. And in this case, it's a key cornerstone of a feature request I've been adamant on implementing lately.

@SeongGino
Copy link

SeongGino commented Nov 22, 2023

I did start hosting a fork with anrp-tri's fix combined with doctea's addendum in the hopes that it resolves the issue in the short term.

However, I did just receive a crash (lockup) report, even with the patched core. Is @doctea's patch applied on top of anrp-tri's modified core, or to upstream?

Now I'm wondering; does this happen when using Pico SDK's USB stack? Or is this really exclusive to TinyUSB? I would test, but some of the input libraries I use rely on TinyUSB.

I've also also noticed we're all using RP2040 hardware - is this exclusive to the Pico and its derivatives? Or has it also happened on other boards when using the TinyUSB stack?

It seems like #238 is related.

@doctea
Copy link

doctea commented Nov 23, 2023

I did start hosting a fork with anrp-tri's fix combined with doctea's addendum in the hopes that it resolves the issue in the short term.

However, I did just receive a crash (lockup) report, even with the patched core. Is @doctea's patch applied on top of anrp-tri's modified core, or to upstream?

My patch applies fine onto the upstream without needing to apply anrp-tri's patches first.

I've just ended up reinstalling my libraries and so lost all the patches I'd previously applied to it, and had the dreaded return of these intermittent freezes. Since applying my patch from up above, I haven't had a freeze so far. As the freezes were intermittent anyway its too early to say for sure, but this might have solved the problem entirely for me. Quite possible that @anrp-tri's patch fixes something important too, or that there are side-effects from my fix that I haven't noticed yet.

I'll post back if I start getting crashes again!

(edit -- I also seem to be able to print to the USB serial monitor without crashing now too -- not sure if this is because of this fix, or something else that's changed, but definitely not something I was able to do before!)

@SeongGino
Copy link

SeongGino commented Nov 24, 2023

I did start hosting a fork with anrp-tri's fix combined with doctea's addendum in the hopes that it resolves the issue in the short term.
However, I did just receive a crash (lockup) report, even with the patched core. Is @doctea's patch applied on top of anrp-tri's modified core, or to upstream?

My patch applies fine onto the upstream without needing to apply anrp-tri's patches first.

I've just ended up reinstalling my libraries and so lost all the patches I'd previously applied to it, and had the dreaded return of these intermittent freezes. Since applying my patch from up above, I haven't had a freeze so far. As the freezes were intermittent anyway its too early to say for sure, but this might have solved the problem entirely for me. Quite possible that @anrp-tri's patch fixes something important too, or that there are side-effects from my fix that I haven't noticed yet.

I'll post back if I start getting crashes again!

(edit -- I also seem to be able to print to the USB serial monitor without crashing now too -- not sure if this is because of this fix, or something else that's changed, but definitely not something I was able to do before!)

I see! It may have been a mere misunderstanding on my part, because my initial instinct was to merge both your (and anrp-tri's) patches, but doing so I still got a report about those freezes.

I've since made a second package, just based on the latest 3.6.1 core from upstream with only your one-line patch in the tusb.c files, and so far it doesn't seem to cause an issue? My tester hasn't gotten back to me yet, but I hope it will at least hold things over in the short term--this problem was driving me up a darn wall otherwise. (^^;

EDIT: Got another crash report. I don't know if it's just that my use case is much more severe or what, in terms of data being sent in both directions, but this remains a showstopping issue.

@doctea
Copy link

doctea commented Nov 25, 2023

EDIT: Got another crash report. I don't know if it's just that my use case is much more severe or what, in terms of data being sent in both directions, but this remains a showstopping issue.

Dang, sorry to hear that :(.

FWIW my patch was originally intended to be applied in conjunction with @anrp-tri 's, so you weren't mistaken there, but my patch didn't rely on anything in anrp-tri's.

Maybe I've just got lucky that I've not seen any crashes in my project lately. The only other things I can think that I've tweaked in my code since the last crashes is to be more careful about enabling/disabling interrupts in critical sections of code, perhaps that's contributed to the apparently increased stability.

@SeongGino
Copy link

SeongGino commented Nov 25, 2023

FWIW, my code doesn't use interrupts--minus one hardware timer that updates a camera tick flag every ~4ms, but that's all.

I've looked at my external libraries that I use, and nothing else in either those libraries or the main sketch uses interrupts; there's only one other while() loop that gets triggered (for clearing an I2C buffer just-in-case for the camera) - the major things the code does is receives then processes serial data, and sends commands for the Arduino mouse/keyboard down the USB cable.

Perhaps if either I or my sole tester had a different board that wasn't an RP2040, it might be even more helpful now, aha. But it just so happens we both only have Pico-adjacent boards, as does everyone in this thread. (^^;

I could probably try and see if the Pico SDK USB stack changes things? But that would require redoing my project's input handling because we use a custom library tailored to TinyUSB for the absmouse & keyboard handling (and I'd probably explore that more if I didn't have other life obligations rn. Adulting is hard).

@doctea
Copy link

doctea commented Jan 10, 2024

Just wanted comment back here again as I've just spent several days trying to figure out why my project was crashing when running it from USB MIDI clock again, until reading through relevant issues here reminded me of my above patch and that I hadn't re-applied it since reinstalling libraries. Doing so seems again to have solved the problem with crashes.

I also note that the same workaround/fix (changing OSAL_TIMEOUT_WAIT_FOREVER to OSAL_TIMEOUT_NORMAL) solved the problem for them too in #238 .

What are the ramifications of this? Is there a more proper fix?

@SeongGino
Copy link

I can say I haven't run into crashes personally when using a forked version of Earle's Pico core with the workaround applied (I've heard one conflicting report otherwise, but have yet to hear back from them), and at least for my purpose it hasn't affected the stability of what I'm trying to do either.

Still, not an ideal circumstance, for sure.

@SeongGino
Copy link

SeongGino commented Jan 13, 2024

Okay, update: I'm indeed running into the problem anew.
Turns out it's because I had the TinyUSB library somehow installed by a dependency, which was overriding the fixed version built-in to the core. Removing the separate library version caused a change in file size and made it not hang again.

palthedog added a commit to palthedog/Adafruit_TinyUSB_Arduino that referenced this issue Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants