Serial USB: writing messages without reading causes the board to stop responding #182
I didn't try to reproduce using your code, but I've noticed this behaviour in my own projects as well. Writing to Serial without reading resulted in corrupted (overwritten) memory in other places. For that reason I always use a '#ifdef DEBUG' to disable Serial debug messages before using it in production (which saves a few kB as well). |
What I plan to do is to connect my Arduino with a Raspberry Pi over USB. I want to implement a simple JSON based communication protocol. So the Arduino has to send messages to my Raspberry and the other way round. I cannot disable serial writes. I am not sure where exactly the bug is located. But the fact that something that obvious is not corrected tells me that it is maybe the USB processor in the Arduino board and not a simple driver bug. |
This was btw my minimal example, which was deleted on Stack Overflow because it was not question related (I wonder why not all of the source is printed on this forum?! The brackets after include don't work): I made a minimal example which shows one of the problems. As long as I use the serial monitor or a client reading the output, everything is fine. If I close the serial monitor or my Python script while the firmware/board is running, I get a timeout. If I reduce the amount of text, e.g. to "FUCK IT", it takes very long to happen. If I send that much text, it takes just seconds for the timeout to happen. I doubt this firmware is without obvious bugs, and a mistake in the client (serial monitor or my script) causing the error can be excluded, because these programs prevent the error from happening and the serial monitor is included in the Arduino SDK. I see just two possibilities atm: My board is defective
#include <AP_Common.h>
#include <AP_Param.h>
#include <AP_Progmem.h>
#include <AP_HAL.h>
#include <AP_HAL_AVR.h>
// ArduPilot Hardware Abstraction Layer
const AP_HAL::HAL& hal = AP_HAL_AVR_APM2;
inline void foo_loop() {
static int timer = 0;
int time = hal.scheduler->millis() - timer;
if(time > 100) {
hal.console->printf("FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT! FUCK IT!\n");
timer = hal.scheduler->millis();
}
}
void setup() {
hal.uartA->begin(115200);
}
void loop() {
foo_loop();
}
AP_HAL_MAIN();
|
Is there btw a possibility to mark it as a severe bug? |
I guess only the developers do that. Also, given that you're the first to report this bug, I'm not sure it warrants being marked as "severe", whatever that would mean exactly. You seem to be using some HAL library instead of the regular Arduino Serial object and millis() function. Does the problem also occur when you remove this HAL and use Arduino directly? Furthermore, you say:
What firmware are you referring to? Also, you say:
What kind of timeout is this? Looking at the code, it only does serial output, so if you close the serial monitor, you'll have no way to see what is happening inside your Arduino. How can you then conclude there is any problem in there? Finally, it would be best if you could edit your comment above and indent the code you pasted by at least 4 spaces, so github will show it in a code block and not eat the |
I've edited the comment for you. BTW, the considerations made by @matthijskooijman are valid: what board are you working on, which libraries are you using and what Arduino IDE version? C |
Just an empty one with: if(time > 100) {
Well, I was writing my own programs for that. The first problem I noticed was when I wrote a program which sends to the board without clearing the input buffer. After some time I got processing times on the order of several seconds for one command (on the order of 100 bytes or so). After adding a command to clean the input buffer, everything was fine and the board continued to respond.
Well and this was the final reason for me to ask around for someone to reproduce it! I work with an APM 2.5 from DIY drones with modified SDK 1.03. https://github.com/diydrones/ardupilot/tree/master/ArduCopter |
Atm I use a server script like this to communicate with the board. |
millis() returns an unsigned long, you're storing this in an int. A 16-bit int goes up to 32767, so after about 32 seconds, things will start to behave weirdly. I'm wondering if this might be responsible for perhaps even all of the problems you're seeing? Perhaps you can change these to unsigned long and test again? As for trying to reproduce, it would be easier to remove this HAL thing and just use the Arduino API directly. That removes code and thus complexity. Having said that, I'd consider including this ArduCopter thing and trying your sketch as-is, but I'm still not sure what to test. If I upload your code and then not open (or open and close) the serial monitor, how can I tell it is or isn't working? Please be more specific about this. |
Well, this unsigned conversion stuff is not nice, I admit it, but you calculated it wrong (it is way more than 32s until unexpected behavior, approx. 596h, it's a 32-bit int). Unfortunately it is not influencing the observed problem anyway. Edit: At least in my case I deal with 32-bit ints :) |
Huh? "int" on AVR is 16-bit, not 32 like on x86/amd64. long is 32-bit. |
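To make the width issue concrete, here is a small host-side sketch (not AVR code, and not taken from the thread). It assumes `int16_t` as a stand-in for the 16-bit `int` on AVR; the helper names `stored_as_avr_int` and `elapsed_ms` are made up for illustration. The narrowing conversion wraps modulo 2^16 on common compilers (it is implementation-defined before C++20).

```cpp
#include <cstdint>

// Assumption: int16_t models the 16-bit `int` of AVR on a desktop compiler.
using avr_int = int16_t;

// What ends up in `timer` when a millis() value is stored in a 16-bit int:
// only the low 16 bits survive, reinterpreted as a signed value.
avr_int stored_as_avr_int(uint32_t millis_value) {
    return static_cast<avr_int>(millis_value);
}

// Elapsed time with the type millis() actually returns (32-bit unsigned).
// Unsigned subtraction is defined modulo 2^32, so this stays correct
// even across the rollover of the millisecond counter.
uint32_t elapsed_ms(uint32_t now, uint32_t then) {
    return now - then;
}
```

With these helpers, `stored_as_avr_int(32768)` already yields a negative number, while `elapsed_ms` keeps returning the true interval, which is why switching the timer variables to `unsigned long` or `uint32_t` is the usual advice.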
Hmm, I was wrong in that. I change code and test it again. |
I exchanged the int with uint32_t. The timeout problem is still there. |
God, I really don't understand this talking around the problem. I posted some example code just for illustration. You can easily remove the timer or replace the int with an unsigned and the problem will still be there. And no one is willing to reproduce it, even if it's just 5 lines of code. |
Ok, let me be more clear, then.
Saying all this probably sounds harsh, but I'm just trying to help you understand how things normally work and why things appear as they do. For future reference, the APM2.5 board that dgrat is using is specified to use "Atmel’s ATMEGA2560 and ATMEGA32U-2 chips for processing and usb functions respectively". This means that an Arduino Mega is probably best suited for trying to reproduce this problem. |
@matthijskooijman : As I wrote before, @dgrat is not the only one. For me this issue was not very important, since my use cases used Serial for debugging only (and there I can easily turn it off for production, which is the reason I never investigated any further but just accepted that it sometimes makes a difference whether the serial data is read or not).
|
Well it is atm. not so easy for me to write a super minimal example because I use a DIY board with some changes in the SDK. I will try to write my own build script with standard library and look how it works. For someone else on the other side it would be easy to do. |
Ah forget what I wrote before. It is working with the standard SDK totally fine! ERROR IS STILL THERE
|
Huh? You're saying it works fine and there is an error? I'm sorry, but I'm not sure what you mean here... |
I was not sure whether upload to APM 2.5 works with the standard Arduino SDK. Just to illustrate what I plan to do with this board in reality: https://code.google.com/p/rpicopter/ |
And to end this. I did everything. |
Right, so you mean that using the standard Arduino API/SDK things are working (compiling, uploading, roughly doing what you'd expect), but that the "timeout problem" also occurs with the sketch you pasted. Did I get that right? Good that there is now a minimal Arduino API-only sketch that shows the problem. However, it is still not entirely clear how to use this sketch to reproduce the problem (or rather, I might not have understood the problem completely correctly yet...). If I'd upload the sketch and attach the serial monitor, I'd expect I would see the text printed in the monitor every 100ms. IIUC, this works as expected. However, when I disconnect the monitor, how can I tell what's happening inside the Arduino? How can I tell this "timeout" is happening? |
And don't forget that I can also cause these timeouts (as I wrote) just by rapidly sending messages to my Arduino without clearing the input buffer. If I set "writeTimeout" here to anything like 0.5s, I would get a timeout after some time of sending. This is an example which should cause the problem, but atm I am not at home to test it again, I just typed it now.
However, with a "ser.flushInput()" at the end no timeout is happening. This is the strangest part. edit: I hope it is easier to understand now |
I can reproduce it also with other APM boards btw. They all show similar behavior if they get stressed as explained. |
Great, that looks like something I can use :-) I tried this on an Arduino Mega and I think I can reproduce the problem now. However, one remark about the TX/RX leds: You shouldn't be using those as an indication of whether the main MCU is still running. These leds are controlled by the 32u2 that does USB->Serial conversion. It seems that, if nobody is listening on the USB side and thus the TX buffer of the 32u2 is full, the led stops blinking and remains full-on (though it occasionally blinks off, not sure why). This does not mean that the main MCU has stopped running or sending serial data. There is no flow control between the main MCU and the 32u2, so the main MCU will keep sending data, which is dropped by the 32u2. For this reason, I added a pin 13 blink to the sketch. Here's what I tested with:
I also increased the time to 1000ms, to make the blinking a bit more Now, if I just run the Arduino with or without serial console open, the pin 13 So, there's definitely something fishy going on there. I tested this both with 1.0.5 (Debian version) as well as ide-1.5.x Thinking on this a bit more, I think I understand what is happening. Specifically, it will be caught in this loop indefinitely When the python program is stopped and the data stream stops, this The timeout should be just a few seconds, but I suspect it takes 10 In any case, the fix seems to be easy, just wait for a second (or e.g.:
Gr. Matthijs |
Oh, seems our comments got crossposted. As for the firmware running on the 16u2 / USB chip, see the links in my previous comment. However, you'll have to check with the APM folks to see what sources/version they used exactly, the ones I link are for the Arduino Mega 2560, I think. |
I am pretty sure they use the same firmware as it is also Mega 2560 based.
I will try your approach later when I am home and I hope it will work. Thanks so far.
Your explanation sounds reliable. My remote control works like the following: Every 20 ms I send a command to a server running on the RPi. The command is parsed there and the server forwards a (shorter) command to the Arduino. There my firmware parses this command and then does calculations, sensor readout and motor control. Thanks so far |
Reading your comment, I think there might be an additional problem. When the Arduino is running with the serial console detached, the buffer inside the 32u2 will fill up. Now, when you open the serial port from your python program, the main MCU will reset, but (I think) the serial buffer in the 32u2 is not flushed, causing you to read out some old bytes. In your case, I suspect these old bytes might be causing problems. Flushing the buffer on startup is probably the solution here. So, I'd do:
Btw, the boot.c I linked to is the bootloader, which runs in the main MCU on a reset. You asked for the firmware running in the 32u2 I think, which I think are here: https://github.com/arduino/Arduino/tree/master/hardware/arduino/firmwares/atmegaxxu2 |
Btw, I agree that my analysis so far wouldn't explain why things work for a few seconds and then start to timeout. However, perhaps things only appear to work (due to buffered bytes?), or perhaps the 16 vs 32bit problem from before also caused this. Best to apply my previous suggestion to at least fix all the problems we've diagnosed so far and then see if any problems remain and if so, see if you can also find a reduced example to reproduce those problems. |
I will post them, when I have them ready.
I know it was dumb of me and caused for sure undefinable behavior as well. Such things always cause undefined behavior. |
@dgrat: I did investigate a bit. The HardwareSerial actually will hang on write (https://github.com/arduino/Arduino/blob/master/hardware/arduino/cores/arduino/HardwareSerial.cpp#L467) whenever data is not read from the tx_buffer and the tx_buffer is full. So if you happen to implement a protocol that stops reading the serial interface on the PC while transmitting data to the Arduino, you might run into a deadlock where both sides try to send data, each waiting for the other side to pick it up.
|
@ntruchsess, I think your analysis is wrong. The UART hardware on the Arduino side will keep transmitting bytes, eventually always emptying the tx buffer, even when they're not being read at the remote end. There is no flow control, so the buffer inside the 16u2 that does the serial-to-usb conversion will just overflow in this case, dropping bytes. It should not deadlock. |
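The "no flow control, so bytes are dropped" argument can be illustrated with a toy host-side model. This is only a sketch under stated assumptions: `BridgeModel`, its capacity and its method names are hypothetical and do not correspond to the real 16u2/32u2 firmware. The point it shows is that the sender side never waits, so a full bridge buffer costs data rather than causing a hang.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model of the USB bridge chip: a bounded buffer, no flow control.
struct BridgeModel {
    std::size_t capacity;
    std::deque<uint8_t> buffer;
    std::size_t dropped;

    explicit BridgeModel(std::size_t cap) : capacity(cap), dropped(0) {}

    // Called for every byte the main MCU's UART shifts out.
    // The UART cannot be told to wait: if the buffer is full, the byte is lost.
    void on_uart_byte(uint8_t b) {
        if (buffer.size() >= capacity) {
            ++dropped;
            return;
        }
        buffer.push_back(b);
    }

    // Called when the host reads on the USB side; returns the number of bytes handed over.
    std::size_t host_read(std::size_t max_bytes) {
        std::size_t n = std::min(max_bytes, buffer.size());
        buffer.erase(buffer.begin(), buffer.begin() + static_cast<std::ptrdiff_t>(n));
        return n;
    }
};
```

Feeding ten bytes into a `BridgeModel` with capacity 4 while the host is not reading leaves four bytes buffered and six counted as dropped; the first bytes a host sees after reconnecting are therefore stale ones, which fits the advice elsewhere in the thread to flush the input on startup.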
@matthijskooijman We don't know, but that assumption may not hold true for the APM 2.5 board. E.g. this depends on the mode XCK is configured in. @dgrat: I think you should try changing the HardwareSerial.write Line 467 to 'if (i == _tx_buffer->tail) return 0;' and see whether this makes a difference. |
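For readers who want to see what that one-line change amounts to, here is a simplified host-side model of a TX ring buffer with a non-blocking write. It is a sketch, not the Arduino core source: `TxRing`, `write_nonblocking`, `drain_one` and the buffer size of 64 are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed buffer size, for illustration only.
constexpr std::size_t kBufferSize = 64;

// Simplified model of a TX ring buffer (one slot is kept free to tell full from empty).
struct TxRing {
    uint8_t data[kBufferSize] = {};
    std::size_t head = 0;  // next slot to write
    std::size_t tail = 0;  // next slot the UART interrupt would send
};

// Non-blocking write: returns 0 instead of spinning when the buffer is full,
// which mirrors the suggested 'if (i == tail) return 0;' variant.
std::size_t write_nonblocking(TxRing& r, uint8_t c) {
    std::size_t next = (r.head + 1) % kBufferSize;
    if (next == r.tail) {
        return 0;  // full: the byte is not queued, the caller decides what to do
    }
    r.data[r.head] = c;
    r.head = next;
    return 1;
}

// Stand-in for the UART interrupt taking one byte out of the buffer.
bool drain_one(TxRing& r) {
    if (r.head == r.tail) {
        return false;  // nothing queued
    }
    r.tail = (r.tail + 1) % kBufferSize;
    return true;
}
```

The trade-off is that a caller which ignores the return value silently loses output once the buffer fills, so this variant is mainly useful as a diagnostic to check whether a blocking write is what stalls the sketch.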
I still wait for delivery of my programmer :( Edit: So far I also have no clue how to replace the bootloader on the 32u2. But maybe I don't need a programmer :) |
@matthijskooijman is that also true on SAM boards like the Due? I believe I am having this same issue: board runs fine as long as you are not connected, and if you connect Native USB port to a computer it works fine too (sort of). But if you unplug from the computer, the next time the Arduino tries to send data over USB, it freezes up. |
@odbol, I'm actually not sure, haven't dug into the Due design much. But given that it uses a native USB port, just like the Leonardo, I think things will be different for the Due. |
http://stackoverflow.com/questions/20360432/arduino-serial-timeouts-after-several-serial-writes/20382547#20382547
There I made an example and it would be nice if someone tries to reproduce it.