Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB serial repeated write calls cause race condition with autoflush #583

Closed
dardeshna opened this issue Jul 8, 2021 · 27 comments
Closed

Comments

@dardeshna
Copy link

When repeatedly calling Serial.write() at an interval close to TRANSMIT_FLUSH_TIMEOUT, I've found that messages are occasionally lost and phantom bytes are transmitted in their place. These bytes are always a subset of bytes sent in a previous transmission.

I believe that the autoflush is triggering in the middle of Serial.write(). This would cause tx_head (in usb_serial.c) in to be incremented as new bytes are being written to the old buffer, and on the next autoflush, garbage data from the new buffer is sent instead.

I have not had the time to replicate this issue in a simpler program, but have worked around it by calling the following prior to calling Serial.write() to kill the autoflush timer. The delay was necessary otherwise the timer would still occasionally interrupt after having called Serial.write().

USB1_GPTIMER0CTRL = 0;
delayMicroseconds(2);

Issue occurred on a Teensy 4.1 running Teensyduino 1.53.

@PaulStoffregen
Copy link
Owner

PaulStoffregen commented Jul 9, 2021

Please confirm if the issue still happens on Teensyduino 1.54 ?

If so, can you give me a complete program I can copy into the Arduino IDE to quickly reproduce the bug without guesswork?

@dardeshna
Copy link
Author

This program appears to demonstrate the bug I experienced. The output is erratic when the delay is set to 73us (TRANSMIT_FLUSH_TIMEOUT=75us), but 100us and no delay both work correctly and repeatedly print the same string.

void setup() {
  // put your setup code here, to run once:

  Serial.begin(115200);
  
}

void loop() {
  // put your main code here, to run repeatedly:

  char msg[] = "abcdefghijklmnopqrstuvwxyz";

  int count = 0;
  
  while (count<20) {
    Serial.write(msg+count, 5);
    delayMicroseconds(73);
    count++;
  }
  Serial.println();

  delay(250);

}

I can verify this is an issue in 1.54 in the next few days, but seeing that usb_serial.c is the same between versions I suspect the bug is still there.

@FrankBoesing
Copy link
Contributor

I can confirm this. On Windows, output of serial monitor is:

08:10:24.576 -> abcdebcdefcdefgdefghefghifghijhijklijklmjklmnklmnolmnopfghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwnopqropqrspqrstqrsturstuvstuvwtuvwx
08:10:24.816 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvnopqropqrspqrstqrsturstuvtuvwx
08:10:25.096 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
08:10:25.336 -> abcdebcdefcdefgdefghefghifghijhijklijklmjklmnklmnolmnopmnopqnopqrhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
08:10:25.576 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnonopqropqrspqrstqrsturstuvmnopqnopqropqrspqrstqrsturstuv
08:10:25.576 -> pqropqrspqrstqrsturstuvtuvwx
08:10:25.576 -> vwx

@FrankBoesing
Copy link
Contributor

Tried with Coolterm. Same issue, so it is not TD serial monitor related.

@FrankBoesing
Copy link
Contributor

FrankBoesing commented Jul 9, 2021

If i add some DSB to schedule transfer and move the interrupt disable to the beginning, it changes its behavior:
Then, there a long periods (several seconds) of correct ouput, followd by a larger number of corrupted lines.
So, it seems not related to serial only.
Why is a MTP workaround in the core? I guess, if we're able to fix the issue, MTP will work without workaround, too.

Unfortunately, i have to pause now, can't dig deeper for today.

@FrankBoesing
Copy link
Contributor

output, then:

09:52:24.152 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:24.430 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:24.673 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:24.912 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:25.192 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:25.430 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:25.672 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:25.912 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrstuklmnostuvwtuvwx
09:52:26.193 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:26.432 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:26.673 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:26.952 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:27.192 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:27.432 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:27.672 -> abcdebcdefcdefgdefghefghifghijghijkhijklijklmjklmnklmnolmnopmnopqnopqropqrspqrstqrsturstuvstuvwtuvwx
09:52:27.952 -> abcdebcdefcdefgdefghefghifghijpqrstqrsturstuvdefghstuvwtuvwx
09:52:27.952 -> (many corrupted content, looks like a wrong pointer - Github does not display it.

@vjmuzik
Copy link
Contributor

vjmuzik commented Jul 9, 2021

I’m not sure if it’s related to this or not, but I ran into a similar issue just yesterday when using Serial.write and I wasn’t sure if it was just my setup or not. While trying to emulate a 9 byte protocol I noticed that the receiving application started getting extra bytes from seemingly nowhere thus shifting the 9 byte protocol by one or several bytes. Eventually it would add enough phantom bytes to wrap around back to valid data and the cycle would repeat itself.

@FrankBoesing
Copy link
Contributor

FrankBoesing commented Jul 9, 2021

Paul, I assume you can use a hardware debugger. Sometimes flash content is output to serial (wrong pointer). It should be easy / possible with a debugger to find the location where an address is assigned from the flash.

@FrankBoesing
Copy link
Contributor

With this program, you can see easyly what happens:

void setup() {
 Serial.begin(115200);
}
int packet = 0;
void loop() {
 // put your main code here, to run repeatedly:
 char msg[] = "abcdefghijklmnopqrstuvwxyz";
 int count = 0;
 while (count<sizeof(msg)) {
   delayMicroseconds(73);
   Serial.write(msg[count]);    
   count++;
 }
 Serial.print(" - ");
 Serial.println(packet++); 
 delay(500);
}

@FrankBoesing
Copy link
Contributor

FrankBoesing commented Jul 9, 2021

A working fix is to replace timer_start_oneshot(); with usb_serial_flush_output(); in usb_serial_write()

I know that's not ideal and not wanted.. but it may help to find the problem.

@FrankBoesing
Copy link
Contributor

@PaulStoffregen
Copy link
Owner

I am worried this fix may create new problems. It leaves tx_noautoflush set while waiting for buffers, even through a yield() call.

@FrankBoesing
Copy link
Contributor

Yup , I thought about yield, too. Perhaps switch autoflush on for yield?

@FrankBoesing
Copy link
Contributor

At least it shows what the problem is.
Maybe you find a better way to prevent autoflush.

@PaulStoffregen
Copy link
Owner

Does setting it at the first line inside the loop, and clearing it right before calling yield() still fix the problem?

@FrankBoesing
Copy link
Contributor

FrankBoesing commented Jul 9, 2021

Tried that.

Autoflush needs be disabled for the transmit after the inner while loop, too.

so,  disabling again  after the inner while loop works.

@FrankBoesing
Copy link
Contributor

Eh,. No.

After >50 lines i see still garbage.

@FrankBoesing
Copy link
Contributor

Please try with demo from here: #583 (comment)

@FrankBoesing
Copy link
Contributor

This works:

   tx_noautoflush = 0;
  yield();
  tx_noautoflush = 1;

@PaulStoffregen
Copy link
Owner

I've committed a fix.

fdfd18f

Please let me know if this fully solves all the problems you're seeing?

@dardeshna
Copy link
Author

Thanks for the quick responses -- I will not be able to test this fix until after the weekend but will update you then

@FrankBoesing
Copy link
Contributor

Looks good now.
The Windows-USB Bug is still existing (just mentioning here): In a tight, very fast print loop, a terminal software (TD monitor, but "CoolTerm" which is written in C too) it prints random PC-memory contents. But this has nothing to do with Teensyduino.

@PaulStoffregen
Copy link
Owner

PaulStoffregen commented Jul 10, 2021

I've seen the random Windows memory bug a couple times too. Kinda makes me wonder if we've stumbled upon a security vulnerability in Windows?

@FrankBoesing
Copy link
Contributor

Me too. But so far it appears to me it prints memory contents of the current program only.

@PaulStoffregen
Copy link
Owner

@vjmuzik - Any chance you could give fdfd18f a try? (and fwiw, I'm planning to dive into the FNET issue soon...)

@vjmuzik
Copy link
Contributor

vjmuzik commented Jul 13, 2021

It hasn't fixed my issue, but I believe it's more so a problem with how I'm interfacing to the receiving application than anything else. I'm using dual serial, so one is open in Teensyduino and the other is routed through Parallels Desktop to a serial port on a Windows virtual machine. The serial port is then opened in Trinamics TMCL-IDE, I haven't experienced this before so more than likely it's a product of the routing.

@dardeshna
Copy link
Author

The fix appears to have solved the original issue I was seeing. Thanks to everyone here for helping address it so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants