Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HardwareSerial::flush hangs on high speed communications. #1742

Closed
KurtE opened this issue Dec 16, 2013 · 11 comments
Closed

HardwareSerial::flush hangs on high speed communications. #1742

KurtE opened this issue Dec 16, 2013 · 11 comments
Labels
Type: Duplicate Another item already exists for this topic

Comments

@KurtE
Copy link

KurtE commented Dec 16, 2013

For the fun of it I started experimenting running some Dynamixel AX-12 servos connected up to an Arduino Mega (1280), with the standard Hardware Serial class, that externally I do a few tweaks to make it work at half duplex. The AX-12 interface runs at the baud rate of 1000000. I was able to get it to start working, but it would hang the processor on a pretty frequent basis. By instrumenting my code by bracketing some code fragments with digitalWrite statements and the use of my Logic Analyzer, I found the hang was in the flush function. So I took a look over the code in HardwareSerial.cpp, I found a race condition in the write function. The problem is that you clear the TXC condition, which at very high baud rates may happen after the last byte was output and the state was already set. Then when you call off to flush, the state will never be set again and you are hung in the loop.

The code also relies on every byte being output from the Interrupt hander, which for high baud rates will slow down. (Tried to add image, but would not take JPG file).

So I rewrote the write function:

size_t HardwareSerial::write(uint8_t c)
{
  // If the queue is empty and data register empty, we should simply be able to output the character
  // May only need to check for UDRE as unless interrupts are disabled the only case we should
  // have where it is empty, the interrupt handler should have handled this before we could see it
  if ((_tx_buffer_head == _tx_buffer_tail) && (*_ucsra & _BV(UDRE0))) {
    *_udr = c;
  } else {
    // Head and tail usage reversed?  Normally write to tail and read from head???
    int i = (_tx_buffer_head + 1) % SERIAL_BUFFER_SIZE;

    // If the output buffer is full, there's nothing for it other than to 
    // wait for the interrupt handler to empty it a bit
    // ???: return 0 here instead?
    while (i == _tx_buffer_tail)
      ;

    _tx_buffer[_tx_buffer_head] = c;
    _tx_buffer_head = i;

    // Now see if our UDREI interrupt is enabled.
    if (!(*_ucsrb & _BV(UDRIE0))) {
      // The interrupt is not yet enabled.  see if we need to feed it
      // the first character.
      if ((_tx_buffer_head != _tx_buffer_tail) && (*_ucsra & _BV(UDRE0))) {
        c = _tx_buffer[_tx_buffer_tail];
        _tx_buffer_tail = (_tx_buffer_tail + 1) % SERIAL_BUFFER_SIZE;
        *_udr = c;
      }
      // Then enable the interrupt
      sbi(*_ucsrb, _udrie);
    }  
  }      

  transmitting = true;
  return 1;
}

So far I have not had an issue with this updated code. Also from the logic Analyzer of the AX-12 buss, the output of a 9 byte packet went from taking about 144.5us to about 87us.

@matthijskooijman
Copy link
Collaborator

Nice job tracking down this issue! However, the issue you found is a duplicate of #1456, so it was already debugged before (and my pull request #1711 fixes this).

It is interesting that you say your improved code makes things faster. Are you saying the queueing of the 9 bytes is now faster, or also the actual transmission. Or, thinking about it, that is basically the same because 1 Mbit/s (== 1us/bit == 10us/byte) the queuing is (currently) slower than the sending.

Perhaps you could test my pullrequest to see if this fixes your issue and what the speed of it is? If it's still slower than 87us (or thereabouts), I'm happy to add another patch to my pullrequest to improve the speed, if possible.

@KurtE
Copy link
Author

KurtE commented Dec 16, 2013

Thanks, yes, it is the same issue. Will take a closer look at the code in your pull request. Need to figure out the easiest way to try out your code, as I am not currently enlisted... Looks like it is more involved change over several files. There are a few things in your changes that I need to look up. Things like the ATOMIC_BLOCK with a nested NONATOMIC_BLOCK, will it restore the atomic part once it leaves the NONATOMIC_BLOCK...

My first look through your changes may fix the issue. Just trying to decide if you need that much of the code set in ATOMIC blocks. In fact wondering if you simply need it around these two lines in the write function:

    _tx_buffer_head = i;
    sbi(*_ucsrb, UDRIE0);

Also a minor thing, but in your HardwareSerial1,2,3 files, don't think you need all of those conditional names for those interrupt vectors.

Again will try to figure out good way to get the necessary files to run with 1.5.5 or the nightly build
Kurt

@matthijskooijman
Copy link
Collaborator

The easiest way to get at the code is to clone the repository and get my branch: https://github.com/matthijskooijman/Arduino/tree/ide-1.5.x-hardware-serial

You can clone my repository directly:

git clone https://github.com/matthijskooijman/Arduino.git

And then I think you can do

git checkout ide-1.5.x-hardware-serial

to get the code from my branch. You can then either build the entire IDE and use the git version, or just copy the "hardware/arduino" folder from git and replace it inside the 1.5.5 IDE you have already downloaded.

@KurtE
Copy link
Author

KurtE commented Dec 17, 2013

Thanks,

I did the clone to my machine and then copied the appropriate files to my 1.5.5 IDE (HardwareSerial_._, Arduino.h, usbapi.h). Note when I tried to compile my program, it would not compile as it said _NOP was not defined...

To fix, I updated your definition in Arduino.h From:

#define NOP() do { _asm__ volatile ("nop"); } while (0)

To:

#define _NOP() do { __asm__ volatile ("nop"); } while (0)

There was a _ missing from the start of the name, as well as the asm was missing an _

The good news is then I ran my test program and so far I have not hung the board. However the output performance is about the same as the original code base. 148us for the 9 bytes.

I tried to include images showing the different Logic Analyzer outputs of a packet, but this editor keeps saying it does not sport my files (tried both .png and .jpg). If you are interested, I did post information about this up on the Trossen forum:

http://forums.trossenrobotics.com/showthread.php?6508-Dynamix-AX-12-with-a-regular-Arduino-%28not-Arbotix-%29

@KurtE KurtE closed this as completed Dec 17, 2013
@KurtE KurtE reopened this Dec 17, 2013
@matthijskooijman
Copy link
Collaborator

Hmm, I wonder where those underscores went... Thanks for pointing that out, I'll make sure I fix that in the pullrequest (my local toolchain has a newer libc that already defines _NOP, so it didn't error out here...).

As for the performance, I guess that means there is still some room for improvement in the code :-) I haven't got time right now to look at your code in detail and compare it with mine, but I certainly intend to do so later.

@KurtE
Copy link
Author

KurtE commented Dec 17, 2013

Thanks,
The main difference with my code, is that I only use the queue/interrupt handler when I have to. That is if the UDRE flag is set and our queue is empty, The write method simply stuffs the character in to the UDR register and returns. If these conditions are not true, it stuffs the character into the queue and then sees if the interrupt is enabled, if not it then checks to see if the queue is not empty, if so it then checks again to see if UDRE is set, if so it removes a char from the queue and outputs it to UDR, else it enables the UDRE interrupt... So in almost all cases you will not enable and use the interrupt until at least the third character is output and on really fast baud rates and smaller packets, I can imagine it may not enable the interrupt at all.

This avoids the interrupt overhead. My quick look at the output from avr_objdump -D of the generated elf file shows that the interrupt handler (__vector_37) uses about 86 clocks to handle the interrupt (16 push, 16 pop, ...) plus the actual code in Serial1._tx_udr_empty_irq. I have not counted the clocks in that function yet, but ...

But I will be happy with a release that does not hang! Good work

@matthijskooijman
Copy link
Collaborator

Ah, I see that now, yes. I guess we could do the same improvement on top of my code to get the speed gain.

With your fix, do you get the improved speed consistently? Or are there sometimes a few bits that are slower?

With your fix, I would expect the queuing of bytes (e.g. calls to write()) to happen faster than the actual sending of bytes. This means that eventually (and probably after the first couple of bytes already), the direct queueing of bytes cannot happen anymore because the hardware is transmitting one byte and has the next byte queued in UDRE already. When this happens, interrupts will be enabled, which should cause all bytes sent from that point onwards to be slower? But perhaps the extra slowness causes the buffer to be depleted, interrupts disabled and allows a few more bytes to be queued directly again. Also, perhaps the fact that the hardware has two queue bytes could maks this variation in queueing latency so all bytes still are sent at the nominal baudrate? Do you see what I mean here?

@matthijskooijman
Copy link
Collaborator

I applied your shortcut to my code and found out that at 1Mbit/s, the queueing of new bytes is exactly fast enough to keep the buffer full (effectively, the UART is just a slight bit faster than the incoming new bytes). However, if I lower the baudrate to 0.9Mbit/s, I can see the behaviour I described above: the supply of bytes catches up with the UART, the interrupt gets enabled and the next byte gets delayed. However, this is just one interupt, which then disables itself again and a few more bytes can be sent back-to-back (I think this is because the interrupt slows down the UART byte supply, but also halts the supply of new bytes. Because the UART can send bytes while the interrupt is running, the UART can catch up).

If the bitrate is lowered further, the interrupt gets enabled sooner and more often. At 0.5Mbit/s, the interrupt overhead no longer prevents the full bitrate from being attained. At 0.3Mbit/s, the interrrupt remains enabled all the time. At this speed (and higher speeds), the interrupt overhead and queuing of new bytes still takes up pretty much all the CPU time. At lower speeds, the (relative) overhead decreases to reasonable proportions (though this is including some digitalWrite calls for debugging, so the practical values might be a bit different).

This shows that, at speeds from 0.5Mbit to 1Mbit, it could be beneficial to skip the interrupt altogether - since we're using pretty much 100% CPU time already we won't lose any CPU time and gain a bit of serial transmission speed. However, I'm not sure if we can automatically detect this case and I don't reallly like hardcoding a threshold for the bitrate. Also, it's probably not a big deal if the speed is a bit less that the configured speed, especially if at 1Mbit/s the full speed is attained.

So much for my observations :-)

@KurtE
Copy link
Author

KurtE commented Dec 18, 2013

Thanks, It looks like it is performing like I would expect. At the fastest speed it can more or less completely bypass the interrupt handler. At some mid speeds it will be able to stuff a few out every so often and at lower speeds it will output almost everything using the interrupt handler. Will be interesting to try it out with some real world programs. For example may try with one of my Hexapod robots that talk to an SSC-32 controller at 115200. I suspect that most output will go through interrupt handler, but the code does do some computation for each servo before it outputs the position of it, so maybe a few of the characters will be output directly.

Again thanks!

Thanks for testing this out and your work on this!
Kurt

KurtE added a commit to KurtE/Phantom_Phoenix that referenced this issue Dec 22, 2013
Warning: WIP - I am using 1.5.5 and/or Daily Build.  Also working on
fixes to Hardware Serial on Arduino.

This is a work in progress.  Requires changes to Arbotix files.
Information in issues:
vanadiumlabs/arbotix#4
Also more information up on TrossenRobotics Forum:

http://forums.trossenrobotics.com/showthread.php?5786-Arbotix-on-Arduino-1.5(will-be-1.0.2)-IDE/page2

More Information about Hardware Serial issues on 1.5.5 are up on their
github:
arduino/Arduino#1742

Also in this first pass, added the ability to define for Quad instead of
Hex.

Note: Has code to bypass HardwareSerial write method which I am testing
currently.  Later if Arduino fixes issue can turn off and use ::write
and ::flush methods.
@matthijskooijman
Copy link
Collaborator

@cmaglie, this issue can be closed as well. The fix for the actual issue is merged in 4fb15f2 and the improvement suggested by Kurt is merged in 9ad14b2.

@cmaglie cmaglie closed this as completed Jan 29, 2014
@hallard
Copy link

hallard commented Jan 14, 2015

Hi there;
sorry to bring back this issue, but even if it is fixed in 1.5.x version I'm using 1.0.5 and 1.0.6 IDE and I have exactly the same problem that flush cause lookup at high speed (250000)

Any solution to fix that in old version ?

Thank you for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Duplicate Another item already exists for this topic
Projects
None yet
Development

No branches or pull requests

5 participants