Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imxrt: serial dma rx occasionally reports incorrect byte #10912

Closed
danielappiagyei-bc opened this issue Oct 15, 2023 · 11 comments · Fixed by #11070
Closed

imxrt: serial dma rx occasionally reports incorrect byte #10912

danielappiagyei-bc opened this issue Oct 15, 2023 · 11 comments · Fixed by #11070
Assignees

Comments

@danielappiagyei-bc
Copy link
Contributor

danielappiagyei-bc commented Oct 15, 2023

Bug description

When using the LPUART serial driver with DMA for RX, I am seeing several dropped, missing, or otherwise incorrect bytes in the rxfifo (g_lpuartXfifo) the DMA places bytes in.

Partial solution

The major problem looks to be the imxrt_dma_nextrx() function here which can return one index past the end of the buffer when dmaresidual is 0 (all bytes in the dma transfer have been sent). The return value of this function is used without verifying that it's a valid index in the buffer. Correcting this bug causes missed bytes to occur less often (from several times a minute to only once or twice every few minutes) but does not fix the problem completely.

The only permanent fix I found is placing the dma's rxfifo inside the IMXRT's tightly coupled memory (TCM) region via __attribute__((section("<tcm region>"))), which is non-cacheable and super fast compared to SRAM. No incorrect bytes occur when I do this. This leads me to think there might be a caching issue.

Aside from that, I tried disabling write through cache (CONFIG_ARMV7M_DCACHE_WRITETHROUGH=n) since the chip's errata mentions "Cortex-M7: Write-Through stores and loads may return incorrect data" but that did not solve the issue.

Relevant Info

I setup the IMXRT to receive the same ~36 byte message over UART from another processor. The first few bytes of the message are a header containing sync bytes, a length, and a CRC32 precalculated by the other processor. I log anytime there's a mismatch in expected data. When disabling DMA and using interrupts for each character received, there are no incorrect bytes. When using DMA, there are. I've confirmed bytes aren't being dropped by framing, parity, overrun, or noise errors. I've also confirmed that the upper layer of the serial driver isn't causing this.

Other info:

  • sending the message every 10ms
  • 480600 baud rate
@danielappiagyei-bc
Copy link
Contributor Author

I have a PR up which fixes the overrun in imxrt_dma_nextrx(). The other half of the fix which I haven't included in the PR is to place the g_lpuart{1-8}rxfifo into TCM. It fixes the issue but what I don't understand is why our call to up_invalidate_dcache() here isn't sufficient.

@davids5 @xiaoxiang781216 Do either of u have any thoughts on what's going on?

@xiaoxiang781216
Copy link
Contributor

@davids5 is more familiar chip driver than me, do you have any suggestion? @davids5 .

@pkarashchenko
Copy link
Contributor

What is the size of the RX FIFO buffer in your case? Is that also a multiple of dcache line size?

@davids5
Copy link
Contributor

davids5 commented Oct 15, 2023

@danielappiagyei-bc

Great problem write up! This will help very much. Thank you for it!
Hopefully others can learn to put more detail in their issues and pull requests.

I will work on this is the LPI2C issue this coming week.

@danielappiagyei-bc
Copy link
Contributor Author

What is the size of the RX FIFO buffer in your case? Is that also a multiple of dcache line size?

Indeed! I've used 128 and 256 with no success

@danielappiagyei-bc
Copy link
Contributor Author

@danielappiagyei-bc

Great problem write up! This will help very much. Thank you for it! Hopefully others can learn to put more detail in their issues and pull requests.

I will work on this is the LPI2C issue this coming week.

Absolutely, and thank you David! We were on nuttx 10.x and were in need of a dma driver to speed up our uart comms. I spent a week reading the processor datasheet and exploring the code in imxrt_edma.c in preparation of writing my own driver. Luckily before implementing I stumbled upon your serial dma addition in nuttx 12 and we were able to upgrade and take advantage of all your hard work :-)

@danielappiagyei-bc
Copy link
Contributor Author

@davids5 do you have any idea why I would need to put the rxfifo's into non-cacheable TCM for things to work properly? Additionally, have you seen this behavior before? Also also, should I make a commit that puts the rxfifos into TCM since that is what worked for me or should we snoop around to see if there's another explanation for what's going on

@davids5
Copy link
Contributor

davids5 commented Oct 16, 2023

@danielappiagyei-bc
It has been found in our testing on the 1170 as well. We also found that placing it in no cacheable region fixed it as well. I will be trying to figure out the root cause this week.

@pkarashchenko
Copy link
Contributor

I have a PR up which fixes the overrun in imxrt_dma_nextrx(). The other half of the fix which I haven't included in the PR is to place the g_lpuart{1-8}rxfifo into TCM. It fixes the issue but what I don't understand is why our call to up_invalidate_dcache() here isn't sufficient.

@davids5 @xiaoxiang781216 Do either of u have any thoughts on what's going on?

Maybe you can move

  up_invalidate_dcache((uintptr_t)priv->rxfifo,
                       (uintptr_t)priv->rxfifo + RXDMA_BUFFER_SIZE);

from imxrt_dma_rxcallback to imxrt_dma_setup before imxrt_dmach_xfrsetup(priv->rxdma , &config); call? Not sure if that will work, but worth of trying

@davids5
Copy link
Contributor

davids5 commented Oct 24, 2023

@danielappiagyei-bc - I am looking into this. The DMA count when using the channel is circular mode counts from RXDMA_BUFFER_SIZE 32-31....1-32 in HW and is never 0 So that RXDMA_BUFFER_SIZE - dmaresidual will be 0 - 31.
Unless that is a race on the read. I have added an ASSERT and continue to test this.....

@davids5
Copy link
Contributor

davids5 commented Oct 24, 2023

@danielappiagyei-bc This resolve the issue #11020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants