Data loss in TCP rx #46
Comments
Hello. Let me answer directly.
|
Hi, thanks for answering
|
In my tests (at least in the past) of downloading a 10MB file over HTTP, I did not always receive all the bytes the server was supposed to send, based on the HTTP header value. I received, for example, 90% of the data, sometimes 99%, and sometimes 100%. So there is or was an issue on the ESP side or elsewhere, because the data never arrived over UART at the stack itself, so the stack had nothing to process. With the current AT setup I cannot know that a packet has not been received.
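A shortfall like the one described above can at least be detected on the application side by comparing the number of body bytes received with the Content-Length header. Below is a minimal sketch in C, assuming the response headers are available as one NUL-terminated string. The helpers `starts_with_ci`, `http_content_length` and `download_is_complete` are hypothetical names for illustration and are not part of ESP_AT_Lib.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: case-insensitive check that s starts with prefix. */
static int starts_with_ci(const char *s, const char *prefix) {
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
            return 0;
        }
        ++s;
        ++prefix;
    }
    return 1;
}

/* Returns the Content-Length value, or -1 if the header is missing or invalid. */
long http_content_length(const char *headers) {
    static const char name[] = "content-length:";
    const char *line = headers;

    assert(headers != NULL);
    while (*line != '\0') {
        const char *next;
        if (starts_with_ci(line, name)) {
            const char *value = line + sizeof(name) - 1;
            /* Skip optional blanks, then require at least one digit */
            while (*value == ' ' || *value == '\t') {
                ++value;
            }
            if (!isdigit((unsigned char)*value)) {
                return -1;
            }
            return strtol(value, NULL, 10);
        }
        next = strchr(line, '\n');
        if (next == NULL) {
            break;
        }
        line = next + 1;            /* Continue with the next header line */
    }
    return -1;
}

/* Returns 1 only when the expected length is known and fully received. */
int download_is_complete(long expected, size_t received) {
    return expected >= 0 && received == (size_t)expected;
}
```

The application would count body bytes as they arrive and call `download_is_complete` once the connection closes. This only detects missing data; it does not say where the data was lost.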
This could be. The question is what is happening on your application side that you cannot read data faster? Writing any data to memory or processing byte-by-byte?
The application could potentially implement a global callback that notifies it of a data packet overrun. But given how the netconn API is designed, a separate notification for each connection might not be possible. |
My test case is similar to yours, in that I am receiving 11.1Mbytes of data and creating a set of 2550 files out of it on EMMC. It is a sort of tar file extracted on the fly on the receiver side. This extraction is where I have the delays in processing the data (please take into account that I am using SPI as the UART link, so I am receiving data at 200+kB/s). As far as I can tell, I do not have data loss issues on the UART or ESP32, as I am getting the expected result now that I properly size the memory AND the rx queue. If instead I reduce the memory allocated to ESP_AT_Lib too much, I do get pbuf allocation errors. Likewise, if I reduce the What I am saying is that I feel that in these cases we should somehow report an error to the application. Can't we just behave as if we received a connection drop event? This should work on the application side. I do not know how difficult it would be on the stack side. How about dropping all rx data from this point on, but sending an AT+CIPCLOSE to the ESP32? This should generate a CLOSED event, which in turn will be reported to the application. Can this work? |
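The proposal above (drop all rx data after the first failure and request a close so the application sees an error) can be sketched as a small per-connection state. This is only an illustration of the idea under stated assumptions, not how ESP_AT_Lib behaves: `conn_rx_state_t` and `conn_rx_chunk` are hypothetical, and `alloc_ok` stands in for the result of the pbuf allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RX_STORED, RX_DROPPED } rx_result_t;

/* Hypothetical per-connection receive state */
typedef struct {
    bool overrun;           /* Set once a chunk could not be stored */
    bool close_requested;   /* Caller should close the link (e.g. AT+CIPCLOSE) */
    size_t stored_bytes;    /* Bytes handed over to the application */
    size_t dropped_bytes;   /* Bytes discarded since the overrun */
} conn_rx_state_t;

/*
 * Account for one received chunk of len bytes.
 * alloc_ok is assumed to be the outcome of the buffer allocation.
 * After the first failure the error is sticky: later chunks are dropped
 * too, so the application never sees a stream with a hole in the middle.
 */
rx_result_t conn_rx_chunk(conn_rx_state_t *c, size_t len, bool alloc_ok) {
    assert(c != NULL);
    if (c->overrun || !alloc_ok) {
        if (!c->overrun) {
            c->overrun = true;
            c->close_requested = true;  /* Request the close only once */
        }
        c->dropped_bytes += len;
        return RX_DROPPED;
    }
    c->stored_bytes += len;
    return RX_STORED;
}
```

The caller would check `close_requested` after each chunk and close the connection, so the application gets its usual closed notification instead of silently truncated data.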
If we modify this in netconn API to close it, then normally we break compatibility. However we can still try to implement it. Normally you would need to call And then your application will get |
Ok, maybe I can try to implement something. Not sure when I will do this though (it takes some time for proper testing). If the current solution with increased resources works well, I need to focus on sending the current version to pre-production. Just to complete the discussion however:
More generally, I feel we are missing a way to cope with slow data processing. In my custom application protocol I am lucky for the time being, because there is an upper limit on the size of an incoming data burst. But this is likely to change in the future, so I would like to have such a flow control mechanism available. What I was thinking of is some sort of low-resources check (one that could take into account a low-memory threshold as well as rx queue depth thresholds). In case of low resources, one could stop reading from the UART. The ESP32 should then fill its internal buffers and eventually map the condition onto the usual 0-window mechanism. Of course, all of this may not work if some of the intermediate pieces do not handle the buffer-full condition correctly. |
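The low-resources check described above could look roughly like the following watermark logic with hysteresis. This is a minimal sketch under stated assumptions: `rx_flow_t`, `rx_flow_init` and `rx_flow_update` are hypothetical names, the thresholds are chosen by the application, and the caller is responsible for actually pausing the link (for example by no longer reading from the UART or by driving RTS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flow-control state with separate pause and resume thresholds */
typedef struct {
    size_t mem_low;         /* Pause when free memory drops below this */
    size_t mem_resume;      /* Resume only when free memory is back at or above this */
    size_t queue_high;      /* Pause when rx queue depth reaches this */
    size_t queue_resume;    /* Resume only when rx queue depth is at or below this */
    bool paused;            /* Current decision */
} rx_flow_t;

void rx_flow_init(rx_flow_t *f, size_t mem_low, size_t mem_resume,
                  size_t queue_high, size_t queue_resume) {
    assert(f != NULL);
    assert(mem_resume >= mem_low && queue_resume < queue_high);
    f->mem_low = mem_low;
    f->mem_resume = mem_resume;
    f->queue_high = queue_high;
    f->queue_resume = queue_resume;
    f->paused = false;
}

/* Returns true while reception should stay paused. */
bool rx_flow_update(rx_flow_t *f, size_t free_mem, size_t queue_depth) {
    assert(f != NULL);
    if (!f->paused) {
        /* Either resource running low is enough to pause */
        if (free_mem < f->mem_low || queue_depth >= f->queue_high) {
            f->paused = true;
        }
    } else {
        /* Both resources must have recovered before resuming */
        if (free_mem >= f->mem_resume && queue_depth <= f->queue_resume) {
            f->paused = false;
        }
    }
    return f->paused;
}
```

The gap between the pause and resume thresholds keeps the link from toggling on every packet. Whether the ESP32 then propagates the back-pressure to the TCP window depends on the firmware, as noted above.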
You can do all this manually by setting the RTS pin high for the ESP32. How it works over SPI, I don't know. To answer your 2 points
|
The manual TCP receive implementation is ongoing. However, we need to wait for Espressif to fix some bugs in it first. It is disabled in the lib for now. |
I'm running into similar issues; I'm downloading a 3MB file and I can't get it "stable". At best it's missing a handful of data blocks. I've set ESP_MEM_SIZE to 0x8000 and ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN to 32. It shows no more signs of a buffer overrun on the application's receiving side. Do you have any ideas on how to get a reliable large data transfer? I'm using an ESP32 with the latest esp-at firmware, compiled from the repo (commit: 655467dae8aaca1ec2f1405e5eab7ce797515281). |
Download |
Ah, I see! I've tried to use the develop branch, but I'm having trouble migrating to CMSIS_V2. It's not compatible with some of the libs in my project. E.g. my uSD driver for FatFS still uses CMSIS methods for semaphores and message queues. |
This should be a few minutes of work. You need to either modify the stm32 driver or create a new one based on CMSIS. |
Yeah, that's what I thought as well. But for some reason it only starts the init_thread task and none of the subsequent threads. Your STM32F769i-discovery example is having the same issues. Not sure what I'm doing wrong here... |
It looks like the SysTick is no longer working |
@tvandergeer I have tested on F769 and cannot reproduce this problem. I used the example directly as it is in the repository. |
Not sure what I'm missing here... For some reason I'm running into issues with the TaskScheduler. It's no longer working correctly when using CMSIS_V2. I'm now using develop but reverted the changes for CMSIS_V2 (from this commit). Now I was able to test receiving data with manual TCP option enabled. It's receiving a couple of blocks, but halts after a few seconds. I can see that it's still polling for data, but no data is coming. Is the ESP-32 firmware this buggy? |
Please note that the message queue API is different in V2. Maybe this is a reason too. I will not say that my lib is bug free. Do you have a log you can share? |
Yes, I made some progress! It looks like the manual TCP receive option only works reliably when you disable echo mode.
|
Do you have the log you can share? |
Yes, I'd be happy to! Which configuration do you want me to use to create a log? |
The one when it is not working. I'm interested in raw AT response from ESP and point where it fails. |
UART log (from ESP32 TX pin @ 2000000 baud): https://gist.github.com/tvandergeer/d2632fbe5550cfc9792edb02567500ac This is the file I'm downloading: https://gist.github.com/tvandergeer/d2118705607bf731cf7013adfde63104 It's using a configuration with manual TCP download and AT Echo on. Application log:
|
I've found that uncommenting this line greatly improves the stability of the download:
It's here: https://github.com/MaJerle/ESP_AT_Lib/blob/develop/esp_at_lib/src/esp/esp_int.c#L1522 |
Sorry, I'm a little bit lost. Do you have an example where communication fails? So that I can see the AT command info. |
This is a log when it fails. Please note the received data. The file is only partly downloaded. It just stops downloading. My guess is that the ESP "forgets" to send another +IPD line to signal new data. But when we start requesting for new data there is more data available. That's what the uncommented line does. |
Ok, so the issue is related to ESP not sending more +IPD info. If you enable uncommented area as you mentioned, does it work stable then? |
Yes, it is much more stable with that line uncommented |
Does much more stable mean 100% stable, or are there still some issues? |
I'll perform some more tests and share the results with you |
It is important to understand whether it fails in the lib stack or the esp stack. And what echo has to do with it (maybe an esp issue, I don't know yet). |
I've performed 5 tests without AT echo. 4 were successful and 1 test failed. It stopped midway. The curious thing was that the ESP sent the following as the last lines in the UART log:
This looks like conflicting information. On the +IPD line it indicates that there's 5176 bytes available, but the +CIPRECVLEN line indicates an empty buffer. This looks like an ESP bug (but read on...) Subsequently I've performed 5 tests with AT echo. I had a similar success ratio but found the following in the last line of the UART data:
Notice the last lines. The Lib requested an "AT+CIPRECVLEN?", but before the ESP produces the answer a new "+IPD" line comes in. This line is probably never processed by the Lib, because it was expecting an answer to the "CIPRECVLEN?" request. My conclusions are:
Do you agree? |
Thanks. I see where it goes sideways. It is a typical race condition and a potential bug in the ESP itself. Either they need to prevent sending +IPD messages while commands are active, or they need to update the response with the latest value before sending it. Could you please create a ticket in the esp-at repo from Espressif and reference this one there? |
I have added a repeated AT+CIPRECVLEN command when +IPD is detected. I could not reproduce this issue on my side; please try it and let me know if it is more stable now. |
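For readers following the race discussed above, here is a rough sketch in C of the "ask again" idea: if a +IPD line arrives while an AT+CIPRECVLEN? query is still waiting for its final OK, remember it and repeat the query afterwards. This is not the library's actual code. `recvlen_tracker_t`, `tracker_command_sent` and `tracker_on_line` are hypothetical names, and the sketch assumes the parser delivers complete lines with CR/LF already stripped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tracker for one outstanding AT+CIPRECVLEN? query */
typedef struct {
    bool cmd_pending;   /* Query sent, final OK not seen yet */
    bool query_again;   /* +IPD seen while the query was pending */
} recvlen_tracker_t;

/* Call right after AT+CIPRECVLEN? has been written to the device. */
void tracker_command_sent(recvlen_tracker_t *t) {
    assert(t != NULL);
    t->cmd_pending = true;
    t->query_again = false;
}

/*
 * Feed one received line.
 * Returns true when the caller should send AT+CIPRECVLEN? now.
 */
bool tracker_on_line(recvlen_tracker_t *t, const char *line) {
    assert(t != NULL && line != NULL);
    if (strncmp(line, "+IPD", 4) == 0) {
        if (t->cmd_pending) {
            /* The length in the pending response may already be stale */
            t->query_again = true;
            return false;
        }
        return true;                /* Idle: new data was announced */
    }
    if (t->cmd_pending && strcmp(line, "OK") == 0) {
        bool again = t->query_again;
        t->cmd_pending = false;
        t->query_again = false;
        return again;               /* Repeat the query if +IPD raced it */
    }
    return false;
}
```

With this approach a +IPD that slips in between the request and its response is not lost; it only costs one extra query.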
Works much better/more reliable now! Thanks! FYI. I've reported another issue in the esp-at repository regarding |
@tvandergeer can you please start a new issue for this? |
I am having issues with incomplete data received on TCP links. My application protocol server expects bursts of 32kB of data over TCP, and it times out because it only gets a fraction of it (usually from 9 to 16kB).
After dumping the received data and comparing it with what was sent by the TCP client (a Windows PC), I see that big chunks of data are missing inside the received data. In other words, I only get the first and the last part of the 32k (the actual sizes vary).
After some debugging I think I confirmed this has to do with an invalid IPD buffer in espi_process(). At first I thought this was due to a failed pbuf allocation (I was getting allocation failures in esp_pbuf_new()), caused by the server possibly being slow to consume the data in this particular case.
So I tried to enlarge the available memory to 64kB (32kB at a time should be the most data flowing through the stack, and only in the worst case where the server does not consume anything until the full burst is received).
This solved the esp_pbuf_new() allocation failures, but I still get a NULL esp.m.ipd.buff in espi_process().
Then I have two questions:
1. esp.m.ipd.buff is NULL? see the following code
2. should we not fail (e.g. drop connection) in case esp.m.ipd.buff is NULL?