Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data loss in TCP rx #46

Closed
sysopch opened this issue Sep 20, 2019 · 35 comments
Closed

Data loss in TCP rx #46

sysopch opened this issue Sep 20, 2019 · 35 comments

Comments

@sysopch
Copy link

sysopch commented Sep 20, 2019

I am having issues of incomplete data received on TCP links. My application protocol server expects bursts of 32kB of data in TCP and it times out because it only get a fraction of it (from 9 to 16kB usually).

After dumping the received data and comparing it with what was sent by the TCP client (windows PC) I see that big chunks of data are missing inside the received data. In other words, I only get the first and the last part of the 32k (the actual sizes vary).

After some debugging I think I confirmed this has to do with an invalid IPD buffer in espi_process(). At first I thought this was due to a failed pbuf allocation (I was getting allocation failures in esp_pbuf_new()), due to the fact that the server may be slow in consuming the data in this particual case.

So I tried to enlarge the available memory to 64kB (32kB at a time should be the biggest data flowing through the stack, only in the worst case that the server does not consume anything when the full burst is received).

However, this solved the esp_pbuf_new() allocation failures, but I still get NULL esp.m.ipd.buff in espi_process().

Then I have two questions:

  1. Is it correct that we drop the incoming data if esp.m.ipd.buff is NULL? see the following code
            if (len > 0) {
                if (esp.m.ipd.buff != NULL) {   /* Is buffer valid? */
                    ESP_MEMCPY(&esp.m.ipd.buff->payload[esp.m.ipd.buff_ptr], d, len);
                    ESP_DEBUGF(ESP_CFG_DBG_IPD | ESP_DBG_TYPE_TRACE,
                        "[IPD] Bytes read: %d\r\n", (int)len);
                    dbg_ipd_tag_count += len;
                } else {                        /* Simply skip the data in buffer */
                    ESP_DEBUGF(ESP_CFG_DBG_IPD | ESP_DBG_TYPE_TRACE,
                        "[IPD] Bytes skipped: %d\r\n", (int)len);
                    dbg_ipd_err = 1;    /******************** <<<breakpoint stops here>>> *******************/
                }
                d_len -= len;                   /* Decrease effective length */
                d += len;                       /* Skip remaining length */
                esp.m.ipd.buff_ptr += len;      /* Forward buffer pointer */
                esp.m.ipd.rem_len -= len;       /* Decrease remaining length */
            }

should we not fail (e.g. drop connection) in case esp.m.ipd.buff is NULL?

  1. Do you have any specific suggestion for further investigation in understanding why we have invalid buffers in the first place? Note that the communication is working with the same pattern in other test conditions. From the TCP server standpoint the working and not working case are quite similiar, the only difference I can think of is the speed of incoming data consumption, which can lead to higher buffering in the stack
@MaJerle
Copy link
Owner

MaJerle commented Sep 20, 2019

Hello. Let me answer directly.

  1. Yes, it is valid. We start reading data and first pbuf allocation after +IPD has been received. If it failes, we have to drop the data. Thats why there is a check. I also try to allocate new packet if there is more data to receive (after current is full). If you were not aware, library supports to set maximal pbuf size for receive data, regardless of packet size. So if packet size is 1kB and max pbuf size is 256B, stack will allocate 4 pbufs respectively. First one is always allocated just after +IPD parsing, next one is allocated just after previous one has been reported to upper layer.

  2. Check if you have iysues with receiving data. Maybe UART issues? Up to today I was not able to see similar problem. If there was a missmatch of characters on UART, system could de-sync.

@sysopch
Copy link
Author

sysopch commented Sep 23, 2019

Hi, thanks for answering

  1. There is something I am missing here. I agree that we have to drop data if we do not have space where to put it.
    But, as far as I understand, the +IPD packets we are receiving are part of a TCP payload, so they are supposed to be delivered to the socket with no losses. If packets have to be dropped at the IP level, TCP stack should handle retransmission and guarantee transparent delivery to the socket. I do not see how this is possible if we drop an +IPD packet here; is there some way to inform the stack that we did not receive the packet?
    Otherwise we should fail in some way. Better failing that letting the application get corrupted data without knowledge of it. Don't you agree?

  2. I think I have found why esp.m.ipd.buff is NULL. It is set to NULL further down in the code because esp_pbuf_free() returns espOKIGNOREMORE. Then I enlarged ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN to 16 and this seems to solve my problems.
    Now I am trying to fine tune this parameter to make sure I don't run into exhausted queue entries again.
    Hower, also in this case where we run out of queue entries, my understanding is that the application should be notified of the failure. What are your thoughts?

@MaJerle
Copy link
Owner

MaJerle commented Sep 23, 2019

  1. There is something I am missing here. I agree that we have to drop data if we do not have space where to put it.
    But, as far as I understand, the +IPD packets we are receiving are part of a TCP payload, so they are supposed to be delivered to the socket with no losses. If packets have to be dropped at the IP level, TCP stack should handle retransmission and guarantee transparent delivery to the socket. I do not see how this is possible if we drop an +IPD packet here; is there some way to inform the stack that we did not receive the packet?
    Otherwise we should fail in some way. Better failing that letting the application get corrupted data without knowledge of it. Don't you agree?

My tests [at least in the past] of downloading 10MB file over HTTP, I did not always receive all the bytes as server was supposed to send [based on HTTP header value]. I have received for example 90% of data, or sometimes 99% or sometimes 100%. So there is/was an issue on ESP side or any other, because data actually didn't arrive over UART to stack itself => stack had nothing to process.

I cannot know that packet has not been received by the current AT setup.

  1. I think I have found why esp.m.ipd.buff is NULL. It is set to NULL further down in the code because esp_pbuf_free() returns espOKIGNOREMORE. Then I enlarged ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN to 16 and this seems to solve my problems.
    Now I am trying to fine tune this parameter to make sure I don't run into exhausted queue entries again.

This could be. The question is what is happening on your application side that you cannot read data faster? Writing any data to memory or processing byte-by-byte?

Hower, also in this case where we run out of queue entries, my understanding is that the application should be notified of the failure. What are your thoughts?

Application could, potentially, implement global callback notifying that there was overrun of data packets. But the way how netconn API is designed, having separate notification for each connection might not be possible.

@sysopch
Copy link
Author

sysopch commented Sep 23, 2019

My test case is similar to yours, in that I am receiving 11.1Mbytes of data and creating a set of 2550 files out of it on EMMC. It is a sort of a tar file extracted on the fly on the receiver side. This extraction part is where I have the delays in processing the data [pls take into account I am using SPI as UART link, so I am receiving data @200+kB/s]. As far as I can tell, I do not have data loss issues on the UART or ESP32, as I am getting the expected result now that I properly size the memory AND the rx queue.

Instead, if I reduce too much the memory allocated to the ESP_AT_Lib, I do get pbuf allocation errors. Likewise, if I reduce the ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN I get rx queue full condition.

What I am saying is that I feel that in these cases we should somehow report an error to the application. Can't we just behave as if we receive a connection drop event? This should work on the application side. I do not know how difficult it would be on the stack side.

How about dropping all rx data from this point on, but sending a +CIPCLOSE to ESP32? This should generate a CLOSED event which in turn will be notified to the application. Can this work?

@MaJerle
Copy link
Owner

MaJerle commented Sep 23, 2019

My test case is similar to yours, in that I am receiving 11.1Mbytes of data and creating a set of 2550 files out of it on EMMC. It is a sort of a tar file extracted on the fly on the receiver side. This extraction part is where I have the delays in processing the data [pls take into account I am using SPI as UART link, so I am receiving data @200+kB/s]. As far as I can tell, I do not have data loss issues on the UART or ESP32, as I am getting the expected result now that I properly size the memory AND the rx queue.

Instead, if I reduce too much the memory allocated to the ESP_AT_Lib, I do get pbuf allocation errors. Likewise, if I reduce the ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN I get rx queue full condition.

What I am saying is that I feel that in these cases we should somehow report an error to the application. Can't we just behave as if we receive a connection drop event? This should work on the application side. I do not know how difficult it would be on the stack side.

How about dropping all rx data from this point on, but sending a +CIPCLOSE to ESP32? This should generate a CLOSED event which in turn will be notified to the application. Can this work?

If we modify this in netconn API to close it, then normally we break compatibility. However we can still try to implement it.

Normally you would need to call esp_conn_close just after this line:
https://github.com/MaJerle/ESP_AT_Lib/blob/master/src/api/esp_netconn.c#L192

And then your application will get espCLOSED event, but please note that same queue is used to receive data and to notify application about closed event => you may lost this event too then.

@sysopch
Copy link
Author

sysopch commented Sep 23, 2019

Ok, maybe I can try to implement something. Not sure when I will do this though (it takes some time for proper testing). If the current solution with increased resources works well, I need to focus on sending the current version to pre-production.

Just to complete the discussion however:

  1. I think we can avoid the issue of loosing espCLOSED event by probing the number of elements in queue, ensuring there is always a free place for a possible connection close event. This requires extending the esp_sys API though, maybe we can make it configurable and disabled by default to avoid breaking backwards compatibility
  2. as far as I understand, the same behavior should occour if esp_pbuf_new() fails in espi_process(). Do you agree?

More in general, I feel we miss a way to cope with slow data processing.
Usually in TCP this is done with the 0-window mechanism. Here we should somehow avoid to consume the data on the serial link, so that the stack on ESP32 knows that there is a processing bottleneck and can slow down the producer.

In my custom application protocol, for the time being I am lucky, because I have an upper limit to the incoming amount of data burst. But this is likely to change in the future, so I would like to have such a flow control mechanism available.

What I was thinking of is some sort of low resources check (that could take into account a low memory threshold as well as rx queues depths thresholds). In case of low resources one could stop reading from the UART. In this case the ESP32 should fill its internal buffers and eventually map the condition into the usual 0-window mechanism.
Maybe some sort of timeout is needed as a misbehaving thread could starve the whole connectivity forever.

Of course all of this may not work if some of intermediate pieces does not handle correclty the buffer full condition.

@MaJerle
Copy link
Owner

MaJerle commented Sep 23, 2019

You can do all this manually and setting RTS pin high for esp32. How it works over SPI, I dont know.

To answer your 2 points

  1. This is not possible as check for queue is not standard OS function. Furthermore you should never check if anything is in the queue (like you should never use feof function in C, google more more info).

  2. Yes, if no memory, we stop reading.

@MaJerle MaJerle closed this as completed Oct 13, 2019
@MaJerle
Copy link
Owner

MaJerle commented Oct 20, 2019

Manual TCP receive implemntation is on-oing. However we need to wait Espressif to fix some bugs in it first. It is disabled in the lib for now.

@tvandergeer
Copy link
Contributor

I'm running into similar issues; downloading a 3MB file and I can't get it "stable". At the best it's missing a handful of data blocks.

I've set ESP_MEM_SIZE to 0x8000 and ESP_CFG_NETCONN_RECEIVE_QUEUE_LEN to 32. It shows no more signs of a buffer overrun on the applications receiving side.
Worth noting that I've increased the ESP32's baud rate to 2152000 to get a decent transfer time for large files.

Do you have any ideas on how to get a reliable large data transfer? I'm using an ESP32 with the latest esp-at firmware, compiled from the repo (commit: 655467dae8aaca1ec2f1405e5eab7ce797515281).

@MaJerle
Copy link
Owner

MaJerle commented Nov 15, 2019

Download develop branch of this repository, enable manual TCP receive.
I know that ESP is known to not send all packets to UART port for some reason when not in manual receive mode. Especially on high data throughput.

@tvandergeer
Copy link
Contributor

Ah, I see! I've tried to use the develop branch, but I'm having trouble migrating to CMSIS_V2. It's not compatible with some of the libs in my project. E.g. my uSD driver for FatFS still uses CMSIS methods for semaphores and message queues.

@MaJerle
Copy link
Owner

MaJerle commented Nov 17, 2019

This should be few minutes of work. You need to, either modify stm32 driver, or create new one based on CMSIS.

@tvandergeer
Copy link
Contributor

Yeah, that's what I thought as well. But for some reason it only starts the init_thread task and none of the subsequent threads. Your STM32F769i-discovery example is having the same issues. Not sure what I'm doing wrong here...

@tvandergeer
Copy link
Contributor

It looks like the SysTick is no longer working

@MaJerle
Copy link
Owner

MaJerle commented Nov 18, 2019

@tvandergeer I have tested on F769 and cannot reproduce this problem. Directly example as on repository.

@tvandergeer
Copy link
Contributor

Not sure what I'm missing here... For some reason I'm running into issues with the TaskScheduler. It's no longer working correctly when using CMSIS_V2.

I'm now using develop but reverted the changes for CMSIS_V2 (from this commit). Now I was able to test receiving data with manual TCP option enabled. It's receiving a couple of blocks, but halts after a few seconds. I can see that it's still polling for data, but no data is coming. Is the ESP-32 firmware this buggy?

@MaJerle
Copy link
Owner

MaJerle commented Nov 18, 2019

Please note that message queie api is different in V2. Maybe this is a reason too.

I will not say that my lib is bug free. Do you have log you can share?

@tvandergeer
Copy link
Contributor

Yes, I made some progress! It looks like the manual TCP receive option only works reliable when you disable echo mode.

#define ESP_CFG_AT_ECHO                     0

@MaJerle
Copy link
Owner

MaJerle commented Nov 19, 2019

Do you have the log you can share?

@tvandergeer
Copy link
Contributor

Yes, I'd be happy to! Which configuration do you want me to use to create a log?

@MaJerle
Copy link
Owner

MaJerle commented Nov 19, 2019

The one when it is not working. I'm interested in raw AT response from ESP and point where it fails.

@tvandergeer
Copy link
Contributor

UART log (from ESP32 TX pin @ 2000000 baud): https://gist.github.com/tvandergeer/d2632fbe5550cfc9792edb02567500ac

This is the file I'm downloading: https://gist.github.com/tvandergeer/d2118705607bf731cf7013adfde63104
NB. It's a 3000000 Byte file I've created. Each line is 100 Byte (incl. newline) The first 10 chars are a decimal representation of the number of bytes of the file at the start of that line. I use this because it's easy to diff the downloaded file with the original so you can spot the discrepancies.

It's using a configuration with manual TCP download and AT Echo on.

Application log:

Default task started
Mounted FS successfuly
InitializingDisplay Manager started
 ESP-AT Lib
ESP callback evt: 0
Library initialized!
ESP callback evt: 1
Device reset detected!
ESP callback evt: 1
Device reset detected!
ESP callback evt: 3
ESP callback evt: 1
Device reset detected!
ESP callback evt: 2
ESP-AT Lib initialized!
Started Video Player task
ESP callback evt: 14
ESP callback evt: 15
ESP callback evt: 17
ESP callback evt: 19
Connected to network!
Station IP address: 10.0.5.203; Is DHCP: 1
Client netconn created
Connected to Server
Sent request to Server
Received response
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 2048
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 4096
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 6144
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 7792
Bytes read: 824 Written bytes: 824
Received new data packet of 824 bytes, total: 8616
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 10664
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 12712
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 14760
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 16808
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 18856
Bytes read: 1248 Written bytes: 1248
Received new data packet of 1248 bytes, total: 20104
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 22152
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 24200
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 26248
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 28296
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 30344
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 32392
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 34040
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 36088
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 38136
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 40184
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 42232
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 44280
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 46328
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 48376
Bytes read: 448 Written bytes: 448
Received new data packet of 448 bytes, total: 48824
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 50872
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 52920
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 54568
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 56616
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 58664
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 60712
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 62760
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 64808
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 66856
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 68504
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 70552
Bytes read: 1248 Written bytes: 1248
Received new data packet of 1248 bytes, total: 71800
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 73848
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 75896
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 77944
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 79992
Bytes read: 424 Written bytes: 424
Received new data packet of 424 bytes, total: 80416
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 82464
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 84512
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 86560
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 88608
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 90656
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 92704
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 94352
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 96400
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 98448
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 100496
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 102544
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 104592
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 106640
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 108688
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 110736
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 112784
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 114832
Bytes read: 1648 Written bytes: 1648
Received new data packet of 1648 bytes, total: 116480
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 118528
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 120576
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 122624
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 124672
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 126720
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 128768
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 130816
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 132864
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 134912
Bytes read: 2048 Written bytes: 2048
Received new data packet of 2048 bytes, total: 136960
Bytes read: 896 Written bytes: 896
Received new data packet of 896 bytes, total: 137856
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually
Netconn timeout while receiving data. You may try multiple readings before deciding to close manually

@tvandergeer
Copy link
Contributor

I've found that uncommenting this line greatly improves the stability of the download:

            /* This one is optional, to check for more data just at the end */
            SET_NEW_CMD(ESP_CMD_TCPIP_CIPRECVLEN);  /* Inquiry for latest status on data */

It's here: https://github.com/MaJerle/ESP_AT_Lib/blob/develop/esp_at_lib/src/esp/esp_int.c#L1522

@MaJerle
Copy link
Owner

MaJerle commented Nov 20, 2019

Sorry, Ím a little bit lost.
UART log you shared with me seems to work OK. There is only single CIPRECVLEN check before reading the data, not before & after. I see last received byte was 896 bytes in your log & in UART log. This is the in line. After that I see no more data notification coming from ESP in UART log.

Do you have an example where communication fails? So that I can see AT commands info.

@tvandergeer
Copy link
Contributor

This is a log when it fails. Please note the received data. The file is only partly downloaded. It just stops downloading. My guess is that the ESP "forgets" to send another +IPD line to signal new data. But when we start requesting for new data there is more data available. That's what the uncommented line does.

@MaJerle
Copy link
Owner

MaJerle commented Nov 20, 2019

Ok, so the issue is related to ESP not sending more +IPD info.
Only option is yes, to manually read for changes, but ESP shall send notification about it.

If you enable uncommented area as you mentioned, does it work stable then?

@tvandergeer
Copy link
Contributor

Yes, it works much more stable when uncommenting that line

@MaJerle
Copy link
Owner

MaJerle commented Nov 20, 2019

Much nore stable means 100% stable or still some issues?

@tvandergeer
Copy link
Contributor

I'll perform some more tests and share the results with you

@MaJerle
Copy link
Owner

MaJerle commented Nov 20, 2019

It is important to understand if it fails on lib stack or esp stack. And what has echo to do here (maybe esp issue, dont know yet).

@tvandergeer
Copy link
Contributor

I've performed 5 tests without AT echo. 4 were successfull and 1 test failed. It stopped midway. The curious thing was that the ESP send the following as last lines in the UART log:

+IPD,4,5176
+CIPRECVLEN:-1,-1,-1,-1,0

This looks like conflicting information. On the +IPD line it indicates that there's 5176 bytes available, but the +CIPRECVLEN line indicates an empty buffer. This looks like an ESP bug (but read on...)

Subsequently I've performed 5 tests with AT echo. I had a similar success ratio but found the following in the last line of the UART data:

+IPD,4,424
AT+CIPRECVLEN?
+CIPRECVLEN:-1,-1,-1,-1,424

OK
AT+CIPRECVLEN?
+CIPRECVLEN:-1,-1,-1,-1,424

OK
AT+CIPRECVDATA=4,424
to read 424 bytes
+CIPRECVDATA:424,"185.xx.xx.xx",80,ijklmnopqrstuvwxyz!@#$%^&*()_+<>,.;'"[]{}
                                                                              0001952300:ABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=abcdefghijklmnopqrstuvwxyz!@#$%^&*()_+<>,.;'"[]{}
                                                                                                                                                                                 0001952400:ABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=abcdefghijklmnopqrstuvwxyz!@#$%^&*()_+<>,.;'"[]{}
                                             0001952500:ABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=abcdefghijklmnopqrstuvwxyz!@#$%^&*()_+<>,.;'"[]{}
                                                                                                                                                0001952600:ABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=abcdefghijklmnopqrstuvwxyz!@#$%^
OK
AT+CIPRECVLEN?

+IPD,4,2872
+CIPRECVLEN:-1,-1,-1,-1,0

OK

Notice the last lines. The Lib requested a "AT+CIPRECVLEN?" but before the answer is produced by the ESP a new "+IPD" line comes in, but this line is - probably - never processed by the Lib, because it was expecting an answer to the "CIPRECVLEN?" request.

My conclusions are:

  • My earlier hypothesis that the AT echo should be disabled is proven to be false. My new hypothesis is that the uncommented line (that makes an extra "CIPRECVLEN?") request made the difference in stability.
  • Uncommenting the line definitely improves stability, but not 100%.
  • The race condition receiving an "+IPD" line while in a "CIPRECVLEN?" request breaks things.

Do you agree?

@MaJerle
Copy link
Owner

MaJerle commented Nov 21, 2019

Thanks. I see where it goes sideways.
When ESP receives a command for ciprecvlen, it might be really 0 internally. And it prepares data to send 0. In the meantime +IPD is triggered and send updated value while response prepared for ciprecvlen holds old value.

Typical race condition and potential bug in ESP itself. Either they need to prevent sending +IPD messages when commands are active, or they update response before send it with last value.

Could you please create a ticket in esp-at repo from Espressif and reference this one there?

@MaJerle
Copy link
Owner

MaJerle commented Nov 30, 2019

I have added repeated AT+CIPRECVLEN command if +IPD is detected. I could not reproduce this issue on my side, please try and let me know if it works more stable now.

@tvandergeer
Copy link
Contributor

Works much better/more reliable now! Thanks!

FYI. I've reported another issue in the esp-at repository regarding +IPD messages in "passive mode":
espressif/esp-at#294
This can be solved on the library side as well by sending a AT+CIPRECVLEN? right after receiving a +LINK_CONN. Is that something you're inclined to do?

@MaJerle
Copy link
Owner

MaJerle commented Dec 6, 2019

@tvandergeer can you please start new issue for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants