Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mbedtls_ssl_write returns: error: [write():207]: (0XFFFFFFB2) UNKNOWN ERROR CODE (004E) #3954

Closed
alorbach opened this issue Dec 12, 2020 · 13 comments

Comments

@alorbach
Copy link

alorbach commented Dec 12, 2020

I am using mbedtls in a typical esp32 devkit with ESP-IDF release 3.3 with arduino-esp32 master branch.

I have more than enough memory free:
FreeMem ~ 140KB
MaxFreeHeap: 180KB

I have a sketch that receives about 60 UDP packages per seconds, with a data size of up to 1024 bytes per packet.
At the same time its streaming DTLS data (UDP) with about 50 packets per second with about 70 bytes erach packet.

At a random point mbedtls_ssl_write returns an error while sending:
error: [write():207]: (0XFFFFFFB2) UNKNOWN ERROR CODE (004E)

I then tear down the connection and reestablish DTLS Streaming. However from this point Network is unstable.
Also incoming UDP packages start to stutter and fill up the internal queue. When I stop UDP streaming to the device, I can see that it still is processing UDP packages for 5-15 seconds.

I digged a little deeper and I found out that the lower socket error code is 12 (ENOMEM ) happening in write call of lwip.
https://github.com/ARMmbed/mbedtls/blob/7f007f70e06e45ffcb70cee75c5e71b3316a3316/library/net_sockets.c#L628

I have played with larger buffers here and there but nothing seems to be fixing this problem.
And I am only experiencing it once DTLS streaming is active,

@alorbach
Copy link
Author

I traced down the problem to be happning somewhere in LWIP function lwip_sendto:
https://github.com/espressif/esp-lwip/blob/ffd1059c9adfd9aece3578a030a74eb0b67c552e/src/api/sockets.c#L1645

I have the feeling mbedtls DTLS streaming causes some kind of memory leak within LWIP as once the error occured, the esp32 wifi becomes very unstable.

@alorbach
Copy link
Author

Another step further, debugging lwip shows me where the mem alloc fails

lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_getsockopt(57, SOL_SOCKET, SO_ERROR) = 12
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, retry after vTaskDelay!
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, tear down session!
lwip_close(57)
lwip_close: is_tcp=0

1 similar comment
@alorbach
Copy link
Author

alorbach commented Dec 14, 2020

Another step further, debugging lwip shows me where the mem alloc fails

lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_getsockopt(57, SOL_SOCKET, SO_ERROR) = 12
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, retry after vTaskDelay!
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, tear down session!
lwip_close(57)
lwip_close: is_tcp=0

@alorbach
Copy link
Author

Even with LWIP_STATS enable an pbub debug enabled, nothing which explains the ERR_MEM:

lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_getsockopt(52, SOL_SOCKET, SO_ERROR) = 12

LINK
xmit: 3
recv: 3
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ETHARP
xmit: 24
recv: 279
fw: 0
drop: 429
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 429
opterr: 0
err: 0
cachehit: 15310

IP_FRAG
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IPv6 FRAG
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IP
xmit: 15507
recv: 20934
fw: 0
drop: 1009
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ND
xmit: 3
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IPv6
xmit: 3
recv: 524
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IGMP
xmit: 0
recv: 7
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
proterr: 0
rx_v1: 0
rx_group: 0
rx_general: 7
rx_report: 0
tx_join: 0
tx_leave: 0
tx_report: 0

MLDv1
xmit: 0
recv: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
proterr: 0
rx_v1: 0
rx_group: 0
rx_general: 0
rx_report: 0
tx_join: 0
tx_leave: 0
tx_report: 0

ICMP
xmit: 53
recv: 42
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ICMPv6
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

UDP
xmit: 15316
recv: 19747
fw: 0
drop: 11
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 11
opterr: 0
err: 0
cachehit: 19

TCP
xmit: 87
recv: 130
fw: 0
drop: 1
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 1
opterr: 0
err: 0
cachehit: 112

SYS
sem.used: 0
sem.max: 0
pbuf_alloc(length=102)

@alorbach
Copy link
Author

It seems to fail at a simple malloc call (traced back to mem_malloc),
could this be heap fragmentation?
https://github.com/espressif/esp-lwip/blob/61d840ff4778f4946c8743f7e412345abcd537f1/src/core/mem.c#L251

@alorbach
Copy link
Author

After weeks of getting mad because of this issue, I digged deep down into heap, lwip debugging and so on - nothing really was helping and showing that there is any memory shortage.

However today I most likely found the root cause of the issue.

So I have a Class that does the DTLS UDP Streaming which uses HTTPClient from arduino-esp32, and I was using new/delete to create HTTPClient objects dynamically after the ERR_MEM occurred.

I remembered that I had trouble (strange exceptions) when deleting dynamic HTTPClient objects before in my OTA Updatecode. So as a workaround, I made the HTTPClient Object static which fixed the problem at that time.

Now I applied the same to my Class that does the DTLS Streaming, I made the HTTPClient object static at the top of the Class code like this:
HTTPClient mHttp;

And it appears to have fixed the problem mostly - I am running DTLS Streaming now since about 60 minutes without any ERR_MEM till yet.

So I think HTTPClient is messing something badly up in the destructor, perhaps heap corruption?

All the samples I find are not using new/delete but either static HTTPClient objects or on Stack.

@bensze01
Copy link
Contributor

Thanks for the report, and i'm happy to hear that you've managed to work around the problem.
From your description in your last comment, (not being familiar with arduino-esp32) it doesn't seem to me like mbedtls is causing the problem.

If the problem does turn out to originate from mbedtls after all, do please come back to us.

@alorbach
Copy link
Author

@bensze01 thats correct, the problem does not come from mbedtls, the original out of memory error was also happening inside LWIP and not mbedtls. But I already created this issue when I traced this down.

https://github.com/espressif/arduino-esp32 builds on https://github.com/espressif/esp-idf/ and is used on esp32 IOT based boards.

I hope this helps other facing the same problem.

@gilles-peskine-arm
Copy link
Contributor

I'm closing this issue since it isn't a problem in Mbed TLS.

@anjalir14
Copy link

Hello,
Currently, I'm working on a project in which I'm sending my data on AWS cloud (shadow) but I'm getting errors in middleware i.e mbedtls_ssl_write returned -0x6d & mbedtls_ssl_write returned -0x6c00. I started doing debugging but it's a bit confusing. Can you please help me out to know, whether it's middleware problem or memory problem?

Platform : STM32Cube

@gilles-peskine-arm
Copy link
Contributor

gilles-peskine-arm commented Sep 26, 2022

@anjalir14 Posting on a closed issue is not a good way to ask for support. However, I happened to still be notified of messages on this issue, so here's my answer. -0x6d is not an mbedtls error code, so it must be an error code coming from the I/O layer (probably the f_send function set with mbedtls_ssl_set_bio). That layer is supposed to return Mbed TLS error codes, but it isn't. It's probably returning a platform error code without converting it.

In the future, for support requests, please use the mailing list.

@anjalir14
Copy link

Firstly, I'm sorry and thank you for your help. I will try to debug my code again and if I came up with any error then I will mail you.
Thank you.

@anjalir14
Copy link

Hi, I'm not able to send a message on the mailing list that you have provided me. I'm having a few errors can you pls help me out over here only?

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants