-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mbedtls_ssl_handshake server sticks in failing with MBEDTLS_ERR_RSA_VERIFY_FAILED error after some point #7833
Comments
Use
So something is leaking memory. The memory leak is probably the cause of the signature verification failure, not a consequence. Although I am a bit surprised that the error is consistently |
I've missed it when writing, I was looking for the bit-mask corresponding codes and it's right, 0x0010 corresponds to a failed allocation.
But the heap memory is available here, except it's an allocation within a pool managed by mbedtls, what do you think? |
I could narrow down the issue to the state which the error being return: Debug level 1.
This corresponds to code: bool mbedtls_is_ssl_handshake_over = false;
int ret;
do {
LWIP_DEBUGF(ALTCP_MBEDTLS_DEBUG, ("--> mbedtls_ssl_handshake_step state %d heap %u -->\n", state->ssl_context.state, get_heap()));
ret = mbedtls_ssl_handshake_step(&state->ssl_context);
mbedtls_is_ssl_handshake_over = (state->ssl_context.state == MBEDTLS_SSL_HANDSHAKE_OVER);
LWIP_DEBUGF(ALTCP_MBEDTLS_DEBUG, ("<-- mbedtls_ssl_handshake_step state %d ret %d heap %u <--\n", state->ssl_context.state, ret, get_heap()));
}
while (!mbedtls_is_ssl_handshake_over && !ret); |
As a continuation of debug, an attached log of several https requests happen, where SSL session is also managed. There happens memory leak for every new HTTPS connection handshake. device-monitor-230629-161033 copy 2.log A manual memory leak calculation per handshake suggests multiple probable bytes per handshake (ranges from 203B to 299B). No memory leak encountered by a reconnect handshake as a client. |
Can you use tools like Asan to identify the memory leak precisely? If not, can you at least find the exact amount that's leaked, and look for a structure that has that size? We run a lot of tests with Asan and they validate that the library doesn't have memory leaks. But of course that only validates that there are no memory leaks for the settings that we test: there can be a leak in a different configuration or scenario. |
I'll check that out soon, meanwhile can you mention which structures I
should check against?
…On Thu, Jun 29, 2023, 17:54 Gilles Peskine ***@***.***> wrote:
Can you use tools like Asan
<https://github.com/google/sanitizers/wiki/AddressSanitizer> to identify
the memory leak precisely? If not, can you at least find the exact amount
that's leaked, and look for a structure that has that size?
We run a lot of tests with Asan and they validate that the library doesn't
have memory leaks. But of course that only validates that there are no
memory leaks for the settings that we test: there can be a leak in a
different configuration or scenario.
—
Reply to this email directly, view it on GitHub
<#7833 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AH3O7J4XGHCCMPOEESXLZHTXNWJKXANCNFSM6AAAAAAZTGBEBY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Probably one of the SSL structures ( |
Given the size of SSL structures in my setup:
And given that I've identified the exact amount of memory leaked per handshake, which is 164 Bytes, which is the sum of the following structures:
Given that I'm not initiating any session for server requests, should my code manage them even so? and how further? ** My previous calculation was based on Web refresh with the browser, which initiates more than a connection simultaneously as needed by. Last calculation is based on CURL command line which initiates exactly one connection against the server. |
I think your code should only reference |
In the current setup I've the tickets and cache enabled and managed, but it's shared across requests (at the server level): Initialization: Freeing is done when the listening connection (in LwIP it's LISTEN Protocol Control Block - PCB) gets closed, only: Should these be managed on request-level .. not server-level? It's insured that the What do you think? |
I had a quick look at your code and I don't see anything obviously wrong. But I'm not familiar at all with LWIP, it might be worth asking people who are familiar with it. One thing to pay attention is, if there has ever been an error on the connection, then you must close it: attempting to use the connection may put the SSL object in a bad state. But the error handling for the handshake looks fine to me. |
I've just noticed missing updates to esp-lwip repo meeting latest changes, occured when I was preparing lwip patches.
Correct, the TCP connection reset is well managed there, but we might further check these lines which are TLS protocol related. Although logs across history show only one occurance of Also what I've suspected, but couldn't confirm due to not-enough experience in C, is the memory initialization of configuration struct which seems out of order compared to the structure sequence. But however, this is done per server setup, not every connection. Another note: So this might relate to a full handshake process; wherein the browser initiates more than a connection, and when one has succeeded it aborts the other, shortening the probability of leak to be in advanced states ( What do you think? |
That doesn't matter.
This is a promising lead. We have a lot of negative tests where the client sends an invalid message, but I'm not sure we have tests where the client just closes the connection. However, I do see a call to |
I've just confirmed that altcp_close(conn) is being called to a given connection, also old logs that enabled LwIP debug showed I've further made another test wherein I've turned it to ALTCP_TCP rather ALTCP_TLS, to check whether it's an application/TCP leak or not, it initially showed a leak, but the memory did recover after a while (I think by LwIP internal operation in freeing closed PCBs). Suspecting my code which is a library I maintain, I've failed in finding any source of memory leak, really (One can trace |
Summary
System information
Mbed TLS version (number or commit id): 2.28.3
Operating system and version: Built with Ubuntu
Configuration (if not default, please attach
mbedtls_config.h
): Please find attachedconfig.txt
Compiler and options (if you used a pre-built binary, please indicate how you obtained it): With default esp-idf compilation environment: version
release/v4.4
Additional environment information: Built using a fork of espressif/esp32-arduino-lib-builder
Expected behavior
I host TLS webserver using LwIP ALTCP TLS - MBEDTLS port, I expect every client (browser) to be able to connect anytime.
In essence, It works correctly to some point, correct behaviour includes making successful handshakes and communication.
The problem starts after some amount of time/process, in which causes to deny any new connections, even from browsers were already connected by the means of a reload button.
Already connected connections prior to this point continue to communicate successfully afterwards.
The error code returned by the
mbedtls_ssl_handshake
is fixed at -17280 (-0x4380) (MBEDTLS_ERR_RSA_VERIFY_FAILED).I've noticed that some times the preceeding handshake's error code is -17168 (-0x4310), which I've tried to find the represented meaning, but failed to, nearest assumption is
MBEDTLS_ERR_RSA_PRIVATE_FAILED | MBEDTLS_ERR_MPI_NEGATIVE_VALUE
.Is there any hint for what's going wrong?
With thanks,
Actual behavior
Log shows:
Or in the other described case:
Steps to reproduce
It's hard to tell how to reproduce, as the time is uncertain, and upon that any further reset of the webserver (including SSL context) yield in yet again successful handshakes until it reappears.
However, here's the running ALTCP MBEDTLS Code, pointing to the handshake API call:
https://github.com/HamzaHajeir/esp-lwip/blob/2.1.2-esp-patch/src/apps/altcp_tls/altcp_tls_mbedtls.c#L278
Additional information
Meanwhile this issue, the MCU is able to reconnect to a MQTT server by TLS with successful (one-way auth) handshakes,
My investigation of a very long log shows that this problem leaks memory too.
Debug Logs.
The text was updated successfully, but these errors were encountered: