Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce SSL getter API for handshake state #4383

Closed
hanno-becker opened this issue Apr 21, 2021 · 58 comments
Closed

Introduce SSL getter API for handshake state #4383

hanno-becker opened this issue Apr 21, 2021 · 58 comments
Assignees
Labels
component-tls enhancement size-s Estimated task size: small (~2d)

Comments

@hanno-becker
Copy link
Contributor

Context: This was raised by Jeremy Audiger on the Mbed TLS mailing list.

Issue: When we make mbedtls_ssl_context internal, there is no supported way of extracting the handshake state. Given that we do expose the fact that the handshake happens in steps via mbedtls_ssl_handshake_step(), there should arguably be a public getter function that allows to retrieve the handshake state.

@mpg mpg added Product Backlog size-s Estimated task size: small (~2d) labels Apr 21, 2021
@mpg mpg added this to Incoming Items in OBSOLETE - SEE https://github.com/orgs/Mbed-TLS/projects/3 via automation Apr 21, 2021
@mpg
Copy link
Contributor

mpg commented Apr 27, 2021

Note: this is a follow-up to #4371 and #4373.

If we execute #4371 without this follow-up it would make some people's working code "non-compliant" with no transition path.
If we additionally execute #4373 without this follow-up, it would actually break working to without a transition path.

(In addition to the recent posting on the list, I've seen at least twice in the past code that accessed ssl->state, so I guess multiple users are going to be in that case.)

So while in principle it's something we could do any time in 3.x (add a getter function), in practice I think this is a necessary follow-up to #4373.

@mpg
Copy link
Contributor

mpg commented Jun 18, 2021

As I previously wrote elsewhere:

I don't think we necessarily want to make that field public (as it could hinder work on 1.3 which has a different state machine), but we probably want to make some info public, for example "is the handshake over yet?", otherwise exposing mbedtls_ssl_handshake_step() in the API makes little sense.

I'm not now thinking it's probably better to wait for feedback and see what people actually need, so that we can design an API that give them enough information, while not exposing too much of the internal state, which would again block us for future development.

@gilles-peskine-arm
Copy link
Contributor

I'm not thinking

ITYM “I'm now thinking”?

I think that 3.0 or at least 3.1 should have getters for things we know people need. Sample programs would give us a clue. I'm not sure if the handshake state is one of those though.

@bensze01 bensze01 modified the milestone: Mbed TLS 3.1 / early 3.x Jul 28, 2021
@bensze01 bensze01 removed this from the Mbed TLS 3.1 / early 3.x milestone Aug 11, 2021
@gstrauss
Copy link
Contributor

As I previously wrote elsewhere:

I don't think we necessarily want to make that field public (as it could hinder work on 1.3 which has a different state machine), but we probably want to make some info public, for example "is the handshake over yet?", otherwise exposing mbedtls_ssl_handshake_step() in the API makes little sense.

I'm not now thinking it's probably better to wait for feedback and see what people actually need, so that we can design an API that give them enough information, while not exposing too much of the internal state, which would again block us for future development.

One data point for server-side use of mbedTLS is lighttpd. I am a lighttpd developer and author of lighttpd mod_mbedtls TLS module and I posted more detail in #5331, but wanted to mention here that lighttpd mod_mbedtls needs more than "is the handshake over yet?". lighttpd currently steps the handshake for a) ALPN to be able to handle server certificate selection and support RFC8737 Automated Certificate Management Environment (ACME) TLS Application-Layer Protocol Negotiation (ALPN) Challenge Extension, and for b) client certificate validation.
reference: https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L2051

@mpg
Copy link
Contributor

mpg commented Jan 10, 2022

@ronald-cron-arm I think at this point you're the most familiar with the TLS 1.3 handshake states, could you have a look and comment?

I'm a bit providing a simple mbedtls_get_state() getter may introduced issues, because the application will then need to check the version and know about both 1.3 and 1.2 flows to interpret the result. (Though we already have a similar issue with session resumption which means you have basically two different flows in TLS 1.2, and apparently this hasn't been a problem.)

The alternative would be to add a series of functions like mbedtls_ssl_handshake_is_over(), mbedtls_ssl_handshake_certificate_request_was_sent()... the obvious drawback is that we have to know that functions are needed exactly, so every time someone needs something new, they need to ask us, and there is some delay. OTOH, perhaps the part about getting in touch is good, though, because it seems to me that at least the ALPN issue would be better solved by us providing an adequate certificate selection callback - and perhaps the client cert too, with a better CA callback?

@ronald-cron-arm
Copy link
Contributor

@mpg ok I will have a look.

@ronald-cron-arm ronald-cron-arm self-assigned this Jan 10, 2022
@gstrauss
Copy link
Contributor

Another item which could use improvement in stepping handshake and which is very visible on server-side use of mbedTLS: some callbacks are only for mbedtls_ssl_config, and do not have a corresponding callback on mbedtls_ssl_context. This is not thread-safe when mbedtls_ssl_config is shared. Prime example: mbedtls_ssl_conf_sni() is for mbedtls_ssl_config, and there is no SNI callback override for specific mbedtls_ssl_context (!)

@gilles-peskine-arm
Copy link
Contributor

I'm rather in favor of adding new functions with specific goals. The state is an internal detail and not every condition can be observed solely from the state field. For example, as we're seeing now, a new version of the protocol can introduce new states. Another example is a potential mbedtls_ssl_is_context_still_valid which requires looking at other fields (and we may or may not want change the state field on an error, as a compromise between implementation simplicity, ease of debugging and protection against accidental misuse: this shouldn't be considered an API change).

@mpg
Copy link
Contributor

mpg commented Jan 12, 2022

(Note: also requested by two people in #5184, correctly pointing out that currently there's no way to use mbedtls_ssl_handshake_step() as documented without using MBEDTLS_PRIVATE. Labeling "regression".)

@paul-elliott-arm
Copy link
Member

Another request for state, for handshake reasons.

@mpg
Copy link
Contributor

mpg commented Mar 17, 2022

@neoxic

Sorry if I'm missing something obvious, but thinking about it, it seems to me that you could achieve what you want just by using the existing cookie callback

Unfortunately, I don't see how the existing cookie callback can solve the problem since mbedtls_ssl_conf_dtls_cookies() assigns it to a configuration rather than a context, hence it's impossible to get a context pointer (or any other session state pointer for that matter) as its p_ctx argument to operate upon (change BIO for example).

Ok, so that's the obvious issue that I was missing, thanks for pointing it out.

Regarding exposing the state with a "no guarantee" warning - Gilles made his point again while I was writing and I think it's a really strong point.

So, the outcome of this discussion (plus some internal discussions) is that we favour approaches based on callbacks. I take your point that neither the existing cookie callback nor the new certificate selection callback introduced in #5454 meet your needs so far.

It seems to me that the new callback from #5454 would meet your needs if modified the code slightly so that the callback is only called when a valid cookie was found (or when using TLS obviously), which I think makes sense: why would you spend time selecting a certificate if you're not going to continue the handshake anyway? @gstrauss wdyt?

@mpg
Copy link
Contributor

mpg commented Mar 17, 2022

I hope everyone can see that mbedtls_ssl_handshake_step() doesn't make sense without some kind of a "step" or "state" to track progress.

I heard some people were using handshake_step() instead of handshake() for latency reasons, so that they can attend to other tasks (unrelated to the handshake) between steps in single-threaded environments. I agree this is a bit marginal (it only works if your acceptable latency falls right between the duration of the slowest step and that of a full handshake), but apparently some users where in that case. Also, in the meantime we've developed a better solution for single-threaded users who can't accept the handshake blocking for too long (and are not using RSA or FFDH), in the shape of MBEDTLS_ECP_RESTARTABLE, so arguably this use is even more marginal now.

That said, I fully agree that our public API right now is not consistent, as mbedtls_ssl_handshake_step() can't be used at all (even for the marginal use case mentioned above) without at least a way to know when the handshake is over. Worse, its documentation recommends checking the state while it's not longer publicly accessible. I think there are mainly several way to resolve this inconsistency:

  1. Add a function like mbedtls_ssl_is_the_handshake_over_yet() and have the documentation of mbedtls_ssl_handshake_step() say to use that new function to decide when to stop calling handshake_step().
  2. Add a big fat warning to the documentation of mbedtls_ssl_handshake_step() saying you can only use that in conjunction with MBEDTLS_PRIVATE() to access the state, which voids your guarantee, so if you find yourself needing to do that, please get in touch so that we can add a solution to your problem that will be guaranteed to work in the future.
  3. A compromise: add is_the_handshake_over_yet() and document that handshake_step() should be used with that, but also add a word about ssl->MBEDTLS_PRIVATE(state) with a big fat warning as above.

@gstrauss
Copy link
Contributor

gstrauss commented Mar 17, 2022

It seems to me that the new callback from #5454 would meet your needs if modified the code slightly so that the callback is only called when a valid cookie was found (or when using TLS obviously), which I think makes sense: why would you spend time selecting a certificate if you're not going to continue the handshake anyway? @gstrauss wdyt?

@mpg I have not traced through the code to understand the DTLS flows, so this is a question: ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello() (if ssl->renego_status == MBEDTLS_SSL_INITIAL_HANDSHAKE) before ssl->conf->f_cert_cb(). However, later in the handshake in ssl_write_server_hello(), verify_cookie_len is checked and ssl_write_hello_verify_request() is sent. That looks like it should stay in ssl_write_server_hello(), but can it also be called in ssl_parse_client_hello(), short-circuiting ssl_parse_client_hello(), when ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello()?

Possible overload suggestion: Now that there is mbedtls_ssl_conf_get_user_data_p() and mbedtls_ssl_get_user_data_p(), might ssl->conf->p_cookie be overloaded so that if ssl->conf->p_cookie is NULL, then mbedtls_ssl_context is passed to ssl->conf->f_cookie_check( ( ssl->conf->p_cookie ? ssl->conf->p_cookie : ssl ), ... ) ? [Edit: that might not be feasible: ssl_check_dtls_clihlo_cookie() does not have access to mbedtls_ssl_context, so passing non-NULL ssl->conf->p_cookie is the only way to get additional context there.]

@gstrauss
Copy link
Contributor

Musing: Were there to be a new accessor available to check the cookie status, then the new accessor could be called from f_cert_cb before bind() and connect() the DTLS socket (and skip further certificate selection logic if ssl_write_hello_verify_request() is to be called later from ssl_write_server_hello()

@neoxic
Copy link

neoxic commented Mar 17, 2022

@gstrauss

Your proposals are neither new or unique, and suggest that you have not carefully read the prior conversations in this PR. For example, I, too, previously posted that mbedtls_ssl_states enum is public, and requested that mbedtls_ssl_handshake_step() be supported or deprecated (along with appropriate alternatives).

Please look for "X hidden items" "Load more..." on this page to review conversations that might have been hidden by the initial github PR page rendering.

Didn't know it's forbidden here to repeat what has been already said... As for me having not read the previous conversation, while I respect your opinion, brother, may I kindly ask you to refrain from far-fetched assumptions and "policing"? Thank you!

@neoxic
Copy link

neoxic commented Mar 17, 2022

@mpg

Regarding exposing the state with a "no guarantee" warning - Gilles made his point again while I was writing and I think it's a really strong point.

So, the outcome of this discussion (plus some internal discussions) is that we favour approaches based on callbacks.

Fair enough. The path MbedTLS is taking is now clear. Unfortunately, it looks like the problem with a session-based DTLS server that I tried to shed some light upon above will become buried even deeper with callbacks. Moreover, since there is no easy transition path between handshake_step() and callbacks, supporting both 2.x and 3.x without resorting to "unstable APIs" becomes a real pain (unless the new approach is going to be back-ported, of course).

@mpg
Copy link
Contributor

mpg commented Mar 18, 2022

@gstrauss

@mpg I have not traced through the code to understand the DTLS flows, so this is a question: ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello() (if ssl->renego_status == MBEDTLS_SSL_INITIAL_HANDSHAKE) before ssl->conf->f_cert_cb(). However, later in the handshake in ssl_write_server_hello(), verify_cookie_len is checked and ssl_write_hello_verify_request() is sent. That looks like it should stay in ssl_write_server_hello(), but can it also be called in ssl_parse_client_hello(), short-circuiting ssl_parse_client_hello(), when ssl->handshake->verify_cookie_len is set in ssl_parse_client_hello()?

I think that's correct, but I was thinking of a smaller change, like:

diff --git a/library/ssl_srv.c b/library/ssl_srv.c
index 094fca8932d7..7379b6df18ca 100644
--- a/library/ssl_srv.c
+++ b/library/ssl_srv.c
@@ -1875,7 +1875,10 @@ read_record_header:
     /*
      * Server certification selection (after processing TLS extensions)
      */
-    if( ssl->conf->f_cert_cb && ( ret = ssl->conf->f_cert_cb( ssl ) ) != 0 )
+    if( ssl->conf->f_cert_cb &&
+        !( ssl->conf->transport == MBEDTLS_SSL_TRANSPORT_DATAGRAM &&
+        ssl->handshake->verify_cookie_len != 0 ) &&
+        ( ret = ssl->conf->f_cert_cb( ssl ) ) != 0 )
     {
         MBEDTLS_SSL_DEBUG_RET( 1, "f_cert_cb", ret );
         return( ret );

(or something equivalent with better formatting and a comment).

Musing: Were there to be a new accessor available to check the cookie status, then the new accessor could be called from f_cert_cb before bind() and connect() the DTLS socket (and skip further certificate selection logic if ssl_write_hello_verify_request() is to be called later from ssl_write_server_hello()

I agree, I think that was (2) in the first paragraph of this comment. I think both ways would work, though right now I have a slight preference for (1) because I think it's just simpler.

@mpg
Copy link
Contributor

mpg commented Mar 18, 2022

@neoxic

Unfortunately, it looks like the problem with a session-based DTLS server that I tried to shed some light upon above will become buried even deeper with callbacks.

I agree that's not ideal. OTOH, like someone said over in the OpenSSL discussion, I think the "right" way to solve the underlying issue is neither callbacks nor stepping the handshake, but a dedicated BIO implementation that would probably use a single unconnected socket and handle the dispatch. There's a mismatch between what the DTLS layer expects and what the sockets API provides, and bridging that gap should be the BIO callback's job. And ideally it should be provided by the DTLS library (or at least a working example of how to go about it), rather that written over and over again by each application developer. Unfortunately, considering all the other things on our plate, that's unlikely to happen soon.

If you ever have the right combination of time and motivation and feel like submitting a PR for this, please get in touch on the mailing list first to discuss the general design and the planning (our review bandwidth tends to be a bottleneck, so for large/complex contributions it's better if the review effort is budgeted in advance).

@mpg
Copy link
Contributor

mpg commented Mar 18, 2022

@neoxic

supporting both 2.x and 3.x without resorting to "unstable APIs" becomes a real pain (unless the new approach is going to be back-ported, of course).

As a matter of principle, I think we could consider back-porting selected features in order to make supporting both 2.x and 3.x easier, but unfortunately in this case the changes affect the ABI, which we "try very hard" not to break in LTS branches - meaning we'd only break the ABI in an LTS if that was the only way we found to fix a security vulnerability.

Sorry about that, I hear your pain, but I don't think we can do anything to relieve it now. (Perhaps in the future we release 4.0 we can wait a bit before turning the latest 3.x to LTS status, to leave time for similar issues to surface and find solutions that make it less of a pain to support both 3.x and 4.x.)

@neoxic
Copy link

neoxic commented Mar 18, 2022

@mpg

I agree that's not ideal. OTOH, like someone said over in the OpenSSL discussion, I think the "right" way to solve the underlying issue is neither callbacks nor stepping the handshake, but a dedicated BIO implementation that would probably use a single unconnected socket and handle the dispatch.

Personally, I think complicating the implementation of BIO is a dead-end approach. And the exact reason I abandoned my attempts to incorporate OpenSSL in favour of MbedTLS was that:

  1. MbedTLS has (at least for now) a way to give control to the user (me) over how I want my BIO to deal with the difficulties and subtleties that stem from OS-based inconsistencies;
  2. OpenSSL developers are stuck in babysitting users, i.e. solving unrelated tasks, thus this particular problem is unlikely to be ever solved. My hopes MbedTLS is not going to follow suit...

If you ever have the right combination of time and motivation and feel like submitting a PR for this, please get in touch on the mailing list first to discuss the general design and the planning (our review bandwidth tends to be a bottleneck, so for large/complex contributions it's better if the review effort is budgeted in advance).

Yes, that was on my TODO list. In particular, I planned to develop and donate a correctly working example as part of the MbedTLS example suite. But... Since we hit a major snag with MbedTLS 3.0 in regards to handshake_step(), these plans are now on hold until some feasible approach is on the horizon for I don't believe you will favour using "unstable APIs" in your examples.

EDIT:

There's a mismatch between what the DTLS layer expects and what the sockets API provides, and bridging that gap should be the BIO callback's job. And ideally it should be provided by the DTLS library (or at least a working example of how to go about it), rather that written over and over again by each application developer. Unfortunately, considering all the other things on our plate, that's unlikely to happen soon.

Indeed, there is much more sense in what you are saying than modifying a totally unrelated server certificate selection callback. Since there are already things like mbedtls_ssl_conf_dtls_cookies() and mbedtls_ssl_set_client_transport_id() that work only in case of a DTLS server, it might be natural to just add something like:

typedef int mbedtls_ssl_greeter_t(void *ctx);

void mbedtls_ssl_set_greeter(mbedtls_ssl_context *ssl, void *ctx, mbedtls_ssl_greeter_t *func);

... which again makes sense only for DTLS servers and triggers func on a successful ClientHello. This way, it will also be backward-compatible and easily back-portable. Furthermore, a dedicated callback like this will very much expose the bind/connect problem rather than suppress it with the unrelated server certificate callback.

@gstrauss
Copy link
Contributor

I think that's correct, but I was thinking of a smaller change, like:
[snip]

Musing: Were there to be a new accessor available to check the cookie status, then the new accessor could be called from f_cert_cb before bind() and connect() the DTLS socket (and skip further certificate selection logic if ssl_write_hello_verify_request() is to be called later from ssl_write_server_hello()

I agree, I think that was (2) in the first paragraph of this comment. I think both ways would work, though right now I have a slight preference for (1) because I think it's just simpler.

f_cert_cb is intended to be called unconditionally, unless there is a fatal error in processing the client hello before that point. Your simple patch would add an exception, which I do not think is the best design choice here. If the handshake proceeds (it does), then f_cert_cb should be called to be able to perform certificate selection and any other application-level policy that should occur after the ClientHello has been processed. @mpg, I think your suggestion ((2) in the first paragraph of this comment) is a better solution, though other alternatives might also be proposed by those who know DTLS better than I. A place for DTLS bind() and connect() by the application might be created elsewhere, before the ServerHello or any other response is sent to the client.

@paul-elliott-arm
Copy link
Member

Added #5653 to implement mbedtls_ssl_is_handshake_over(), as discussed above. Not setting it to close this issue, as there is still conversation going on here about other problems aroundt this area.

@minosgalanakis
Copy link
Contributor

As discussed in issue and subsequent design review issue #8529, the handshake and handshake state should not be directly exposed.

As mentioned above exposing the internal state does not guarantee consistent behaviour across different versions.

We have added callbacks at #5454 and #8456 and stepping the handshake is possbile. We are not considering adding a dedicated BIO implementation at this stage.

I will be closing this issue, but please feel free to re-open or write a new-one if there is still a gap that we need to consider moving forward.

@minosgalanakis minosgalanakis removed this from [3.6] Mbed TLS PRIVATE in EPICs for Mbed TLS Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component-tls enhancement size-s Estimated task size: small (~2d)
Projects
Status: [3.6] Mbed TLS PRIVATE
Development

No branches or pull requests