Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issues migrating lighttpd mod_mbedtls to mbedtls 3.0.0 #5331

Closed
gstrauss opened this issue Dec 14, 2021 · 29 comments
Closed

issues migrating lighttpd mod_mbedtls to mbedtls 3.0.0 #5331

gstrauss opened this issue Dec 14, 2021 · 29 comments
Assignees

Comments

@gstrauss
Copy link
Contributor

gstrauss commented Dec 14, 2021

I am a lighttpd developer and author of lighttpd mod_mbedtls TLS module. I have read through the migration guide and have some issues, feedback, and questions.

https://github.com/ARMmbed/mbedtls/blob/development/docs/3.0-migration-guide.md

If no accessor function exists, please open an enhancement request against Mbed TLS and describe your use case. The Mbed TLS development team is aware that some useful accessor functions are missing in the 3.0 release, and we expect to add them to the first minor release(s) (3.1, etc.).

This is a long list. I can break it into separate github issues if you like, but I hope to get some feedback before creating a large number of new github issues. Thank you.

minor nits and simple bugs

  • removal of MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO is poor form
    While more detailed MBEDTLS_ERR_SSL_* values are now available (good),
    why was it necessary to remove MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO
    and break existing usage? Some developers attempt to maintain software
    which works across multiple (major) versions of mbedtls.
    [edit: not doing anything in 3.x, keeping in mind for 4.0]

  • mbedtls_ssl_context makes major_ver and minor_ver members private
    Is there an accessor function in mbedTLS 3.0.0? Should there be? [edit: Access to TLS version #5407]
    Separate bug: [edit: mbedtls_ssl_get_version() does not support TLS 1.3 #5406]
    const char *mbedtls_ssl_get_version( const mbedtls_ssl_context *ssl )
    returns a string (which would then need to be parsed), and for some
    reason is missing support for "TLSv1.3" (MBEDTLS_SSL_MINOR_VERSION_4)

blockers: no alternative other than accessing private structures

  • mbedtls_ssl_handshake_step() callers must use mbedtls_ssl_context ssl.state
    but mbedtls_ssl_context state member is now private. What gives? [edit: Introduce SSL getter API for handshake state #4383]

  • old limitation: no interface for per mbedtls_ssl_context data ptr for SNI callback
    (only function is mbedtls_ssl_conf_sni()) [edit: Allow per-context SNI callback (state) #5413]

  • old limitation, but biggest code mess for lighttpd supporting mbedtls:
    library/ssl_srv.c:ssl_parse_client_hello() design limitations:
    Certificate selection should be a separate callback, after all TLS extensions
    have been parsed. Right now, SNI callback must select a certificate, and
    IIRC, there may be problems/challenges trying to set the certificate again
    if it should be changed. Please correct me if I am wrong, but TLS extensions
    can occur in any order, so ALPN and SNI extensions might be received in any
    order. lighttpd supports
    RFC8737 "Automated Certificate Management Environment (ACME)
    TLS Application-Layer Protocol Negotiation (ALPN) Challenge Extension"
    and to support ALPN "tls-alpn-01" with mbedTLS, lighttpd must parse the
    client hello for ALPN before ssl_parse_client_hello(), so that when
    the SNI extension is processed and callback executed, the proper certificate
    is chosen, even if the ALPN extension comes after the SNI extension.
    To parse the client hello, lighttpd forked and specialized
    library/ssl_srv.c:ssl_parse_client_hello() to handle the TLS ALPN extension.
    This is fragile, and will have to be done again with mbedTLS 3.0.0, where
    more of the struct members and functions are now private and hidden.
    Please suggest a better design in mbedTLS to support TLS ALPN "tls-alpn-01"
    and please remedy the lack of per mbedtls_ssl_context data ptr for SNI callback
    (mentioned in the bullet point directly above).
    Please add a new callback for certificate selection which occurs after all
    TLS extensions have been processed, and along with enhancing the
    SNI callback, allow the callback to be per mbedtls_ssl_context,
    not restricted to per mbedtls_ssl_config.
    [edit: https://github.com/Add certificate selection callback #5430]

  • partial writes
    lighttpd uses ssl.out_left to detect partial writes.
    Please allow access or create inline accessor funcs.
    https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L1972
    [edit: it looks like the return value of ssl_write is enough, after discussion]

  • SSL tickets
    mbedtls_ssl_ticket_context members are now all private, but there are no
    interfaces to manage it. lighttpd allows synchronization of session tickets
    across multiple servers, and so writes into mbedtls_ssl_ticket_context to
    manage mbedtls_ssl_ticket_context.
    lighttpd mod_mbedtls_session_ticket_key_check()
    https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L335
    [edit: Add access to some fields of mbedtls_ssl_ticket #5421]

  • PEM file parsing
    mbedtls_pem_context members are now all private
    lighttpd uses mbedtls_pem_read_buffer() to import a cert from memory.
    lighttpd then calls mbedtls_x509_crt_parse_der_with_ext_cb() to parse it,
    but must now use pem.MBEDTLS_PRIVATE(buf), pem.MBEDTLS_PRIVATE(buflen)
    Please allow access or create inline accessor funcs.
    [edit: Allow access to result of PEM decoding #5414]

  • setting up a CGI environment
    SSL_CIPHER mbedtls_cipher_info_get_name(cipher_info)
    SSL_CIPHER_ALGKEYSIZE mbedtls_cipher_info_get_key_bitlen(cipher_info)
    SSL_CIPHER_USEKEYSIZE (same as algkeysize with (most?) modern ciphers?)
    These are possibly important to some people, though one may argue that keysize
    values are no longer as representative of cipher strength as they once were.
    I still need the cipher name, even if keysize is less important.
    Before 3.0.0, cipher_info was directly accessible through
    ssl.transform->cipher_ctx_enc.cipher_info
    and SSL_CIPHER_USEKEYSIZE via ssl.transform->cipher_ctx_enc.key_bitlen
    With 3.0.0, struct mbedtls_ssl_transform is private in ssl_misc.h
    (and transform member ssl.MBEDTLS_PRIVATE(transform) is private, too)
    With 3.0.0, how to get cipher_info and key_bitlen from struct mbedtls_ssl_context?
    Here is a ridiculous chain for a simple lookup, and still not safe in 3.0.0:

    mbedtls_ssl_ciphersuite_t *ciphersuite_info =
      mbedtls_ssl_ciphersuite_from_id(mbedtls_ssl_get_ciphersuite_id(mbedtls_ssl_get_ciphersuite(ssl)));
    mbedtls_cipher_info_t *cipher_info
      mbedtls_cipher_info_from_type(ciphersuite_info->cipher); // but cipher is private, too!
    

    and that still does not provide a way to access to mbedtls_cipher_context_t key_bitlen
    How about:

    /* (not inline in header; mbedtls_ssl_transform is private in ssl_misc.h) */
    const mbedtls_cipher_context_t *
    mbedtls_ssl_get_cipher_context_enc(mbedtls_ssl_context *ssl)
    {
        return &ssl->transform->cipher_ctx_enc;
    }
    
    /* (not inline in header; mbedtls_ssl_transform is private in ssl_misc.h) */
    const mbedtls_cipher_context_t *
    mbedtls_ssl_get_cipher_context_dec(mbedtls_ssl_context *ssl)
    {
        return &ssl->transform->cipher_ctx_dec;
    }
    

    Then, I can use
    mbedtls_cipher_get_name() for SSL_CIPHER
    mbedtls_cipher_get_key_bitlen() for SSL_CIPHER_ALGKEYSIZE
    but there is still no accessor to ctx->MBEDTLS_PRIVATE(key_bitlen)

    static inline int
    mbedtls_cipher_get_key_bitlen_used(const mbedtls_cipher_context_t *ctx)
    {
        return ctx->MBEDTLS_PRIVATE(key_bitlen);
        // Note: different from mbedtls_cipher_get_key_bitlen() which returns
        //     ctx->MBEDTLS_PRIVATE(cipher_info)->MBEDTLS_PRIVATE(key_bitlen)
    }
    
    static inline const mbedtls_cipher_info_t *
    mbedtls_cipher_get_cipher_info(const mbedtls_cipher_context_t *ctx)
    {
        return ctx->MBEDTLS_PRIVATE(cipher_info);
    }
    

    BTW, mbedtls_cipher_get_key_bitlen() should
    return (int) mbedtls_cipher_info_get_key_bitlen(ctx->MBEDTLS_PRIVATE(cipher_info));
    rather than reaching through the structures directly
    (ctx->MBEDTLS_PRIVATE(cipher_info)->MBEDTLS_PRIVATE(key_bitlen))
    [edit: https://github.com/Add mbedtls_ssl_get_cipher_info() accessor #5417]

  • mbedtls_x509_name member next_merged is now private
    (reasonable, since it is an implementation detail)
    However, lighttpd mod_mbedtls.c https_add_ssl_client_subject() replaces the
    inefficient mbedtls_x509_dn_gets() which repeatedly calls snprintf() (slow) for
    buffer management.
    https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L2400
    lighttpd formats components of client Subject DN into env vars SSL_CLIENT_S_DN_*.
    NOTE: to properly walk a mbedtls_x509_name linked list,
    one needs to know if nm->next needs to be merged or not.
    [edit: https://github.com/Add access to mbedtls_x509_name::next_merged #5431]

  • mbedtls_ssl_config_defaults does not detect if called more than once
    mbedtls_ssl_config params dhm_P and dhm_G are now private.
    lighttpd calls
    mbedtls_mpi_free(ssl_ctx->MBEDTLS_PRIVATE(dhm_P));
    mbedtls_mpi_free(ssl_ctx->MBEDTLS_PRIVATE(dhm_G));
    before calling mbedtls_ssl_config_defaults() a second time when lighttpd
    reconfigures the config to use one of SUITEB* (e.g. SUITEB192)
    Since mbedtls_ssl_config_init() zeros mbedtls_ssl_config,
    calls to mbedtls_ssl_config_defaults() should probably mbedtls_mpi_free()
    the dhm_P and dhm_G params if they had previously been allocated.
    (This might be considered a bug, possibly resulting in a memory leak.)
    [edit: Reset dhm_P and dhm_G if config call repeated; avoid memory leak #5353]

  • RFE: missing OCSP stapling support is a huge detriment to adoption of mbedTLS
    on the server-side.

@gilles-peskine-arm
Copy link
Contributor

gilles-peskine-arm commented Dec 14, 2021

Thank you for the detailed feedback!

I (or a fellow team member) will go through the list in more detail and file individual issues. It's going to take a while to resolve them all. We just had a code freeze for the 3.1 release and it's going to still be missing some public fields or accessor functions. The 2.28 version will remain supported for at least 3 years.

A partial reply to some of your points:

Some developers attempt to maintain software which works across multiple (major) versions of mbedtls.

Sorry. In our defense, this is the first incompatible release in 4 years. We needed to do some cleanup in TLS to prepare for adding TLS 1.3 support.

mbedtls_ssl_handshake_step() callers must use mbedtls_ssl_context ssl.state but mbedtls_ssl_context state member is now private. What gives?

This is an identified gap which unfortunately missed the 3.1 window.

partial writes lighttpd uses ssl.out_left to detect partial writes.

We can add an accessor function for this. I think its implementation will change with the planned rewrite of the TLS stack. In the meantime, can't you use the return value from write?

… mbedtls_cipher_context_t

3.1 has accessor functions for this type.

PEM file parsing
mbedtls_x509_name

Many fields of ASN.1 and X.509 parsing structures are public in 3.1. Please let us know if that's not enough.

mbedtls_ssl_config_defaults does not detect if called more than once

I think that's a general problem with SSL config functions.

RFE: missing OCSP stapling support is a huge detriment to adoption of mbedTLS
on the server-side.

Unfortunately this is a major feature which is currently not on our roadmap.

@gstrauss
Copy link
Contributor Author

Thanks for the detailed response.

Some developers attempt to maintain software which works across multiple (major) versions of mbedtls.

Sorry. In our defense, this is the first incompatible release in 4 years. We needed to do some cleanup in TLS to prepare for adding TLS 1.3 support.

I totally understand incompatible changes in major version releases. My minor nit questions why MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO had to be removed, instead of preserving the identifier, either with a unique error number, or sharing an identifier with a renamed identifier, e.g.
#define MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO MBEDTLS_ERR_SSL_DECODE_ERROR

mbedtls_ssl_config_defaults does not detect if called more than once

I think that's a general problem with SSL config functions.

Would you accept patches to improve this? I mentioned this because I (now) reach into private dhm_P and dhm_G params, and would not need to do so if mbedtls_ssl_config_defaults() called mbedtls_mpi_free() on these params.

partial writes lighttpd uses ssl.out_left to detect partial writes.

We can add an accessor function for this. I think its implementation will change with the planned rewrite of the TLS stack. In the meantime, can't you use the return value from write?

Hopefully this will give you an idea of how lighttpd is using mbedTLS, and please let me know if I am crazy and there are better ways to do this -- or if this might suggest there might be better ways that mbedTLS could handle this. The following is the comment I left myself in the code to try to explain why some hoop-jumping is done, and why ssl.out_left is used:
https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/mod_mbedtls.c#L2004

        /* mbedtls_ssl_write() copies the data, up to max record size, but if
         * (temporarily) unable to write the entire record, it is documented
         * that the caller must call mbedtls_ssl_write() again, later, with the
         * same arguments.  This appears to be because mbedtls_ssl_context does
         * not keep track of the original size of the caller data that
         * mbedtls_ssl_write() attempted to write (and may have transformed to
         * a different size).  The func may return MBEDTLS_ERR_SSL_WANT_READ or
         * MBEDTLS_ERR_SSL_WANT_WRITE to indicate that the caller should wait
         * for the fd to be readable/writable before calling the func again,
         * which is why those (temporary) errors are returned instead of telling
         * the caller that the data was successfully copied.  When the record is
         * written successfully, the return value is supposed to indicate the
         * number of (originally submitted) bytes written, but since that value
         * is unknown (not saved), the caller's len parameter is reflected back,
         * which is why the caller must call the func again with the same args.
         * Additionally, to be accurate, the size must fit into a record which
         * is why we restrict ourselves to sending max out record payload each
         * iteration.
         */

@gstrauss
Copy link
Contributor Author

While I may have some technical criticisms, I do want to shout out that I believe mbedtls interfaces to, in general, be quite nice, and that the mbedtls documentation in the headers is fantastic! Were mbedtls not open source, I would not have been able to write lighttpd mod_mbedtls and integrate it as well with mbedtls. Thank you very much.

@gstrauss
Copy link
Contributor Author

PEM file parsing
mbedtls_x509_name

Many fields of ASN.1 and X.509 parsing structures are public in 3.1. Please let us know if that's not enough.

It does not look like these changes were included in the mbedtls-3.1.0 tag. They are still all private members.

… mbedtls_cipher_context_t

3.1 has accessor functions for this type.

Not (yet?) enough to solve the issue I posted above. There is also no accessor funcs (see suggestion in my initial post) for access to retrive the (mbedtls_cipher_context_t *) from &ssl->transform->cipher_ctx_enc or to access ctx->key_bitlen.

mbedtls_cipher_get_name() is the same as mbedtls_cipher_info_get_name() (other than the input param)
mbedtls_cipher_get_key_bitlen() is the same as mbedtls_cipher_info_get_key_bitlen() (other than the input param)

mbedtls_cipher_get_key_bitlen() accesses ctx->MBEDTLS_PRIVATE(cipher_info)->MBEDTLS_PRIVATE(key_bitlen) (instead of ctx->MBEDTLS_PRIVATE(key_bitlen). There is no accessor func for ctx->MBEDTLS_PRIVATE(cipher_info) (returning (const mbedtls_cipher_info_t *)), which could then be used with mbedtls_cipher_info_get_name() and mbedtls_cipher_info_get_key_bitlen() without the redundant accessors mbedtls_cipher_get_name() and mbedtls_cipher_get_key_bitlen(), and then mbedtls_cipher_get_key_bitlen() could return its key_bitlen member.

@gilles-peskine-arm gilles-peskine-arm self-assigned this Dec 20, 2021
@gstrauss
Copy link
Contributor Author

Would you like me to submit the following as pull requests?
From two feature branches, each with a simple commit with proposed changes to address one or two of the items in this PR:
gstrauss@4f4df71
gstrauss@e0fb849
These are simple items from the list above, not some of the more complex questions.

@gilles-peskine-arm
Copy link
Contributor

@gstrauss Yes please! With a unit test (tests/suites/test_suite_ssl.* for both) that fails without your patch, and a changelog entry (ChangeLog.d/*) for for each.

@gstrauss
Copy link
Contributor Author

#5353 for memory leak if mbedtls_ssl_config_defaults() called more than once on same mbedtls_ssl_config.

As for mbedtls_ssl_get_version(), it appears that the rest of mbedtls does not yet support TLSv1.3, so I'll hold off on a PR for that.

I am interested in your take on some of the more complex questions above before I attempt to craft patches for those. Thanks.

@mpg
Copy link
Contributor

mpg commented Jan 10, 2022

@gstrauss Thank you so much for your feedback, that's extremely valuable! (And thanks for the kind words about the API and documentation, too.) I'll be creating individual issues based on your list (and, when possible, consolidating with feedback we received from other sources). I hope you won't mind it I annotate your list with references to issues created, that will help be keep track of what's done and what's left to do.

Hopefully we can continue the discussion in the individual issues then. Again, thank you so much for your feedback!

@gstrauss
Copy link
Contributor Author

I hope you won't mind it I annotate your list with references to issues created, that will help be keep track of what's done and what's left to do.

Not at all. Please do so as I am interested in following the issues.

I'm eager to help submit some patches for review, and understand that there will be latency in scheduling. Before I spend time (which might turn out to be wasted) on larger patches, I am waiting for feedback on smaller patches, e.g. #5353, which while small, is also low priority.

@mpg
Copy link
Contributor

mpg commented Jan 11, 2022

There is also no accessor funcs (see suggestion in my initial post) for access to retrive the (mbedtls_cipher_context_t *) from &ssl->transform->cipher_ctx_enc or to access ctx->key_bitlen.

I'm afraid adding access to ssl->transform->cipher_ctx_enc is not something we want to do at this point.

We're in the process of migrating the X.509 and TLS layers to use PSA Crypto APIs instead of the current mbedtls_xxx crypto APIs, which are to be removed at some point in the future. In the short term, we're planning to remove cipher_ctx_enc when MBEDTLS_USE_PSA_CRYPTO is enabled, and in the medium term, we'd like to make MBEDTLS_USE_PSA_CRYPTO non-optional.

This is one of the reasons why made most fields private by default: in order to give ourselves more freedom to modify the implementation.

In this case, the obvious solution would be to take a step back and give you access directly to the information you actually want: the name and key size. This could be extracted from the ciphersuite. For example:

const char *cs_name = mbedtls_ssl_get_ciphersuite( &ssl );
int cs_id = mbedtls_ssl_get_ciphersuite_id( cs_name );
size_t key_bits = mbedtls_ssl_get_ciphersuite_key_bits( cs_id );

Would that work for you?

By the way:

  • I can't think of a meaningful difference between SSL_CIPHER_ALGKEYSIZE and SSL_CIPHER_USEKEYSIZE with any ciphersuite currently in use (or at least, implemented in Mbed TLS / that I know of). Perhaps this was meaningful for export ciphersuites which used artificially short keys (for example 40-bit RC4), but I think nowadays those two are always equal.
  • I've tried looking around a bit, and it's not clear to me what SSL_CIPHER is actually supposed to hold: namely, if it's the name of the cipher proper, or of the TLS ciphersuite. In all examples I've found, it was the name of the ciphersuite, and and actually the non-standard name used by the SSL engine (so, OpenSSL in most cases).
  • Generally speaking, is there any kind of standard about what variables are to be defined in a CGI environment and what they should mean? Or has it evolved organically, with everyone trying to provide compatibility with previous/popular implementations?

Also, while looking around I've seen that apparently nginx is extracting the key size from the ciphersuite name - I think we can agree this seems suboptimal and we'll try to provide a better API for that :)

@mpg
Copy link
Contributor

mpg commented Jan 11, 2022

Hopefully this will give you an idea of how lighttpd is using mbedTLS, and please let me know if I am crazy and there are better ways to do this -- or if this might suggest there might be better ways that mbedTLS could handle this.

Thanks for pointing us to this comment. I don't think you're crazy, and clearly the situation with partial writes is a bit more complex than we'd all like it to be (I also have to spend some time thinking and reading the code again to clarify my understanding of the situation). I think there are two potential kinds of partial writes: (1) those at the TLS level (the requested length is large than the maximum fragment length for TLS), which I understand you're avoiding by always calling ssl_write() with a small enough length, and those at the transport level (I/O callback): the I/O callback returns WANT_WRITE, potentially after writing some but not all of the data.

These correspond to the following paragraphs in the documentation:

 * \warning        This function will do partial writes in some cases. If the
 *                 return value is non-negative but less than length, the
 *                 function must be called again with updated arguments:
 *                 buf + ret, len - ret (if ret is the return value) until
 *                 it returns a value equal to the last 'len' argument.
[...]
 * \note           When this function returns #MBEDTLS_ERR_SSL_WANT_WRITE/READ,
 *                 it must be called later with the *same* arguments,
 *                 until it returns a value greater that or equal to 0. When
 *                 the function returns #MBEDTLS_ERR_SSL_WANT_WRITE there may be
 *                 some partial data in the output buffer, however this is not
 *                 yet sent.

If I understand correctly, one of the things you were saying even before 3.0, is that the requirement in the second paragraph (to call again with the same arguments) is inconvenient, and actually unnecessary because all the info is in the ssl_context already, and you'd rather just call ssl_flush_output() directly in that case. I think that's a fair point for your use case, where you're taking care never pass more than the maximum fragment length but the general case may be more complext. However that's something we can try to improve in the future.

The other point, which causes issues with 3.0, is that you're currently using out_left to detect if a partial write happened in the past, and if that's so, call ssl_write() again with the same length, which you stored independently. I think (and please correct me if I'm wrong) that you don't need to look at out_left in this case, and as Gilles said, can use the return value of the previous call to ssl_write() instead: it it's WANT_WRITE, set a bit somewhere in one of your structures to note that it needs to be called again next time (that is, that bit's value would be the same as 0 != ssl->out_left, but you would get the info from the return value of ssl_write() and store it on your side). By the way, this method should work just as well with 2.28.

@hanno-arm You're probably more familiar than me with plans for improving the ssl_write() API in the future, so if you have any thoughts to share, feel free! For context on this discussion, you can look at the "partial writes" paragraph here and the last paragraph here.

@mpg
Copy link
Contributor

mpg commented Jan 11, 2022

  • Please add a new callback for certificate selection which occurs after all
    TLS extensions have been processed,

Actually, I was just thinking: would it work to just keep the existing callback, but make sure it's called after all the extensions have been processed? I think that would be a backwards compatible change (people can't reasonably rely on the fact that all extensions have not been processed, since the SNI extension could very well be the last one), and would give you the assurance you need, that from the SNI callback you can query the state of ALPN - we'd need to make sure mbedtls_ssl_get_alpn_protocol() works at the point where the SNI callback is called (currently the documentation says it's only guaranteed to work once the handshake is complete, but I think it should work as soon as we've finished parsing the ClientHello extensions).

RFE: missing OCSP stapling support is a huge detriment to adoption of mbedTLS
on the server-side.

As Gilles said, full OCSP support is a major feature currently not on our roadmap, but I'm wondering it there is a small enough subset we could implement that would meet your needs for server-side stapling. I might be over-optimistic here, but perhaps there's a way we can keep handling of OCSP proper outside of Mbed TLS, and just handle the TLS extension itself: parse the client extension, invoke an application-provided callback to get the data to be stapled if any, then write the extension* back if applicable. That way, all the TLS layer has to do is generic extension handling and copying opaque bytes provided by the application, not having any knowledge of OCSP itself. Just thinking out loud, perhaps things are not so easy when looking deeper. But is there's a way to reduce the problem to something simple enough, and if you're interested in contributing a PR for that, then perhaps that could work. Otherwise, if more OCSP support is needed, then I'm afraid it's unlikely to happen soon, sorry.

*Edit: I went and read RFC 6066 sec. 8, and actually the server does not respond with an extension, but with a new handshake message CertificateStatus inserted right after the Certificate message. That's a bit harder to implement than just returning an extension. But I'm still under the impression that the TLS stack does not need to know about OCSP, and can just pass opaque bytes around: from the ClientHello extension to the calback, and then from the callback to the CertificateStatus message.

@gilles-peskine-arm
Copy link
Contributor

Actually, I was just thinking: would it work to just keep the existing callback, but make sure it's called after all the extensions have been processed? I think that would be a backwards compatible change (people can't reasonably rely on the fact that all extensions have not been processed, since the SNI extension could very well be the last one)

This is a behavior change, although since our documentation doesn't specify exactly when the various callbacks run, it wouldn't be a change in documented behavior and I'd be perfectly ok with doing this change in a non-LTS branch. Consider a server that is only intended to work with a specific client (not that rare in IoT) which happens to pass extensions in a certain order, and where the server runs two callbacks and expects them to be called in that order. If we change to running callbacks in an order that doesn't depend on what's on the write, this server may break.

@gstrauss
Copy link
Contributor Author

gstrauss commented Jan 11, 2022

@mpg: Thank you for the deeper analysis.

I am going to split the ssl->transform->cipher_ctx_enc discussion off into a separate pull request as that issue is less pressing than others and more of a distraction here.

More important: use of out_left.

If a write returns MBEDTLS_ERR_SSL_WANT_WRITE, ssl->MBEDTLS_PRIVATE(out_left) might be non-zero. (Aside: what about MBEDTLS_ERR_SSL_CRYPTO_IN_PROGRESS or MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS?) Checking out_len is used to flag that there is pending data which must be flushed, or else trying to send number of bytes from mbedtls_ssl_get_max_out_record_payload() might exceed the record size, which lighttpd tries not to do so that lighttpd can predictably know how many bytes were actually consumed from what lighttpd submitted.

Later on, lighttpd checks ssl->MBEDTLS_PRIVATE(out_left) again to see if it is still non-zero. The out_left pending data could (theoretically?) have been written if an intervening mbedtls_ssl_read() resulted in reading and writing. If that occurred, then lighttpd accounting marks the pending bytes as written without resubmitting those bytes to mbedtls. Now then, you might correct me and tell me that this will never occur with mbedtls. I am currently accessing ssl->MBEDTLS_PRIVATE(out_left) to be sure.

The mbedtls implementation choice of mbedtls_ssl_write() reflecting the caller's length argument back in certain cases (non-zero ssl->out_left) is what requires that the caller use the exact same arguments with mbedtls_ssl_write() after mbedtls_ssl_write() returns MBEDTLS_ERR_SSL_WANT_WRITE. Otherwise, the caller has no idea how much of the submitted data was actually consumed, since the return value might include previously submitted data still pending to be sent. If a new interface mbedtls_ssl_write_ex() took an extra parameter -- or an input/output pointer to (size_t *len) -- then the return value from mbedtls_ssl_write_ex() could be the bytes written to TLS stream, while the size_t *len output could be the same or could be larger to include length of data buffered by mbedtls, but not yet sent in TLS records. Alternatively, mbedtls_ssl_write_ex() might always return the number of bytes consumed, even if there is still data buffered and pending to be sent, and an additional pointer argument could be used to flag MBEDTLS_ERR_SSL_WANT_WRITE, and that the stream had data pending to be flushed. If there was always a way to know how much data mbedtls consumed from the caller, then there would not be a need to call mbedtls_ssl_write() with the exact same arguments as a prior call which returned MBEDTLS_ERR_SSL_WANT_WRITE.

For mbedtls 3.0.0, if ssl->out_left is to become private, then at the very least, I need an accessor to query if the mbedtls stream state is in the middle of sending a TLS record or not, i.e. if in the middle of a partial write. Does MBEDTLS_ERR_SSL_WANT_WRITE always signal a partial write, or can it be also be returned when ssl->out_len == 0?

Please add a new callback for certificate selection which occurs after all
TLS extensions have been processed,

SNI is the very first extension added to TLS, and has TLS extension id 0. It is often the first extension provided, and other extensions may rely on SNI having been provided, e.g. extensions to ALPN. That is why my request is for an additional callback after all TLS extensions have been processed, and which can be used for certificate selection and for any other purposes after all TLS extensions have been processed. In the future, TLS ECH (encrypted client hello) might be handled here, too. To be crystal clear about the need for a callback after TLS extensions have been processed: if such a callback is not implemented, I expect to have to intercept and process the TLS client hello myself, before calling mbedtls, and I would prefer not to write portions of a custom TLS library separate from mbedtls!

As Gilles said, full OCSP support is a major feature currently not on our roadmap, but I'm wondering it there is a small enough subset we could implement that would meet your needs for server-side stapling.

Yes, I agree with this line of thinking. For server-side, minimal support would add OCSP stapling information to the Server Hello if the application provided OCSP stapling info along with certificate selection. TLSv1.2 permits OCSP stapling info for the leaf cert in the chain, while TLSv1.3 extends this to allow OCSP stapling info for each certificate in the chain. Server-side parsing of OCSP stapling info from client certificate authentication could be implemented later (and is not currently supported by lighttpd mod_mbedtls). Adding selective OCSP stapling support is more than a few lines of code and not a simple patch, but might be a more manageable chunk. I might attempt this in the future, though not in the near-term.

@gstrauss
Copy link
Contributor Author

I am going to split the ssl->transform->cipher_ctx_enc discussion off into a separate pull request as that issue is less pressing than others and more of a distraction here.

Follow-up in PR #5417

@mpg
Copy link
Contributor

mpg commented Jan 12, 2022

Ok, just trying to sum up what hasn't been moved to its own issue/PR yet:

  • partial write and out_left - I'll respond to your latest comments soon
  • end-of-clienthello callback - I'll respond below as well
  • removal of MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO (and potentially other macros?) - I agree with your point, we'll keep that in mind for future removals of macros. Is there something we can do now, for example, does it help if we provide a compatibility #define in compat-2.x.h, or is it too late (because even if we add something in 3.2, you'll still have to take care of 3.0 and 3.1)?
  • mbedtls_x509_name::next_merged - it's not entirely clear to me if this is about usability or performance? If mbedtls_x509_dn_gets() but is just too inefficient, then perhaps upstreaming your improvements would be better for everyone that letting you (and possibly others) maintain improved versions. If it's not just about performance, or if you think your needs are very specific to your use case so that we can't design a better function that would work for you and others, then sure, we'll make next_merged public.

@mpg
Copy link
Contributor

mpg commented Jan 12, 2022

SNI is the very first extension added to TLS, and has TLS extension id 0. It is often the first extension provided, and other extensions may rely on SNI having been provided, e.g. extensions to ALPN.

That's a good point. If several callbacks all want to be the last to run, obviously that won't work, and we'll need a separate callback for that. However, unless I missed something, we don't have any callback related to ALPN, and actually the SNI callback is the only callback related to a ClientHello extension (except for the PSK callback, but I really can't imagine someone using both with an interaction between them).

So, I don't think ordering is an obstacle to the existing SNI callback playing the role of "end-of-ClientHello" callback.

OTOH, perhaps there are situations where it's desirable to have an "end-of-ClientHello" callback that's called no matter what (that is, even if the ClientHello didn't include the SNI extension).

The thing its, I'd like to avoid having more callbacks than necessary, as the complexity of testing and reasoning about the code tends to go up (often more than linearly) with the number of callbacks. OTOH, clearly we don't want to have less callbacks than necessary, if that results in situations like:

if such a callback is not implemented, I expect to have to intercept and process the TLS client hello myself, before calling mbedtls, and I would prefer not to write portions of a custom TLS library separate from mbedtls!

I want to re-assure you that we absolutely want to avoid that. I'm not pushing back against adding something to meet your needs, just trying to determine what is the best way to do that.

@mpg
Copy link
Contributor

mpg commented Jan 12, 2022

Does MBEDTLS_ERR_SSL_WANT_WRITE always signal a partial write, or can it be also be returned when ssl->out_left == 0?

Unless I'm mistaken, yes, "mbedtls_ssl_write() returned MBEDTLS_ERR_SSL_WANT_WRITE" always signals a partial write. Assuming this is correct, do you agree that you don't need access to out_left?

Here's my current reasoning for why MBEDTLS_ERR_SSL_WANT_WRITE can only be returned by mbedtls_ssl_write() when ssl->out_left != 0:

  1. There's nothing in the SSL module that returns WANT_WRITE, it can only be returned by the f_send callback (evidence: grep MBEDTLS_ERR_SSL_WANT_WRITE library/*.c).
  2. The f_send callback is only called in two places in the code (again, confirmed by grep), only one of which matter for mbedtls_ssl_write(): in mbedtls_ssl_flush_output() where it's inside a while( ssl->out_left > 0 ) loop. There are only two ways to exit that loop early: if f_send() returned an error code (including WANT_WRITE) or a non-sensible success value.

So, there's only one source of WANT_WRITE from ssl_write, and it implies that ssl->out_left > 0.

@gstrauss
Copy link
Contributor Author

gstrauss commented Jan 13, 2022

partial write and out_left - I'll respond to your latest comments soon

Unless I'm mistaken, yes, "mbedtls_ssl_write() returned MBEDTLS_ERR_SSL_WANT_WRITE" always signals a partial write. Assuming this is correct, do you agree that you don't need access to out_left?

Assuming that is correct, then yes. I walked through the code, too, and I agree with your assessment in that scope.

Related, I am still uneasy with the internals since other parts of the code theoretically can modify out_left. mbedtls_write_real() and mbedtls_ssl_send_alert_message() both may call mbedtls_ssl_write_record() after handshake is done. (I am ignoring a few funcs which may call that before the handshake is done.) In the case where anything that called mbedtls_ssl_write_record() received MBEDTLS_ERR_SSL_WANT_WRITE, that is the one function that needs to call mbedtls_ssl_flush_output() until it succeeds. If that does not happen, then mbedtls_ssl_write() might incorrectly reflect back its len parameter when len was not actually written. This probably does not happen in normal operation, but easily could if TLS alerts are sent besides TLS alert Close Notify.

Overall, I am uneasy with the delayed-action and the return value from mbedtls_ssl_write() 'len' not necessarily matching reality of what just happened, since mbedtls_ssl_write() assumes that it has been called with exactly the same arguments (or will be called with exactly the same arguments next time), and this might not be true if alerts were sent in between.

I would prefer more deterministic behavior. If out_left == 0 and I submit <= mbedtls_ssl_get_max_out_record_payload() to mbedtls_ssl_write_record(), I am guaranteed that a record will be encrypted to internal buffers (and I theoretically no longer have to hold onto that data myself). If I had another interface that I could use to check if any data was pending (out_left != 0), and an interface to flush any buffers, if pending, then I could be sure that my data was written, and I could ensure that my data has been flushed to f_send().

If I wanted to not rely on mbedtls for this, I might achieve this by writing my own f_send() to replace mbedtls_net_send(), and have my custom function buffer records and never return MBEDTLS_ERR_SSL_WANT_WRITE, and keep track myself and avoid calling mbedtls_ssl_write() if I might block. I think it might be better if mbedtls provided some additional interfaces instead. It is unlikely that every caller of mbedtls_ssl_write_record() can ensure that flushing succeeds, or that the same function will be called again before any other function which might call mbedtls_ssl_write_record().

@gstrauss
Copy link
Contributor Author

end-of-clienthello callback - I'll respond below as well

OTOH, perhaps there are situations where it's desirable to have an "end-of-ClientHello" callback that's called no matter what (that is, even if the ClientHello didn't include the SNI extension).

Yes, end of ClientHello callback is very desirable. End of Client Hello callback called unconditionally was added in openssl 1.0.2 and I assure you that it has been put to good use. I loudly champion a similar addition to mbedtls.

I want to re-assure you that we absolutely want to avoid that. I'm not pushing back against adding something to meet your needs, just trying to determine what is the best way to do that.

I appreciate the conversation and do feel like I am being heard. Thank you.

As extensions have been added to TLS, it has become clear that some extensions may depend on others. Unless I am mistaken, TLS extension ordering is not defined by the specification. Even were it required, then for security reasons, it should not be relied upon.

  • SNI results in preliminary certificate selection
    If SNI is not provided, then a default needs to have been set in the ssl config.
  • ALPN "acme/tls-1" support requires SNI, but results in selecting a different certificate from the one selected from the SNI callback, which is something not supported by mbedtls if SNI callback has already been called and a certificate has already been assigned in that callback. (lighttpd mod_mbedtls currently walks the TLS extensions prior to mbedtls, and looks specifically for ALPN before mbedtls is directed to step the handshake.)
  • Not currently supported in lighttpd, but there can be logic to select different certificates based on heuristics for client support for RSA or ECC certificates.
  • In the future, support for ECH (encrypted client hello) will definitely further complicate things. A desire for sanity suggests minimal TLS extension parsing and validation while mbedtls is stepping through the TLS extension. Then, after all TLS extensions have been parsed, perform processing and evaluation, including certificate selection, before TLS Server Hello.

@gstrauss
Copy link
Contributor Author

gstrauss commented Jan 13, 2022

removal of MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO (and potentially other macros?) - I agree with your point, we'll keep that in mind for future removals of macros. Is there something we can do now, for example, does it help if we provide a compatibility #define in compat-2.x.h, or is it too late (because even if we add something in 3.2, you'll still have to take care of 3.0 and 3.1)?

With the changes in mbedtls 3.0, lighttpd mod_mbedtls does not build. Among other reasons, MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO is not defined. Also, MBEDTLS_PRIVATE must be added in many places before lighttpd mod_mbedtls will build without warnings or errors. Some other migrations are necessary, too. In short, nobody is using lighttpd mod_mbedtls with mbedtls 3.0, because trying to do so does not work.

I am capable of working around the missing MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO. ;) However, adding a compatibility #define in compat-2.x.h would be a friendly thing to do for others who have not yet attempted porting to mbedtls 3.x.

[Edit:] What do you think of this?
#define MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO MBEDTLS_ERR_SSL_ILLEGAL_PARAMETER

In my code, I ported existing use of MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO to values introduced in mbedtls 3.0, and have backward compatibility defines for when lighttpd mod_mbedtls is compiled against older versions. This is specific to lighttpd mod_mbedtls:

#if MBEDTLS_VERSION_NUMBER < 0x03000000 /* mbedtls 3.00.0 */
#define MBEDTLS_ERR_SSL_ILLEGAL_PARAMETER MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO
#define MBEDTLS_ERR_SSL_DECODE_ERROR      MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO
#endif

@gstrauss
Copy link
Contributor Author

gstrauss commented Jan 13, 2022

mbedtls_x509_name::next_merged - it's not entirely clear to me if this is about usability or performance? If mbedtls_x509_dn_gets() but is just too inefficient, then perhaps upstreaming your improvements would be better for everyone that letting you (and possibly others) maintain improved versions. If it's not just about performance, or if you think your needs are very specific to your use case so that we can't design a better function that would work for you and others, then sure, we'll make next_merged public.

The proliferation of ASN.1 parsers is an historical source of security bugs. lighttpd is trying to use the mbedtls_x509_name linked list and my understanding is that checking next_merged is required to properly walk and interpret the mbedtls_x509_name linked list.

[Edit:] Also, yes, mbedtls_x509_dn_gets() use of mbedtls_snprintf() is inefficient and slow.

@mpg
Copy link
Contributor

mpg commented Jan 13, 2022

Yes, end of ClientHello callback is very desirable. End of Client Hello callback called unconditionally was added in openssl 1.0.2 and I assure you that it has been put to good use. I loudly champion a similar addition to mbedtls.

#5430

@mpg
Copy link
Contributor

mpg commented Jan 13, 2022

[Edit:] What do you think of this?
#define MBEDTLS_ERR_SSL_BAD_HS_CLIENT_HELLO MBEDTLS_ERR_SSL_ILLEGAL_PARAMETER

Initially, I'm in favour, so I started writing up an ticket about that, trying to generalize, when I realized that I don't think it would be really that helpful: there were several instances of BAD_HS_CLIENT_HELLO, and they have not all been replace with the same error code, actually the 6 following errors codes have been used in different places:

MBEDTLS_ERR_SSL_BAD_PROTOCOL_VERSION
MBEDTLS_ERR_SSL_DECODE_ERROR
MBEDTLS_ERR_SSL_HANDSHAKE_FAILURE
MBEDTLS_ERR_SSL_ILLEGAL_PARAMETER
MBEDTLS_ERR_SSL_UNEXPECTED_MESSAGE
MBEDTLS_ERR_SSL_UNRECOGNIZED_NAME

So, providing a #define to one of these names would not help that much as it would miss the cases where the others are now returned. Conversely, those error codes are also used in other places than ClientHello.

As you say, you (and hopefully other maintainers of software using Mbed TLS) are capable of working around the change on your own, so it's more about being friendly and avoiding the need to everyone to come up with their own work-around. But in that case, I'm now thinking that there's no way to make this transparent for everyone, because as there's no 1-1 correspondence between old and new codes (in either direction), different people will probably want to use different workarounds depending on how they used those specific error codes.

So, I'm afraid I'm inclined to do nothing on our side here, and just let people implement workarounds that work for them, unless someone can suggest a better solution.

@gstrauss
Copy link
Contributor Author

So, I'm afraid I'm inclined to do nothing on our side here, and just let people implement workarounds that work for them

That's the way it is. Please do keep this in mind for the future when planning to remove public values (including enums), as removing them causes code using them to fail to compile. This may be desirable to force changes, but if a compatible #define can be provided, that is preferable to compilation failures.

@gstrauss
Copy link
Contributor Author

FYI: #4183 "Simplify SSL write API for Mbed TLS 3" requests enhancements to mbedtls_ssl_write() for DTLS and mentions some of the same limitations discussed here (for which I am reaching int out_left)

@mpg
Copy link
Contributor

mpg commented Jan 14, 2022

FYI: #4183 "Simplify SSL write API for Mbed TLS 3" requests enhancements to mbedtls_ssl_write() for DTLS and mentions some of the same limitations discussed here (for which I am reaching int out_left)

Ah, I remembered Hanno had some plans about that, but I didn't remember it was written down in an issue. Thanks for finding it.

lighttpd-git pushed a commit to lighttpd/lighttpd1.4 that referenced this issue Jan 19, 2022
x-ref:
  "issues migrating lighttpd mod_mbedtls to mbedtls 3.0.0"
  Mbed-TLS/mbedtls#5331
lighttpd-git pushed a commit to lighttpd/lighttpd1.4 that referenced this issue Jan 19, 2022
remove use of ssl->out_left in mbedtls 3.0.0

Discussed in Mbed-TLS/mbedtls#5331,
the current implementations of mbedtls_net_send() and mbedtls_net_recv()
return MBEDTLS_ERR_SSL_WANT_WRITE only when there is a partial write
(though there is theoretical issue if writes are mixed with TLS alerts)

x-ref:
  "issues migrating lighttpd mod_mbedtls to mbedtls 3.0.0"
  Mbed-TLS/mbedtls#5331
@gstrauss
Copy link
Contributor Author

With the release of mbedtls 3.2.0 which includes updates to address many of the issues raised herein, lighttpd mod_mbedtls no longer uses MBEDTLS_PRIVATE, so I think that this issue can be closed. Thank you.

Any remaining loose ends can be followed up in new issues, but are not blocking lighttpd mod_mbedtls use of mbedtls 3.x APIs.

@mpg
Copy link
Contributor

mpg commented Jul 22, 2022

Thanks for letting us know! And for all your contributions of course, very much appreciated!

@mpg mpg closed this as completed Jul 22, 2022
lighttpd-git pushed a commit to lighttpd/lighttpd1.4 that referenced this issue Oct 23, 2023
handle mbedtls 3.x partial write better by saving
partial write bytes in hctx->pending_write only if
mbedtls_ssl_write() returned MBEDTLS_ERR_SSL_WANT_WRITE

(fixes crash due to subsequent NULL pointer read if
 mbedtls_ssl_write() error was MBEDTLS_ERR_SSL_WANT_READ)

x-ref:
  "issues migrating lighttpd mod_mbedtls to mbedtls 3.0.0"
  Mbed-TLS/mbedtls#5331
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants