Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send session ID when resuming TLS 1.3 sessions. #464

Closed
wants to merge 1 commit into from

Conversation

dhobsd
Copy link

@dhobsd dhobsd commented Feb 3, 2021

Rustls implements client- and server-side requirements for RFC8446 Appendix D.4, except it forgets to send a session ID in the ClientHello when also resuming TLS 1.3 sessions. This PR addresses that issue.

@@ -18,6 +18,7 @@ COMMIT=47a6f5b4bf4d7acd8d5f7d43cb9c94335cec1c60
git fetch --depth=1 https://boringssl.googlesource.com/boringssl $COMMIT
git checkout $COMMIT
patch -p1 < ../patches/testerrormap.diff
patch -p1 < ../patches/fix-tls13sessionid.diff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting COMMIT=c5e4538e3b0bcfdda30d2dabecbee952faf54c1c above is a better approach to getting the patch. Are there downsides to this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a shot. It's possible that it requires more shim / config changes.

Copy link
Author

@dhobsd dhobsd Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 58 failing tests:

  • The current shim doesn't understand the -quic flag, failing one test.
  • The current shim doesn't understand many new flags to tests ALPS.
  • Same for new flags to do GREASE ClientHelloes.
  • There are several new tests failing with different error classifications than expected. (I didn't attempt to validate or classify these.)

Do you prefer to update in a different PR before accepting this one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My own personal preference, which isn't official in any capacity in this project, is that we do the work that allows us to independently (from this PR) update to the newer version of Bogo. In general, we should avoid patching Bogo when we can. I filed #465 to track updating to the new version of Bogo.

@briansmith
Copy link
Contributor

Additionally, section D.4 suggests the client to provide "a non-empty
session ID in the ClientHello" both for establishing a new session and
for resuming a session to mitigate against middlebox incompatibility.
TLS 1.3 session resumption previously used an empty session id derived
from the returned ClientSessionValue.

Maybe I'm misunderstanding, but IMO it would be better to address this in a separate PR. I would expect there should be a test that verifies that we follow the section D.4 recommendation independently from any test that verifies the HRR is done correctly.

This behavior is additionally problematic for a StoresClientSessions
implementation that implements RFC8446 appendix C.4, as subsequent cache
lookups are guaranteed to return different tickets and session IDs.

Following the Section C.4 recommendation should be done by Rustls internally; The StoresClientSessions implementation shouldn't need to do anything for it.

@briansmith
Copy link
Contributor

prohibited per section RFC8446 section 4.1.2.

Which BoGo test(s) tests this?

section D.4

Which Bogo test(s) test this?

RFC8446 appendix C.4

Which Bogo test(s) test this?

@dhobsd
Copy link
Author

dhobsd commented Feb 3, 2021

Thanks for the comments.

Maybe I'm misunderstanding, but IMO it would be better to address this in a separate PR. I would expect there should be a test that verifies that we follow the section D.4 recommendation independently from any test that verifies the HRR is done correctly.

I agree, though I think this is the right way to express the "always provides a non-empty session ID" part of D.4. In getting here, I tried a few different approaches, but it ultimately felt incorrect to not address the "try-to-look-up-a-session-when-replying-to-HRR" part.

One solution is to supply the ClientSessionValue with a random session ID rather than using SessionID::empty(). This "works", but is susceptible to the race: if a concurrent resumed session receives a NewSessionTicket message in-between the initial ClientHello and the HelloRetryRequest, the session ID will change because we're always calling find_session. This breaks HRR. Bogo tests do not find this because they do not exercise the race condition.

The other approach I took was to do something like:

...
    if retryreq.is_none() {
        handshake.resuming_session = find_session(sess, handshake.dns_name.as_ref());
    }
    let (session_id, ticket, resume_version) = if handshake.resuming_session.is_some() {
        let resuming = handshake
            .resuming_session
            .as_mut()
            .unwrap();
        if retryreq.is_none() && !sess.common.is_quic() {
            random_sessionid_for_ticket(resuming);
        }
...

This avoids the race, but feels and reads awkwardly: the whole is_some bit after that also really only needs to be executed only for the initial ClientHello, and once that moves up into the retryreq.is_none() case, the rest of the change sort of falls out.

Following the Section C.4 recommendation should be done by Rustls internally; The StoresClientSessions implementation shouldn't need to do anything for it.

Thanks, I was hoping this would be the case. I have a StoresClientSessions implementation that does this, but it does require looking at the ClientSessionValue, which feels gross even if it were to live internally. Let's move this discussion into a separate issue?

prohibited per section RFC8446 section 4.1.2.

Which BoGo test(s) tests this?

There are numerous HelloRetryRequest tests. These are trivially tripped if you implement this in such a way that e.g. TLS 1.3 resumptions don't use a session_id (which is the current behavior) and you call random_sessionid_for_ticket without checking whether you're servicing a HRR. I don't think it's reasonable to expect bogo to exercise the race condition between NewSessionTicket and HRR.

section D.4

Which Bogo test(s) test this?

EmptySessionID-TLS13, which is the test fixed by the patch currently associated with this commit. D.4 has two bullet points for TLS 1.3 clients that this test checks: the early CCS and the non-empty session_id.

RFC8446 appendix C.4

Which Bogo test(s) test this?

I poked through and didn't find one explicitly checking that a client never re-sends a ticket. Maybe that's because you can only prove a client didn't send a ticket in N connections, but not that it never will. @davidben, do you have any commentary here?

@briansmith
Copy link
Contributor

Thanks, I was hoping this would be the case. I have a StoresClientSessions implementation that does this, but it does require looking at the ClientSessionValue, which feels gross even if it were to live internally. Let's move this discussion into a separate issue?

Yes, could you file the issue and/or PR for it? I suggest that we narrow the scope of this PR to just the HRR race issue.

@dhobsd
Copy link
Author

dhobsd commented Feb 3, 2021

SGTM, will get to that this afternoon.

@briansmith
Copy link
Contributor

EmptySessionID-TLS13, which is the test fixed by the patch currently associated with this commit. D.4 has two bullet points for TLS 1.3 clients that this test checks: the early CCS and the non-empty session_id.

If we factor out the D.4 stuff from this PR then we can remove the patch to Bogo, right? I think that seems like a good approach.

I don't think it's reasonable to expect bogo to exercise the race condition between NewSessionTicket and HRR.

I agree. It would be good to have a test for this within Rustls's own test suite though, especially since we've already experienced a bug here. I believe the test could be written by using a custom version of the session storage interface that simulates the race.

@dhobsd
Copy link
Author

dhobsd commented Feb 4, 2021

If we factor out the D.4 stuff from this PR then we can remove the patch to Bogo, right? I think that seems like a good approach.

Sure. It sounds like we'll need to update bogo before D.4 is landed. (I ran into this by accident of fixing D.4.) I'd like to land the D.4 fix, though we've got a vendored version that's a couple releases old, so not entirely sure when I'll have time to update bogo beforehand.

I don't think it's reasonable to expect bogo to exercise the race condition between NewSessionTicket and HRR.

I agree. It would be good to have a test for this within Rustls's own test suite though, especially since we've already experienced a bug here. I believe the test could be written by using a custom version of the session storage interface that simulates the race.

Yes, that will do it. Are you suggesting this test in api.rs? I had to tend to some stuff so will likely have to look into this more tomorrow.

@davidben
Copy link

davidben commented Feb 4, 2021

Following the Section C.4 recommendation should be done by Rustls internally; The StoresClientSessions implementation shouldn't need to do anything for it.

Certainly the default one should implement it. But if someone writes their own, I don't think there's much Rustls can do about it. (I guess you could maintain a global hash table of session you ever sent. That would waste memory and still not capture process restarts with persistent session caches. It also wouldn't work great since C.4 caches should keep multiple sessions per key. You could call put immediately after get to clobber things, but that would have atomicity problems.)

I poked through and didn't find one explicitly checking that a client never re-sends a ticket. Maybe that's because you can only prove a client didn't send a ticket in N connections, but not that it never will. @davidben, do you have any commentary here?

There isn't a test because it's not applicable to BoringSSL's API. Our API (mostly by way of OpenSSL descent) punts all client session cache management to the caller. We don't even do cache lookups and instead ask the caller lookup and call SSL_set_session at setup time. That means C.4 is out of our hands and is the application's responsibility. Though we do provide SSL_SESSION_should_be_single_use to help here. (Telling the API if it should be single-use is useful since there's no benefit to making TLS 1.2 tickets single-use. Doing so can hurt resumption rates since 1.2 servers don't issue multiple tickets.)

(This does mean BoringSSL-using applications written against TLS 1.2 will do TLS 1.3 without C.4. Not ideal, but the security properties still hold, and it's no less leaky than TLS 1.2. And in contexts like backend-to-backend communication, C.4 is less relevant.)

@davidben
Copy link

davidben commented Feb 4, 2021

There are numerous HelloRetryRequest tests. These are trivially tripped if you implement this in such a way that e.g. TLS 1.3 resumptions don't use a session_id (which is the current behavior) and you call random_sessionid_for_ticket without checking whether you're servicing a HRR. I don't think it's reasonable to expect bogo to exercise the race condition between NewSessionTicket and HRR.

Yeah, bogo isn't really in a good position to test that, especially with BoringSSL's API shape. I mostly consider this yet another indication that we got HRR's design wrong. I think folks optimized too much for stateless mode (at this point only useful for DTLS), especially with some false starts around clever tricks stateless servers might play, and the result is a bit of a mess. If TLS 1.4 happens, I have a sketch of a replacement design in my back pocket that I'll probably pull out. But not much that can be done now. :-(

@dhobsd
Copy link
Author

dhobsd commented Feb 4, 2021

Thanks, David!

Are you suggesting this test in api.rs?

Looking at this a bit more, I'm really not sure how I would get that to happen; I don't think I can get the client and server to disagree on kx groups with the public API. (If at all -- it looks like the server will always match against suites::KeyExchange::supported_groups(), and the client will always prefer X25519.) It otherwise looks like unit-testy-things around this are in msgs.

I think the easiest test is effectively unit testing emit_client_hello_for_retry with some specific inputs, but I don't really see precedence for doing this. Suggestions would be appreciated.

@dhobsd
Copy link
Author

dhobsd commented Feb 4, 2021

Yes, could you file the issue and/or PR for it?

Yep, opened #466.

@briansmith
Copy link
Contributor

Regarding testing this, maybe we can extend Bogo to facilitate testing this. If we take the Bogo test that verifies that the same session ticket is used in both the initial and restarted handshake, and then tweak it to give it a new name indicating that the test is supposed to test the race condition, then the Bogo shim can recognize that test by name and use the mock session cache?

@briansmith
Copy link
Contributor

There are now separate issues about implementing the D.4 fake session ID (issue #470) and C.4 semantics (issue #466). IMO the PR is very messy with all these things mixed into one PR. We should probably close this PR and make new, separate PRs, one for each issue, especially since each issue and each proposed fixed needs its own careful analysis and testing.

@dhobsd
Copy link
Author

dhobsd commented Feb 4, 2021

If closing this PR, we'd need an issue for the HRR race as well.

AIUI a modified bogo test of that sort would require maintaining a patch to bogo. Maybe I'm also missing something obvious, but I don't see where the shim does anything with a test name, or where runTest sends test.name to the shim.

@briansmith
Copy link
Contributor

If closing this PR, we'd need an issue for the HRR race as well.

I was suggesting that we factor out all the D.4 and C.4 stuff from this PR, and then post what's left into another PR.

AIUI a modified bogo test of that sort would require maintaining a patch to bogo.

I would expect the patch to be sent upstream to BoringSSL so that we wouldn't need to maintain a patch in Rustls.

Maybe I'm also missing something obvious, but I don't see where the shim does anything with a test name, or where runTest sends test.name to the shim.

We'd work with BoringSSL team to add the infrastructure for this kind of test to Bogo. They probably have better suggestions than mine.

@dhobsd
Copy link
Author

dhobsd commented Feb 4, 2021

I feel like David said it wasn't an appropriate test for bogo in an earlier comment because it can't be meaningfully tested with BoringSSL's API. Maybe I'm misunderstanding what you mean here?

@briansmith
Copy link
Contributor

I feel like David said it wasn't an appropriate test for bogo in an earlier comment because it can't be meaningfully tested with BoringSSL's API. Maybe I'm misunderstanding what you mean here?

If that's the case, we should build a mechanism to add new BoGo tests that aren't accepted upstream, as this is unlikely to be the only time we need such tests, IMO.

BTW, I'm not wedded to my idea. If you have a better idea for how to test this, that's also OK.

@davidben
Copy link

davidben commented Feb 4, 2021

Regarding testing this, maybe we can extend Bogo to facilitate testing this. If we take the Bogo test that verifies that the same session ticket is used in both the initial and restarted handshake,

You have to update it, but every test that happens to do HRR enforces it now:
https://boringssl-review.googlesource.com/c/boringssl/+/43364.

and then tweak it to give it a new name indicating that the test is supposed to test the race condition, then the Bogo shim can recognize that test by name and use the mock session cache?

As soon as you all implement C.4, I suspect Bogo will test this for free because the single-use-ness of the ticket effectively forces the race condition. I.e. all these issues aren't quite separable. :-)

That said, yeah, ultimately Bogo is BoringSSL's test suite. It tests against BoringSSL's expectations, just as Rustls' tests test against Rustls' expectations. Expectations for TLS are mostly aligned, so Bogo's happily been useful for other folks. But we won't sacrifice our own testing and maintenance for portability, just as you all won't sacrifice that of your unit tests.

(There is probably room for better testing fidelity around resumptions, but I was actually thinking the other direction. The shim semantics are currently "resume some session you got last time", whereas it'd be better to target the exact one so we can verify all issued tickets work, etc.)

If that's the case, we should build a mechanism to add new BoGo tests that aren't accepted upstream, as this is unlikely to be the only time we need such tests, IMO.

I mean, it seems like you're able to carry patches. :-) Though I think you won't need it here (see above). Or you could just write a plain unit test: have rustls talk to itself, mock out the session cache to count how many times you query it. If that number is more than 1, fail.

@briansmith
Copy link
Contributor

briansmith commented Feb 5, 2021

As soon as you all implement C.4, I suspect Bogo will test this for free because the single-use-ness of the ticket effectively forces the race condition. I.e. all these issues aren't quite separable. :-)

That sounds reasonable to me. If we correctly implement C.4 within Rustls as I proposed in #466 (comment) (as opposed to delegating the responsibility to the cache implementation) then this issue gets fixed automatically, right? In that case, it would be fine to use the existing Bogo test that verifies the resumption case of "In that case [of the server sending HelloRetryRequest], the client MUST send the same ClientHello without modification."

@dhobsd
Copy link
Author

dhobsd commented Feb 5, 2021

then this issue gets fixed automatically

It doesn't quite. This is one of the reasons that this initial patch I sent fixes the HRR race and D.4 at the same time.

The detectability depends on whether the server sends the client-under-test 0, 1, or >1 NewSessionTicket messages, and whether D.4 is implemented correctly.

Currently, D.4 is not sending session ID on resumption, so whether or not C.4 is implemented, bogo will never catch this:

  • If a server sends 0 NewSessionTicket messages, we don't try to resume, so there's no issue.
  • If 1, the second call to find_session returns nothing and the "not resuming any session" ends up keeping the session_id the same because it was already set on the handshake.
  • If it sends >1, then the current implementation still doesn't trip the test because random_sessionid_for_ticket isn't called for TLS 1.3.

This means that bogo doesn't test C.4 for free. It only does this if C.4 is implemented and D.4 is implemented. To implement D.4, you have to first solve this race, because then bogo tests this for you automatically :)

@dhobsd
Copy link
Author

dhobsd commented Feb 5, 2021

Said differently: when all three things are fixed, bogo effectively tests them all, but when you do 2 out of 3, it's not necessarily clear that they are. HRR race can be done without fixing D.4, but not easily tested without unit test. D.4 can't be implemented without HRR race fix, bogo finds that. C.4 can be implemented and tested with HRR race fixed, but you can't test it in isolation if D.4 is implemented first.

FWIW, in Fuchsia, I implemented HRR and D.4 together and opted to test those with bogo. I tested C.4 with unit tests because C.4 defines a cache behavior, and that can be observed via the cache's API. Regardless, I definitely understand the desire to keep a clean separation between these issues, and I'm happy to work with y'all to get the fixes upstreamed in a way y'all are comfortable with. (And sorry that the communication around this has been circuitous!)

@ctz
Copy link
Member

ctz commented Feb 6, 2021

Bogo is updated, so please rebase and reinstate TLS13SessionID-TLS13 in bogo/config.json (which I've left disabled for now). Thanks for your work on this!

@dhobsd
Copy link
Author

dhobsd commented Feb 6, 2021

Thanks! I'll be able to get to this on Monday!

@dhobsd
Copy link
Author

dhobsd commented Feb 8, 2021

Done! I've updated the commit message to be a bit less of a novel and a bit clearer on why these changes are included together. Thanks for getting bogo updated.

Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind separating the code refactoring from the material change here (into separate commits)? That would make the review easier, I think.

@dhobsd
Copy link
Author

dhobsd commented Feb 9, 2021

Sure, will do.

RFC8664 Appendix D.4 recommends always filling in the session ID in the
ClientHello message for compatibility with middleboxes that do TLS
inspection. Rustls implements the rest of the recommendations for D.4,
but was not filling this field in when resuming a session.

This change is tested by `TLS13SessionID-TLS13` in bogo, which is
enabled. The HelloRetryRequest semantics are tested by numerous TLS 1.3
HRR and resumption tests in bogo.
@dhobsd
Copy link
Author

dhobsd commented Feb 9, 2021

This is done, though now I wonder if we're back to Brian's suggestion to split these commits into different PRs. Notably, the HRR race is never relevant in this history. Happy to do whatever, just lemme know.

@briansmith
Copy link
Contributor

Notably, the HRR race is never relevant in this history.

Yeah, that's because none of the tests are really testing the race. Basically we're just preceding without a test for it, which isn't ideal, but understandable.

@dhobsd
Copy link
Author

dhobsd commented Feb 10, 2021

This seems like it's a bit problematic for different folks for different reasons, so I'll go ahead and move ca5f8b9 to a different PR that comes with a test in api.rs, making use of a bespoke session cache. (I forgot that this isn't really tenable because there's no good way to test HRR in the API tests. In any case, this seems simpler to accept, and I'll figure out something else to do with the refactor.)

@dhobsd dhobsd changed the title Fix race around HRR and properly implement D.4 Send session ID when resuming TLS 1.3 sessions. Feb 10, 2021
// handling a HelloRetryRequest, make sure to use the same session ID used in the
// initial ClientHello (RFC8446 section 4.1.2).
if retryreq.is_none() && handshake.session_id.is_empty() {
handshake.session_id = random_sessionid();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If handshake.session_id.is_empty() then we must not be retrying, because if we were retrying then we'd have already set handshake.session_id, right? The code isn't wrong but the redundant check makes the code less clear.

More importantly, it's not obvious when reviewing this code in GitHub unless you expand the context, but this is copy-pasta from original lines 212-214. We should instead refactor the code to avoid the duplication and be easier to understand.

Please take a look at PR #502 and let me know what you think.

Also, I now understand why you say all these things are so interrelated. :)

@briansmith
Copy link
Contributor

This seems like it's a bit problematic for different folks for different reasons

Sorry, I hope my comments didn't come across as negative. What I was trying to say is that if we fix the underlying issue then we won't have a test specifically for the race, but I am OK with not having a test for the race. I guess the only test that would make sense would be along the lines of "ensure we only query the session cache once per handshakel" that might be a useful test, but as I was trying to say before, I think if it's hard to add such a test, we shouldn't feel obliged to.

@dhobsd
Copy link
Author

dhobsd commented Feb 12, 2021

No worries, I'm relatively new to Rust (let alone rustls) but not OSS, so it's somewhat expected. #502 looks like a much better approach. Closing this PR.

@dhobsd dhobsd closed this Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants