Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSSL 3.0 #1434

Closed
wants to merge 5 commits into from
Closed

OpenSSL 3.0 #1434

wants to merge 5 commits into from

Conversation

thomaslee
Copy link
Contributor

Debian is preparing to transition to OpenSSL 3.0 and we ran into two issues with this in capnproto:

  1. Failing tests attributable to openssl error messages changing slightly
  2. Lots of "unimplemented bioctrl" warnings, mostly associated with kTLS

case BIO_CTRL_EOF:
// Queries for info and EOF
return 0;
#if OPENSSL_VERSION_NUMBER >= 0x30000000L && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could just be #ifdef BIO_CTRL_GET_KTLS_SEND? That way if/when BoringSSL adds support, we won't need to update this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, sure.

@@ -369,6 +369,18 @@ private:
switch (cmd) {
case BIO_CTRL_FLUSH:
return 1;
case BIO_CTRL_INFO:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't suppose there are any, like, docs describing these?

BIO_CTRL_INFO and BIO_CTRL_EOF don't seem to be new, but I've never seen the warning log for them...

Copy link
Contributor Author

@thomaslee thomaslee Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIO_CTRL_INFO and BIO_CTRL_EOF don't seem to be new, but I've never seen the warning log for them...

I just ran the tests against both 1.1.1 and 3.0.1 again to be sure: link. Neither are called using Debian unstable's 1.1.1 build and only BIO_CTRL_EOF is called in Debian's 3.0.1. No calls to BIO_CTRL_INFO in either release. I grepped the full build logs just to be certain:

$ grep bio_ctrl capnproto*.build
capnproto_0.9.1-4_arm64.build:kj/compat/tls.c++:377: unimplemented bio_ctrl; cmd = 2
capnproto_0.9.1-4_arm64.build:kj/compat/tls.c++:377: unimplemented bio_ctrl; cmd = 2

So I must have messed something up when I was testing the changes and prepping the patch, looks like it might just be some subtle change in semantics around polling the EOF bit in the BIO. I'll update the PR to remove the special casing of BIO_CTRL_INFO.

I don't suppose there are any, like, docs describing these?

Yeah, the OpenSSL docs are painfully terse. You probably know more about OpenSSL than I do but my understanding is that bioctrls are the BIO (OpenSSL I/O abstraction) equivalent of ioctls: openssl calls the bioctrl callback to query the state of the BIO. The best "docs" I was able to find was the openssl sources unfortunately, but you can typically tie the bioctrls back to macros in bio.h that do have some useful (if minimal) docs.

For example, it's irrelevant now that we've confirmed it's not being hit in tests but BIO_CTRL_INFO drives BIO_get_mem_data which seems to be used to fetch a pointer into buffer memory in things like the mem BIO (and/or "current file pointer position" for unbuffered stuff like the fd BIO? ... seems dangerous to mix the two now that I look at it, but what do I know.). Here are the docs for BIO_get_mem_data, among others.

BIO_CTRL_EOF unsurprisingly drives BIO_eof, so just OpenSSL polling to see if the BIO is in an "EOF" state. Some docs for BIO_eof.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess the question then, is: Are we doing anything wrong by implementing BIO_CTRL_EOF to always return zero? Does zero mean "I don't know", or does it mean "No, not at EOF"? If the latter, then is it bad to lie when we actually are at EOF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A full backtrace from within bioCtrl while running capnproto tests show the BIO_eof checks originate from ssl3_read_n in both tests: relevant openssl 3.0.1 source

This particular case seems like a premature EOF check while accepting a new TLS connection. If we lie about EOF we'll just keep retrying the read even though the underlying socket might be dead. At the same time I suspect we always would have lied about the EOF check in 1.1.1 if we tickled a code path that happened to trigger that bioctrl, (though we would have emitted a warning if that happened).

So if we wanted to be safe, is there an easy way for us to detect stream EOF for a TlsConnection from within bioCtrl that might make sense here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do, no. So let's go with a comment like:

OpenSSL sometimes calls this, hoping to probe whether the stream is at EOF. We don't have any way to check this in terms of KJ interfaces. But if we lie and say "no", OpenSSL will keep trying reads, and if the read itself reports EOF, OpenSSL accepts that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, tests fail because an exception is thrown:

failed: OpenSSL error; message = error:0A000126:SSL routines::unexpected eof while reading

So throwing an exception on unexpected EOF would seem to be a change in semantics, but guess that could be considered a bug & we just change the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd argue this was a bug before, possibly even a mild security problem (truncation attacks). We should fix the test to properly shut down or to expect the exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kentonv I should clarify that the test failure will only occur against openssl 3.0, so we're talking about different semantics building against openssl 1.1 vs. 3.0. If you're still OK with that, I'll make it happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Yes I still think the behavior is desirable. We'll have to design the test so it succeeds either way, which I suppose means that it should always do a clean shutdown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay looks like the exception is being routed through TlsConnectionReceiver::taskFailed ... my repro case on top of all the Debian build tooling is currently using 0.9.1 which is missing the new callback in TlsConnectionReceiver that I would be able to use to capture the exception when building against OpenSSL 3.0. Planning to backport the change introducing the callback to 0.9.1 in Debian just to get this change passing tests against OpenSSL 3.0 then I'll work those changes back to capnproto latest.

@pepijndevos
Copy link

For what it's worth I'm also in a situation where using openssl 1.1 is causing issues. So is the current status that the tests have to be updated?

@thomaslee
Copy link
Contributor Author

@pepijndevos that's correct, the tests need to be modified and I've been slow to get back to this after failing to force a clean shutdown in the tests. I'll see if I can take the route of simply expecting the exception instead a little later today.

@kentonv kentonv force-pushed the master branch 2 times, most recently from c35c724 to 6ca3362 Compare August 3, 2022 13:42
@h-vetinari
Copy link

Hey all, what's the status of this PR?

When trying to package capnproto against OpenSSL 3 in conda-forge (we're trying to get things buttoned up well before the EOL of 1.1.1 relatively soon), we're running into some test errors on unix:

961 test(s) passed 
2 test(s) failed 

In more detail:

[ TEST ] kj/compat/tls-test.c++:685: TLS certificate validation
kj/compat/tls-test.c++:682: failed: expected ::kj::_::hasSubstring(e->getDescription(), message); exception description didn't contain expected substring; *e = kj/compat/tls.c++:170: failed: TLS peer's certificate is not trusted; reason = hostname mismatch
stack: 7f6583692450 7f658369294a 7f65835b37e1 7f6583682ad0 56121b5700f0 56121b569e90 56121b44ffaa 56121b56b0b8
stack: 56121b56ed55 56121b57435d 56121b5744ca 7f658356d38e 7f658350e199 7f658356ef26 7f658356f8c1 7f65835465ee 7f658350e199 7f65835409d2 7f658356d180 7f6582d42554 56121b1ea09c
kj/compat/tls-test.c++:682: failed: expected ::kj::_::hasSubstring(e->getDescription(), message); exception description didn't contain expected substring; *e = kj/compat/tls.c++:170: failed: TLS peer's certificate is not trusted; reason = self-signed certificate
stack: 7f6583692450 7f658369294a 7f65835b37e1 7f6583682ad0 56121b5700f0 56121b569e90 56121b44ffaa 56121b56b0b8
stack: 56121b56ed55 56121b57435d 56121b574622 7f658356d38e 7f658350e199 7f658356ef26 7f658356f8c1 7f65835465ee 7f658350e199 7f65835409d2 7f658356d180 7f6582d42554 56121b1ea09c
[ FAIL ] kj/compat/tls-test.c++:685: TLS certificate validation (199265 μs)
[ TEST ] kj/compat/tls-test.c++:703: TLS client certificate verification
kj/compat/tls.c++:66: failed: OpenSSL error; message = error:16000069:STORE routines::unregistered scheme
error:80000002:system library::No such file or directory
stack: 7f6583687390 7f658368d3fb 7f658368d658 7f65835d7699 7f65835d75a4 56121b56c2a1 56121b56c3e2 56121b578b3e 7f658356d38e
[ FAIL ] kj/compat/tls-test.c++:703: TLS client certificate verification (161491 μs)

I've picked the contents of this PR into a patch on top of 0.10.3, but while the "TLS certificate validation" then passes, the client version still fails:

2022-12-14T00:21:39.3218924Z [ TEST ] kj/compat/tls-test.c++:712: TLS client certificate verification
2022-12-14T00:21:39.5138043Z kj/compat/tls.c++:66: failed: OpenSSL error; message = error:16000069:STORE routines::unregistered scheme
2022-12-14T00:21:39.5139064Z error:80000002:system library::No such file or directory
2022-12-14T00:21:39.5139887Z stack: 7f9e425ec3b0 7f9e425f243b 7f9e425f2698 7f9e4253c699 7f9e4253c5a4 55b5c32f12a1 55b5c32f13e2 55b5c32fdb3e 7f9e424d238e
2022-12-14T00:21:39.5135986Z [ FAIL ] kj/compat/tls-test.c++:712: TLS client certificate verification (193842 μs)

@kentonv
Copy link
Member

kentonv commented Dec 28, 2022

I opened #1596 to complete this work.

@jakirkham
Copy link

It looks like PR ( #1596 ) was merged. Is this PR still needed?

@jakirkham
Copy link

Also it looks like the issue @h-vetinari mentioned above is similar (the same?) as the one described in PR ( #1435 )

@kentonv
Copy link
Member

kentonv commented Jan 11, 2023

Yeah this PR is superseded by #1596.

@kentonv kentonv closed this Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants