New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenSSL 3.0 #1434
OpenSSL 3.0 #1434
Conversation
c++/src/kj/compat/tls.c++
Outdated
case BIO_CTRL_EOF: | ||
// Queries for info and EOF | ||
return 0; | ||
#if OPENSSL_VERSION_NUMBER >= 0x30000000L && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could just be #ifdef BIO_CTRL_GET_KTLS_SEND
? That way if/when BoringSSL adds support, we won't need to update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, sure.
c++/src/kj/compat/tls.c++
Outdated
@@ -369,6 +369,18 @@ private: | |||
switch (cmd) { | |||
case BIO_CTRL_FLUSH: | |||
return 1; | |||
case BIO_CTRL_INFO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't suppose there are any, like, docs describing these?
BIO_CTRL_INFO
and BIO_CTRL_EOF
don't seem to be new, but I've never seen the warning log for them...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIO_CTRL_INFO and BIO_CTRL_EOF don't seem to be new, but I've never seen the warning log for them...
I just ran the tests against both 1.1.1 and 3.0.1 again to be sure: link. Neither are called using Debian unstable's 1.1.1 build and only BIO_CTRL_EOF
is called in Debian's 3.0.1. No calls to BIO_CTRL_INFO
in either release. I grepped the full build logs just to be certain:
$ grep bio_ctrl capnproto*.build
capnproto_0.9.1-4_arm64.build:kj/compat/tls.c++:377: unimplemented bio_ctrl; cmd = 2
capnproto_0.9.1-4_arm64.build:kj/compat/tls.c++:377: unimplemented bio_ctrl; cmd = 2
So I must have messed something up when I was testing the changes and prepping the patch, looks like it might just be some subtle change in semantics around polling the EOF bit in the BIO. I'll update the PR to remove the special casing of BIO_CTRL_INFO
.
I don't suppose there are any, like, docs describing these?
Yeah, the OpenSSL docs are painfully terse. You probably know more about OpenSSL than I do but my understanding is that bioctrls are the BIO (OpenSSL I/O abstraction) equivalent of ioctls: openssl calls the bioctrl callback to query the state of the BIO. The best "docs" I was able to find was the openssl sources unfortunately, but you can typically tie the bioctrls back to macros in bio.h
that do have some useful (if minimal) docs.
For example, it's irrelevant now that we've confirmed it's not being hit in tests but BIO_CTRL_INFO
drives BIO_get_mem_data which seems to be used to fetch a pointer into buffer memory in things like the mem BIO (and/or "current file pointer position" for unbuffered stuff like the fd BIO? ... seems dangerous to mix the two now that I look at it, but what do I know.). Here are the docs for BIO_get_mem_data, among others.
BIO_CTRL_EOF
unsurprisingly drives BIO_eof, so just OpenSSL polling to see if the BIO is in an "EOF" state. Some docs for BIO_eof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I guess the question then, is: Are we doing anything wrong by implementing BIO_CTRL_EOF
to always return zero? Does zero mean "I don't know", or does it mean "No, not at EOF"? If the latter, then is it bad to lie when we actually are at EOF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A full backtrace from within bioCtrl while running capnproto tests show the BIO_eof
checks originate from ssl3_read_n
in both tests: relevant openssl 3.0.1 source
This particular case seems like a premature EOF check while accepting a new TLS connection. If we lie about EOF we'll just keep retrying the read even though the underlying socket might be dead. At the same time I suspect we always would have lied about the EOF check in 1.1.1 if we tickled a code path that happened to trigger that bioctrl, (though we would have emitted a warning if that happened).
So if we wanted to be safe, is there an easy way for us to detect stream EOF for a TlsConnection
from within bioCtrl
that might make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we do, no. So let's go with a comment like:
OpenSSL sometimes calls this, hoping to probe whether the stream is at EOF. We don't have any way to check this in terms of KJ interfaces. But if we lie and say "no", OpenSSL will keep trying reads, and if the read itself reports EOF, OpenSSL accepts that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, tests fail because an exception is thrown:
failed: OpenSSL error; message = error:0A000126:SSL routines::unexpected eof while reading
So throwing an exception on unexpected EOF would seem to be a change in semantics, but guess that could be considered a bug & we just change the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd argue this was a bug before, possibly even a mild security problem (truncation attacks). We should fix the test to properly shut down or to expect the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kentonv I should clarify that the test failure will only occur against openssl 3.0, so we're talking about different semantics building against openssl 1.1 vs. 3.0. If you're still OK with that, I'll make it happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Yes I still think the behavior is desirable. We'll have to design the test so it succeeds either way, which I suppose means that it should always do a clean shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay looks like the exception is being routed through TlsConnectionReceiver::taskFailed
... my repro case on top of all the Debian build tooling is currently using 0.9.1 which is missing the new callback in TlsConnectionReceiver
that I would be able to use to capture the exception when building against OpenSSL 3.0. Planning to backport the change introducing the callback to 0.9.1 in Debian just to get this change passing tests against OpenSSL 3.0 then I'll work those changes back to capnproto latest.
For what it's worth I'm also in a situation where using openssl 1.1 is causing issues. So is the current status that the tests have to be updated? |
@pepijndevos that's correct, the tests need to be modified and I've been slow to get back to this after failing to force a clean shutdown in the tests. I'll see if I can take the route of simply expecting the exception instead a little later today. |
c35c724
to
6ca3362
Compare
Hey all, what's the status of this PR? When trying to package capnproto against OpenSSL 3 in conda-forge (we're trying to get things buttoned up well before the EOL of 1.1.1 relatively soon), we're running into some test errors on unix:
In more detail:
I've picked the contents of this PR into a patch on top of 0.10.3, but while the "TLS certificate validation" then passes, the client version still fails:
|
I opened #1596 to complete this work. |
It looks like PR ( #1596 ) was merged. Is this PR still needed? |
Also it looks like the issue @h-vetinari mentioned above is similar (the same?) as the one described in PR ( #1435 ) |
Yeah this PR is superseded by #1596. |
Debian is preparing to transition to OpenSSL 3.0 and we ran into two issues with this in capnproto: