Using InputStream as :ssl-context :trust-store is buggy #728

Open
DerGuteMoritz opened this issue Jun 13, 2024 · 6 comments

Comments

@DerGuteMoritz
Collaborator

Quoting the original report by @David-Ongaro from the addendum of #727:

As per the docs, java.io.InputStream instances are supported as values in the :ssl-context map, as an alternative to java.io.File instances. But I can't figure out how this is supposed to be used, as I regularly get java.lang.IllegalArgumentException: Input stream does not contain valid certificates. exceptions. That is, preparing a single request works fine, but preparing several in quick succession may fail:

(def ssl-context {:trust-store (io/input-stream client-ca)})

(def pool (http/connection-pool {:connection-options {:ssl-context ssl-context}}))

(http/get "https://example.com" {:pool pool}) => #<Deferred@6fb6283a: :not-delivered>

[(http/get "https://example.com" {:pool pool}) (http/get "https://example.com" {:pool pool})] =>
[#<Deferred@58ac8429: Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
found no certificates in input stream
            PemReader.java:  114  io.netty.handler.ssl.PemReader/readCertificates
           SslContext.java: 1263  io.netty.handler.ssl.SslContext/toX509Certificates
    SslContextBuilder.java:  276  io.netty.handler.ssl.SslContextBuilder/trustManager
                 netty.clj:  917  aleph.netty/eval23500/add-ssl-trust-manager!
                 netty.clj: 1026  aleph.netty/eval23500/ssl-client-context
                 netty.clj: 1193  aleph.netty/coerce-ssl-context
                 netty.clj: 1179  aleph.netty/coerce-ssl-context
                  core.clj: 2641  clojure.core/partial/fn
                client.clj:  699  aleph.http.client/client-ssl-context
                client.clj:  690  aleph.http.client/client-ssl-context
                client.clj:  799  aleph.http.client/http-connection
                client.clj:  752  aleph.http.client/http-connection
                  http.clj:  104  aleph.http/create-connection
                  http.clj:   97  aleph.http/create-connection
                  http.clj:  239  aleph.http/connection-pool/fn
                  flow.clj:   47  aleph.flow/instrumented-pool/reify
                 Pool.java:  273  io.aleph.dirigiste.Pool/addObject
                 Pool.java:  466  io.aleph.dirigiste.Pool/acquire
                  flow.clj:   74  aleph.flow/acquire/fn
                  flow.clj:   73  aleph.flow/acquire
                  flow.clj:   68  aleph.flow/acquire
                  http.clj:  377  aleph.http/eval30155/request/fn/fn
                  http.clj:  371  aleph.http/eval30155/request/fn
                  http.clj:  370  aleph.http/eval30155/request
                  http.clj:  481  aleph.http/req
                  http.clj:  477  aleph.http/req
                  core.clj: 2642  clojure.core/partial/fn

I didn't look into the implementation, but I suspect what's happening here is that when the first thread of the thread pool is initialized, it's exhausting the input-stream instance and this instance is reused during the initialization of a second thread. (At least that's what I hope is happening, since the alternative would be that each thread tries to reread the certificate on each request.)
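The suspected exhaustion can be shown without Aleph or Netty at all. The following is a minimal sketch, assuming a placeholder string in place of a real PEM file; `fake-pem`, `read-all` and `shared-stream` are names made up for illustration. It only demonstrates that a second consumer of the same stream instance finds nothing left to read.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream))

;; Placeholder bytes standing in for a PEM file; this is not a real certificate.
(def fake-pem
  "-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n")

(defn read-all
  "Drains whatever is left in `in` and returns it as a string."
  [in]
  (let [out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.toString out "UTF-8")))

;; One stream instance shared by two consumers, like the def'd ssl-context map.
(def shared-stream
  (ByteArrayInputStream. (.getBytes ^String fake-pem "UTF-8")))

(def first-read  (read-all shared-stream)) ; the full content
(def second-read (read-all shared-stream)) ; empty: the stream is already drained
```

A parser handed the second, empty read would plausibly report that it found no certificates, which matches the message in the stack trace above.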

So the question is, if this doesn't work, why is it even supported? But if this is indeed an issue, it probably should be handled in a separate ticket, since this behavior already applies to Aleph 0.6.4 and therefore can't be considered a regression.

@KingMob
Collaborator

KingMob commented Jun 13, 2024

I don't recall ever using an InputStream, but as you can see here, Aleph doesn't do much with it by default.

It's interesting that they fail "in quick succession". Maybe there's some delayed init that's not triggered until needed, and then if two conns both try to set up the SslContext/trust store, they end up racing on the InputStream, and one or both fail.

It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options. If you make calls slowly, the same conn gets reused, so it's not an issue there. But make them too fast, and it'll spawn multiple conns, each building its own SslContext, and each one tries to read from a stream that is either already exhausted or still being read by an earlier conn.

Solution is to either (1) force the stream into another format ASAP, or (2) disallow streams.
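Option (1) could look roughly like the sketch below: drain the stream once, keep the bytes, and hand out a fresh stream per consumer. This is only an illustration of the idea. `stream->bytes` and `replayable` are hypothetical helpers, not part of Aleph, and whether Aleph would do this internally or expect callers to do it is left open here.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream InputStream))

(defn stream->bytes
  "Reads `in` to the end once and returns the content as a byte array."
  [^InputStream in]
  (let [out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.toByteArray out)))

(defn replayable
  "Hypothetical helper: drains `in` once and returns a zero-arg function
  that yields a fresh, independent stream over the same bytes per call."
  [in]
  (let [content (stream->bytes in)]
    (fn [] (ByteArrayInputStream. content))))

;; Placeholder content instead of a real PEM file.
(def new-stream
  (replayable (ByteArrayInputStream. (.getBytes "placeholder PEM content" "UTF-8"))))

;; Three independent consumers each see the full content.
(def reads
  (mapv (fn [_] (String. ^bytes (stream->bytes (new-stream)) "UTF-8"))
        (range 3)))
```

Since each consumer gets its own stream, there is no shared read position left to race on.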

@DerGuteMoritz
Collaborator Author

@KingMob My reading of the code agrees with your analysis 👍 I'll come up with a test case to reproduce it.

It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options

I think this might actually not be necessary: It should be possible to lift the construction of the context to the level of the pool instead which would solve this bug as well as reduce allocation. Will give this a try!
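Roughly what I have in mind, as a sketch with stand-ins only: `build-context` and `make-connection-factory` are made-up names and do not correspond to Aleph's actual functions. The context is built once when the pool-like factory is created, and every connection closes over the same instance.

```clojure
;; Counts how often the stand-in context gets built.
(def build-count (atom 0))

(defn build-context
  "Stand-in for coercing an options map into an SSL context.
  Hypothetical; not Aleph's actual coercion function."
  [opts]
  (swap! build-count inc)
  {:context-for opts})

(defn make-connection-factory
  "Builds the context once, at factory creation time, and returns a
  zero-arg function that creates connections sharing that context."
  [opts]
  (let [ctx (build-context opts)]
    (fn [] {:ssl-context ctx})))

(def new-conn (make-connection-factory {:trust-store :some-stream}))

;; Three connections, but only a single context construction.
(def conns (doall (repeatedly 3 new-conn)))
```

With this shape, a one-shot input such as an InputStream is only ever consumed once per pool, no matter how many connections the pool spawns.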

@KingMob
Collaborator

KingMob commented Jun 19, 2024

Yeah, even when SslContext construction is idempotent, why do it multiple times?

@bitti

bitti commented Jun 20, 2024

Yeah, even when SslContext construction is idempotent, why do it multiple times?

If these instances are not thread-safe, that could be a reason. But since they are immutable, I suppose they should also be thread-safe. Furthermore, I think the Netty SslContext instances are based on the JDK SSLContext implementations, and if these weren't thread-safe it would be a widespread problem (even though I find it hard to find explicit documentation about this).

@KingMob
Collaborator

KingMob commented Jun 21, 2024

@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

@bitti

bitti commented Jun 21, 2024

@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

I think even in the mathematical sense, you can't 'define' it like that, since thread-unsafety implies undefined behavior. So no, neither idempotency nor immutability implies thread-safety.

But I think in this case we can safely assume the JDK/Netty implementations are thread-safe, since otherwise it would be more difficult to share connections (at least that's what I gather from the SO discussions). I gather the reason the JDK docs don't explicitly state this is that they can't make a guarantee for the millions of potential SslContext implementations out there.

PawelStroinski added a commit to PawelStroinski/aleph that referenced this issue Jul 27, 2024
Both testing contexts are failing. The serial one is to demonstrate that
the InputStream cannot be read twice without resetting, which obviously
is not done by Netty/Aleph.

This is also the case in the concurrent context, which was intended to
resemble the original report in clj-commons#728 and is a more likely scenario,
since it doesn't disable keep-alive. IIUC, the concurrent scenario
could fail in an even more unpleasant way, if the test certificate file
was greater than the 8192-byte buffer used to read it, but ours is not
(the fix would be the same).

NB: `with-http-ssl-servers` already runs things twice, so `repeatedly`
is not required to make it fail, but that would be harder to read and
wouldn't cover (at some level, at least) both servers.
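
The "cannot be read twice without resetting" point from the serial context can be illustrated in isolation. This is a sketch only, using a ByteArrayInputStream with placeholder content, whose `reset` rewinds to the start when no mark was set. Not every stream type supports mark/reset, so this is not proposed as the fix.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream))

(defn read-all
  "Drains whatever is left in `in` and returns it as a string."
  [in]
  (let [out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.toString out "UTF-8")))

;; Placeholder content instead of the test certificate.
(def stream (ByteArrayInputStream. (.getBytes "placeholder" "UTF-8")))

(def first-read    (read-all stream)) ; full content
(def without-reset (read-all stream)) ; empty, the stream is drained

;; Rewind to the beginning; only then does a further read succeed.
(.reset ^ByteArrayInputStream stream)
(def after-reset   (read-all stream)) ; full content again
```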
PawelStroinski added a commit to PawelStroinski/aleph that referenced this issue Jul 27, 2024
As suggested by @DerGuteMoritz in clj-commons#728. This fixes the issue and makes
the test added in the previous commit pass.

Keeping the `client-ssl-context` call in `http-connection` as is,
even though it might seem superfluous considering the code path taken in
the test, but `http-connection` is a public API, so we have to keep
the call (which for us is a no-op, if we ignore the repeated ALPN check)
even for our case when the protocol is https and `ssl-context` is supplied.

NOTE: This highlights a difference we are introducing here. Previously,
if we specified ssl-context, but the protocol wasn't https, we would just
ignore the ssl-context. Currently, we are coercing it ahead-of-time,
before knowing the request protocol. This could be alleviated by wrapping
the coercion in a `delay`, so it won't happen until needed. However, given
how unlikely this scenario seems, I have doubts whether it'd be worth it.

I slightly dislike the repetition of `[:http1]` default value,
but since it serves as documentation in `http-connection`,
I decided to keep it as is rather than to extract it out.

Also, I slightly dislike the repetition of a pattern to call
`ensure-consistent-alpn-config` and then `coerce-ssl-client-context`
but it's only now in 2 places, which I think is a better alternative
than adding yet another ssl-coercion layer/wrapping function.
Obviously, we cannot just move `ensure-consistent-alpn-config` to
`ssl-client-context`, since ALPN is only for HTTP.
PawelStroinski added a commit to PawelStroinski/aleph that referenced this issue Jul 27, 2024
As suggested by @DerGuteMoritz in clj-commons#728. This fixes the issue and makes
the test added in the previous commit pass.

Keeping the `client-ssl-context` call in `http-connection` as is,
even though it might seem superfluous considering the code path taken in
the test, but `http-connection` is a public API, so we have to keep
the call (which for us is a no-op, if we ignore the repeated ALPN check)
even for our case when the protocol is https and `ssl-context` is supplied.

NOTE: This highlights a difference we are introducing here. Previously,
if we specified ssl-context, but the protocol wasn't https, we would just
ignore the ssl-context. Currently, we are coercing it ahead-of-time,
before knowing the request protocol. This could be alleviated by wrapping
the coercion in a `delay`, so it wouldn't happen until needed. Yet, given
how unlikely this scenario seems, I have doubts whether it'd be worth it.

I slightly dislike the repetition of `[:http1]` default value,
but since it serves as documentation in `http-connection`,
I decided to keep it as is rather than to extract it out.

Also, I slightly dislike the repetition of a pattern to call
`ensure-consistent-alpn-config` and then `coerce-ssl-client-context`
but it's only now in 2 places, which I think is a better alternative
than adding yet another ssl-coercion layer/wrapping function.
Obviously, we cannot just move `ensure-consistent-alpn-config` to
`ssl-client-context`, since ALPN is only for HTTP.
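
The `delay` alternative mentioned in the NOTE above could be sketched as follows. All names here (`coerce-context`, `make-connection-factory`) are stand-ins for illustration and not Aleph's functions. Coercion is deferred until the first https connection and then cached, so a plain http request never triggers it.

```clojure
;; Counts how often the stand-in coercion runs.
(def coerce-count (atom 0))

(defn coerce-context
  "Stand-in for the ahead-of-time SSL context coercion. Hypothetical."
  [opts]
  (swap! coerce-count inc)
  {:coerced opts})

(defn make-connection-factory
  "Wraps the coercion in a `delay`: it runs at most once, and only when
  an https connection first dereferences it."
  [opts]
  (let [ctx (delay (coerce-context opts))]
    (fn [scheme]
      (cond-> {:scheme scheme}
        (= "https" scheme) (assoc :ssl-context @ctx)))))

(def new-conn (make-connection-factory {:trust-store :some-stream}))

(new-conn "http")
(def count-after-http @coerce-count)   ; no coercion happened yet

(def https-conns (mapv new-conn ["https" "https" "https"]))
(def count-after-https @coerce-count)  ; coerced exactly once
```

A `delay` also runs its body only once when dereferenced from several threads, so concurrent connection setup would still consume a one-shot input a single time.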