Using InputStream as :ssl-context :trust-store is buggy #728

DerGuteMoritz · 2024-06-13T09:54:35Z

Quoting the original report by @David-Ongaro from the addendum of #727:

As per the docs, instead of java.io.File instances, java.io.InputStream instances are also supported as keys for the :ssl-context map. But I can't figure out how this is supposed to be used, as I regularly get java.lang.IllegalArgumentException: Input stream does not contain valid certificates. exceptions. I.e., preparing a single request just works fine, but preparing them in quick succession may fail:

```clojure
(require '[aleph.http :as http]
         '[clojure.java.io :as io])

;; `client-ca` is assumed to refer to the CA certificate file
(def ssl-context {:trust-store (io/input-stream client-ca)})

(def pool (http/connection-pool {:connection-options {:ssl-context ssl-context}}))

(http/get "https://example.com" {:pool pool})
;; => #<Deferred@6fb6283a: :not-delivered>

[(http/get "https://example.com" {:pool pool}) (http/get "https://example.com" {:pool pool})]
;; =>
```

```
[#<Deferred@58ac8429: Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
Error printing return value (CertificateException) at io.netty.handler.ssl.PemReader/readCertificates (PemReader.java:114).
found no certificates in input stream
            PemReader.java:  114  io.netty.handler.ssl.PemReader/readCertificates
           SslContext.java: 1263  io.netty.handler.ssl.SslContext/toX509Certificates
    SslContextBuilder.java:  276  io.netty.handler.ssl.SslContextBuilder/trustManager
                 netty.clj:  917  aleph.netty/eval23500/add-ssl-trust-manager!
                 netty.clj: 1026  aleph.netty/eval23500/ssl-client-context
                 netty.clj: 1193  aleph.netty/coerce-ssl-context
                 netty.clj: 1179  aleph.netty/coerce-ssl-context
                  core.clj: 2641  clojure.core/partial/fn
                client.clj:  699  aleph.http.client/client-ssl-context
                client.clj:  690  aleph.http.client/client-ssl-context
                client.clj:  799  aleph.http.client/http-connection
                client.clj:  752  aleph.http.client/http-connection
                  http.clj:  104  aleph.http/create-connection
                  http.clj:   97  aleph.http/create-connection
                  http.clj:  239  aleph.http/connection-pool/fn
                  flow.clj:   47  aleph.flow/instrumented-pool/reify
                 Pool.java:  273  io.aleph.dirigiste.Pool/addObject
                 Pool.java:  466  io.aleph.dirigiste.Pool/acquire
                  flow.clj:   74  aleph.flow/acquire/fn
                  flow.clj:   73  aleph.flow/acquire
                  flow.clj:   68  aleph.flow/acquire
                  http.clj:  377  aleph.http/eval30155/request/fn/fn
                  http.clj:  371  aleph.http/eval30155/request/fn
                  http.clj:  370  aleph.http/eval30155/request
                  http.clj:  481  aleph.http/req
                  http.clj:  477  aleph.http/req
                  core.clj: 2642  clojure.core/partial/fn
```

I didn't look into the implementation, but I suspect what's happening here is that when the first thread of the thread pool is initialized, it's exhausting the input-stream instance and this instance is reused during the initialization of a second thread. (At least that's what I hope is happening, since the alternative would be that each thread tries to reread the certificate on each request.)

So the question is, if this doesn't work, why is it even supported? But if this is indeed an issue, it probably should be handled in a separate ticket, since this behavior already applies to Aleph 0.6.4 and therefore can't be considered a regression.


KingMob · 2024-06-13T12:40:21Z

I don't recall ever using an InputStream, but as you can see here, Aleph doesn't do much with it by default.

It's interesting that they fail "in quick succession". Maybe there's some delayed init that's not triggered until needed, and then if two conns both try to set up the SslContext/trust store, they end up racing on the InputStream, and one or both fail.

It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options. If you make calls slowly, the same conn gets reused, so it's not an issue there. But call too fast, and it'll spawn multiple conns, each building its own SslContext, and every one after the first tries to read from a stream that's either already exhausted or still being consumed by an earlier conn.
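
A minimal sketch of the exhaustion half of this, independent of Aleph and Netty: once one consumer has drained a stream, the next one gets nothing. `drain` and `shared-stream` are names made up for this illustration, and a `ByteArrayInputStream` stands in for the file-backed stream from the report.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream))

(defn drain
  "Reads whatever is left in `in` and returns the number of bytes read."
  [in]
  (let [out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.size out)))

;; Placeholder bytes, not a real certificate; only the read position matters here.
(def shared-stream
  (ByteArrayInputStream. (.getBytes "-----BEGIN CERTIFICATE-----\n" "UTF-8")))

;; The first consumer sees all the bytes and leaves the stream at its end.
(def first-read (drain shared-stream))

;; The second consumer shares the same instance and finds it empty.
(def second-read (drain shared-stream))
```

The second read coming back empty would be consistent with the "found no certificates in input stream" message in the trace, though this sketch does not exercise Netty's `PemReader` itself.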

Solution is to either (1) force the stream into another format ASAP, or (2) disallow streams.
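
Option (1) could look roughly like the sketch below: read the stream to the end once, keep the bytes, and hand each consumer its own fresh stream over them. `snapshot-bytes` and `fresh-stream` are hypothetical helpers, not existing Aleph functions, and the input is dummy data rather than a real certificate.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream))

(defn snapshot-bytes
  "Hypothetical helper: reads `in` to the end once and returns a byte array."
  [in]
  (let [out (ByteArrayOutputStream.)]
    (io/copy in out)
    (.toByteArray out)))

(defn fresh-stream
  "Hypothetical helper: returns a new, independent stream over `bs`."
  [^bytes bs]
  (ByteArrayInputStream. bs))

;; Consume the user-supplied stream exactly once, as early as possible.
(def ca-bytes
  (snapshot-bytes (ByteArrayInputStream. (.getBytes "dummy certificate data" "UTF-8"))))

;; Every later consumer gets its own stream, so each one sees the full contents.
(def reads
  (mapv #(alength ^bytes (snapshot-bytes %))
        (repeatedly 2 #(fresh-stream ca-bytes))))
```

The trade-off is that the certificate data stays in memory for as long as the snapshot is referenced, which should be negligible for typical trust stores.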

DerGuteMoritz · 2024-06-18T15:34:58Z

@KingMob My reading of the code agrees with your analysis 👍 I'll come up with a test case to reproduce it.

> It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options

I think this might actually not be necessary: It should be possible to lift the construction of the context to the level of the pool instead which would solve this bug as well as reduce allocation. Will give this a try!
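
To illustrate the idea only (this is not Aleph's actual code): if the context is built when the pool is created and the connection factory closes over it, the stream is consumed exactly once no matter how many connections get spawned. `build-context`, `make-pool` and `build-count` are all hypothetical stand-ins.

```clojure
;; Counts how often the (hypothetical) context construction runs.
(def build-count (atom 0))

(defn build-context
  "Hypothetical stand-in for a context build that consumes its input."
  [options]
  (swap! build-count inc)
  {:built-from options})

(defn make-pool
  "Hypothetical pool: returns a connection factory sharing one context."
  [options]
  (let [ctx (build-context options)] ; built once, at pool creation
    (fn new-connection [host]
      {:host host :ssl-context ctx})))

(def new-conn (make-pool {:trust-store :some-stream}))

;; Three connections, but still only a single context build.
(def conns (mapv new-conn ["a.example" "b.example" "c.example"]))
```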

KingMob · 2024-06-19T14:15:13Z

Yeah, even when SslContext construction is idempotent, why do it multiple times?

bitti · 2024-06-20T22:59:07Z

> Yeah, even when SslContext construction is idempotent, why do it multiple times?

If these instances are not thread-safe, that could be a reason. But since they are immutable, I suppose they also should be thread safe. Furthermore, I think the netty SslContext instances are based on the JDK SslContext implementations and if these weren't thread-safe it would be a widespread common problem (even though I find it hard to find explicit documentation about this).

KingMob · 2024-06-21T17:44:05Z

@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

bitti · 2024-06-21T18:44:51Z

> @bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians.

I think even in the mathematical sense, you can't 'define' it like that, since thread-unsafety implies undefined behavior. So no, neither idempotency nor immutability implies thread-safety.

But I think in this case we can safely assume the JDK/netty implementations are thread-safe, since otherwise it would make it more difficult to share connections (at least that's what I gather from the SO discussions). I gather the reason why the JDK docs don't explicitly state this is because they can't make a guarantee for the millions of potential SslContext implementations out there.

Both testing contexts are failing. The serial one is to demonstrate that the InputStream cannot be read twice without resetting, which obviously is not done by Netty/Aleph. This is also the case in the concurrent context, which was intended to resemble the original report in clj-commons#728 and is a more likely scenario, since it doesn't disable keep-alive. IIUC, the concurrent scenario could fail in an even more unpleasant way, if the test certificate file was greater than the 8192-byte buffer used to read it, but ours is not (the fix would be the same). NB: `with-http-ssl-servers` already runs things twice, so `repeatedly` is not required to make it fail, but that would be harder to read and wouldn't cover (at some level, at least) both servers.

@DerGuteMoritz

As suggested by @DerGuteMoritz in clj-commons#728. This fixes the issue and makes the test added in the previous commit pass.

Keeping the `client-ssl-context` call in `http-connection` as is, even though it might seem superfluous considering the code path taken in the test, but `http-connection` is a public API, so we have to keep the call (which for us is a no-op, if we ignore the repeated ALPN check) even for our case when the protocol is https and `ssl-context` is supplied.

NOTE: This highlights a difference we are introducing here. Previously, if we specified ssl-context, but the protocol wasn't https, we would just ignore the ssl-context. With this change, we are coercing it ahead-of-time, before knowing the request protocol. This could be alleviated by wrapping the coercion in a `delay`, so it won't happen until needed. However, given how unlikely this scenario seems, I have doubts whether it'd be worth it.

I slightly dislike the repetition of the `[:http1]` default value, but since it serves as documentation in `http-connection`, I decided to keep it as is rather than to extract it out. Also, I slightly dislike the repetition of a pattern to call `ensure-consistent-alpn-config` and then `coerce-ssl-client-context`, but it's only now in 2 places, which I think is a better alternative than adding yet another ssl-coercion layer/wrapping function. Obviously, we cannot just move `ensure-consistent-alpn-config` to `ssl-client-context`, since ALPN is only for HTTP.
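
For reference, the `delay` alternative mentioned in the note might look something like this sketch. `coerce-context` and `coerce-count` are hypothetical stand-ins, not the actual Aleph functions; the point is only that the body of a `delay` runs on first deref and at most once.

```clojure
;; Counts how often the (hypothetical) coercion runs.
(def coerce-count (atom 0))

(defn coerce-context
  "Hypothetical stand-in for the ahead-of-time ssl-context coercion."
  [options]
  (swap! coerce-count inc)
  {:coerced options})

;; Creating the delay does not run the coercion yet.
(def lazy-ctx (delay (coerce-context {:trust-store :some-stream})))

(def count-before-deref @coerce-count)

;; The first deref triggers the coercion; later derefs reuse the cached value.
(def ctx-a @lazy-ctx)
(def ctx-b @lazy-ctx)
```

With this shape, a pool that only ever serves plain http requests would never deref the delay and so would never touch the trust store.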

