Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP client connection pool timeout doesn't free up queue #713

Closed
DerGuteMoritz opened this issue Feb 12, 2024 · 4 comments
Closed

HTTP client connection pool timeout doesn't free up queue #713

DerGuteMoritz opened this issue Feb 12, 2024 · 4 comments

Comments

@DerGuteMoritz
Copy link
Collaborator

Problem

The pool-timeout option of aleph.http/request controls how long the caller is willing to await a connection to be acquired from the connection pool. When it expires, the response deferred is rejected with a PoolTimeoutException. However, the "acquire" operation will still clog up the pool's queue until it gets its (by then unnecessary) turn.

Reproduction test case (in aleph.http-test):

(deftest test-pool-timeout
  (let [starved-pool (http/connection-pool
                      {:total-connections 0
                       :max-queue-size 1})]
    (try
      (is (thrown? PoolTimeoutException
                   @(http-get "/" {:pool starved-pool :pool-timeout 2})))
      (is (thrown? PoolTimeoutException
                   @(http-get "/"  {:pool starved-pool :pool-timeout 2})))
      (finally
        (.shutdown ^Pool starved-pool)))))

The second assertion fails with:

expected: (thrown?
           PoolTimeoutException
           @(http-get "/" {:pool starved-pool, :pool-timeout 2}))
   error: java.util.concurrent.RejectedExecutionException

Solution

Eagerly removing the acquire operation from the pool's queue would require changing io.aleph.dirigiste.Pool#acquire to support this. Its (internal) queue already supports this via cancelTake, it just isn't reachable from the public API.

Alternatively: Perhaps the current behavior is fine / good enough? It just happened to surprise me but that could also be addressed with a docstring change.

@DerGuteMoritz DerGuteMoritz changed the title HTTP connection pool timeout doesn't free up queue HTTP client connection pool timeout doesn't free up queue Feb 12, 2024
@KingMob
Copy link
Collaborator

KingMob commented Feb 15, 2024

Roping in @alexander-yakushev in case they have a comment.

It would be nice to cancel the pending acquire if it's taken too long, but it would also be fine w me to xform the rejected ex into a pool timeout ex.

If you exhaust all conns in reality, you're going to get exceptions of some sort.

@alexander-yakushev
Copy link
Contributor

alexander-yakushev commented Feb 15, 2024

I have a few comments about this indeed.

I found that in practice (in my work, I can be biased), Aleph's approach to having a separate queue for acquiring connections is unnecessary. Caching connections is vital because you can't keep creating and disposing them at a high rate (due to TIME_WAIT), but limiting the total number of them is not as important. I personally set the connection limit to 16k per host and pool queue size to 0, so that a fresh connection is immediately created for the acquirer each time there are no free connections in the pool, up until the limit is hit at which point it's an error.

There are some cases where you would use the pool with the queue as a parallelism controller. I think all those cases could be rewritten to something like Semaphores if Aleph connection pools offered no queueing.

So, this is my take. In my world, the breaking Aleph2 would have connection pools with no acquisition queues. Such pools are much much easier to implement, way more performant at high loads, and give the user one less option and timeout to think about. But I understand that people can have other usecases where this won't fly.

@alexander-yakushev
Copy link
Contributor

alexander-yakushev commented Feb 15, 2024

Sorry I haven't answered the original question. I think the current behavior is fine. I doubt anybody fine-tunes :pool-timeout and handles it somehow specially, it is an awkward option to give to the user and work with.

@DerGuteMoritz
Copy link
Collaborator Author

DerGuteMoritz commented Feb 16, 2024

Thanks for chiming in @alexander-yakushev, very good insights! I agree that the acquire queue and timeout are quite awkward and needlessly complicate matters. I also agree that limiting parallelism should rather be handled by the caller/application itself. For example, if I'm already using core.async, I'd rather use that instead of having to also deal with RejectedExecutionException and PoolTimeoutException. I wonder whether this must necessarily be considered a breaking change: Most users probably won't notice much of a difference and, as pointed out by @KingMob, they will already have to deal with exceptions due connection exhaustion in some manner anyway. And we could just keep around the old pool implementation (maybe move to a namespace which isn't loaded by default) so users who do rely on that specific behavior can still swap it back in. I filed #716 to track this idea.

As for status quo: Given all of the above, I think it's indeed fine to keep the behavior as-is. Closing but @KingMob feel free to re-open if you'd still like to see a change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants