Default synchronous connection pool blocks forever? #407

Open
ejschoen opened this issue Oct 13, 2017 · 14 comments

Comments


@ejschoen ejschoen commented Oct 13, 2017

We've been using clj-http version 3.5 for a while, and are very happy with it. But today we ran into an issue we haven't seen before. A specific pattern of requests (I haven't yet figured out how to isolate it into a unit test) hangs in what appears to be a wait for a connection to become available from the PoolingHttpClientConnectionManager.

The thread's stack looks like this (with some elision; I am using the default middleware):

        java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f0ce9c70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:377)
        
       ...

        at clj_http.core$request.invoke(core.clj:412)
        at clj_http.core$request.invoke(core.clj:347)
        at clojure.lang.Var.invoke(Var.java:379)
        at clj_http.client$wrap_request_timing$fn__3085.invoke(client.clj:1030)
        at clj_http.client$wrap_async_pooling$fn__3093.invoke(client.clj:1063)
        
      ...

        at clj_http.client$request_STAR_.invoke(client.clj:1165)
        at clj_http.client$get.doInvoke(client.clj:1171)

There are no other threads actively requesting network content.

I enabled pool logging, and here's the end of the console log from the pool:

DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 3 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://www.gov.mb.ca:80][total kept alive: 3; route allocated: 0 of 2; total allocated: 3 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 3][route: {}->http://www.gov.mb.ca:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to www.gov.mb.ca/198.163.15.25:80 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:48939<->198.163.15.25:80 
DEBUG: (LoggingManagedHttpClientConnection.java:88)  http-outgoing-3: set socket timeout to 5000 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 3][route: {}->http://www.gov.mb.ca:80] can be kept alive for 5.0 seconds 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 3][route: {}->http://www.gov.mb.ca:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 0 of 2; total allocated: 4 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 4][route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to bitly.com/67.199.248.15:443 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:57998<->67.199.248.15:443 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 1 of 2; total allocated: 5 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 5][route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 2 of 2; total allocated: 6 of 10] 
DEBUG: (DefaultHttpClientConnectionOperator.java:139)  Connecting to bitly.com/67.199.248.15:443 
DEBUG: (DefaultHttpClientConnectionOperator.java:146)  Connection established 192.168.0.200:57999<->67.199.248.15:443 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:288)  Connection leased: [id: 2][route: {}->http://bit.ly:80][total kept alive: 3; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (LoggingManagedHttpClientConnection.java:88)  http-outgoing-2: set socket timeout to 5000 
DEBUG: (PoolingHttpClientConnectionManager.java:320)  Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely 
DEBUG: (PoolingHttpClientConnectionManager.java:326)  Connection released: [id: 2][route: {}->http://bit.ly:80][total kept alive: 4; route allocated: 1 of 2; total allocated: 6 of 10] 
DEBUG: (PoolingHttpClientConnectionManager.java:255)  Connection request: [route: {s}->https://bitly.com:443][total kept alive: 4; route allocated: 2 of 2; total allocated: 6 of 10] 

I was able to work around the problem by wrapping the code for this scenario in a (with-connection-pool ...) form. I just wonder why that was necessary; we've never seen this before, and it isn't obvious that more than one outgoing request is in flight when the thread blocks indefinitely.

I do wonder about the "Connection [id: 2][route: {}->http://bit.ly:80] can be kept alive indefinitely" messages. Are these connections not being released?

For what it's worth, there was no difference if I tried clj-http 3.7.0.

Thanks,
Eric


@dakrone dakrone commented Oct 15, 2017

Interesting. How are you invoking the HTTP call? Are you passing in your own connection manager?


@ejschoen ejschoen commented Oct 15, 2017

I wasn't using an explicit connection manager, and the workaround works if I simply use a unique connection pool in the context of the call that was hanging. I'm wondering naively if this could be a redirection chain that's just too deep for the default pool.
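For reference, the workaround amounts to scoping the hanging requests to their own pool, roughly like this (a sketch only: the option values and URL are illustrative, not the ones used in production; with-connection-pool and its keys are from the clj-http README):

```clojure
(require '[clj-http.client :as client])

;; Give the hanging code path its own pooled connection manager,
;; which is shut down automatically when the body exits.
(client/with-connection-pool {:timeout 5 :threads 4 :default-per-route 2}
  ;; Every client/get inside this scope shares the scoped pool.
  (client/get "http://example.com" {:throw-exceptions false}))
```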


@dakrone dakrone commented Oct 26, 2017

I'm wondering naively if this could be a redirection chain that's just too deep for the default pool.

If you disable redirection on the clj-http level does the problem still occur?
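A minimal sketch of what that could look like for a single call, using the :follow-redirects option documented in the clj-http README (the URL here is illustrative):

```clojure
(require '[clj-http.client :as client])

;; With redirect following off, a 301/302 comes back as-is instead of
;; being chased, so each call leases at most one route from the pool.
(let [resp (client/get "http://bit.ly/example"
                       {:follow-redirects false
                        :throw-exceptions false})]
  ;; Inspect the raw redirect instead of following it.
  [(:status resp) (get-in resp [:headers "location"])])
```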


@ejschoen ejschoen commented Oct 27, 2017

Good question. I can give that a try. I can't reproduce the behavior in isolation, but I can implement my own redirection following and see whether the problem recurs when the system is used in normal production.


@dakrone dakrone commented Feb 27, 2018

@ejschoen have you had a chance to try this again?


@ejschoen ejschoen commented Feb 27, 2018

Apologies. I haven't had time to try implementing my own redirection here. I did pull the latest 3.8.0-SNAPSHOT this morning, and am testing in a development branch (primarily to confirm that #416 is fixed), without the explicit connection pool workaround I described above.

Do you think any work you have done between 3.7.0 and now would have addressed this issue?


@dakrone dakrone commented Feb 27, 2018

Do you think any work you have done between 3.7.0 and now would have addressed this issue?

That is possible, but I can't say for sure. I did have another question: can you share the code you're invoking clj-http with (or a reasonable replica)? Also, how are you consuming the body of the response?


@ejschoen ejschoen commented Feb 28, 2018

Not sure this'll help, but abstractly, the code is used to scrape content from pages.

(let [request-headers (merge headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)})
      page (assoc
            (client/get url (merge
                             {:socket-timeout 15000
                              :insecure? insecure?
                              :throw-exceptions false
                              :headers request-headers
                              :conn-timeout 15000
                              :cookie-store cookie-store}
                             (when as {:as as})))
            :request-headers request-headers)]
  ...)

The as parameter is normally nil, insecure is normally false, and cookie-store is from (clj-http.cookies/cookie-store). So the body comes back as a string and we feed it into JSoup and snacktory when we get a good :status value.


@dakrone dakrone commented Mar 9, 2018

The as parameter is normally nil

When you say "normally", what is the other option? Something I was originally wondering about was whether the body could be returned as a stream that was never consumed, eventually exhausting the available connection pool.

The other weird thing is that a pooling connection manager is being used at all. Looking at the code, if no connection pool is specified, a BasicHttpClientConnectionManager should be used instead:

(defn ^BasicHttpClientConnectionManager make-regular-conn-manager
  [{:keys [keystore trust-store] :as req}]
  (cond
    (or keystore trust-store)
    (BasicHttpClientConnectionManager. (get-keystore-scheme-registry req))

    (opt req :insecure)
    (BasicHttpClientConnectionManager. @insecure-scheme-registry)

    :else (BasicHttpClientConnectionManager. regular-scheme-registry)))

Is it possible someone has bound *connection-manager* or is providing one to the request?


@RutledgePaulV RutledgePaulV commented Mar 22, 2018

This sounds kind of similar to the issues in #251. Basically, you need to take special care to clean up connections after the reading of a response is interrupted; otherwise that response continues to be downloaded, consuming one of the available pool connections.
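One defensive pattern along those lines is to always close a :as :stream body in a finally, so the pooled connection is released even when downstream processing throws. A sketch (parse-fn and url are placeholders, not names from this thread's code):

```clojure
(require '[clj-http.client :as client])

(defn fetch-and-parse [url parse-fn]
  (let [resp (client/get url {:as :stream :throw-exceptions false})
        ^java.io.InputStream body (:body resp)]
    (try
      (parse-fn body)
      (finally
        ;; Closing the stream hands the connection back to the pool
        ;; even if parse-fn threw.
        (when body (.close body))))))
```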


@ejschoen ejschoen commented Mar 22, 2018

Apologies for not responding sooner. Two answers:

  1. I looked back at our code from when I originally reported this, and I can confirm that none of our code was binding *connection-manager* at that time.
  2. The :as parameter in the clj-http.client/get call is :stream when we are fetching an RSS feed. We pass the stream on to feedparser-clj's parse-feed function. So... if we get a bad feed (e.g., bad XML that can't be parsed) and feedparser-clj fails and throws an exception, does the cleanup that Paul mentioned above happen?

@dakrone dakrone commented Mar 25, 2018

So... if we get a bad feed (e.g., bad XML that can't be parsed) and feedparser-clj fails and throws an exception, does the cleanup that Paul mentioned above happen?

It does not. In that case, the stream would still be around, and the connection pool can't be closed until the stream has been read. See:

https://github.com/dakrone/clj-http/blob/master/src/clj_http/core.clj#L347-L358

where the connection pool is only closed when the stream is closed.

Does this still occur if you use with-open to handle closing the stream in the event an exception is thrown, such as:

(let [request-headers (merge headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)})
      page (assoc
            (client/get url (merge
                             {:socket-timeout 15000
                              :insecure? insecure?
                              :throw-exceptions false
                              :headers request-headers
                              :conn-timeout 15000
                              :cookie-store cookie-store}
                             (when as {:as as})))
            :request-headers request-headers)]
  (with-open [stream (:body page)]
    (parse-feed stream)))

@ejschoen ejschoen commented Mar 29, 2018

This all makes sense, but I'm still worried that I haven't found the case that was causing the pool starvation.

Here is some test code (warn, error, etc. are from com.taoensso/timbre):

(defn- show-cm
  [cm message]
  (let [stats (.getTotalStats cm)]
    (println "*****" message)
    (println "***** Available connections:" (.getAvailable stats))
    (println "***** Leased connections:" (.getLeased stats))
    (println "***** Max connections:" (.getMax stats))
    (println "***** Pending:" (.getPending stats))))

(defn test-rss-feed
  [feed-uri]
  (http-client/with-connection-pool {:timeout 5 :threads 4 :insecure? true :default-per-route 4}
    (try (let [cs (cookies/cookie-store)
               feed (try
                      (http-client/get feed-uri {:throw-exceptions false
                                                 :cookie-store cs
                                                 :insecure? true
                                                 :as :stream
                                                 :headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)}})
                      (catch Exception e
                        {:status 500 :body (.getMessage e)}))
               feed-stream (when (= (:status feed) 200) (:body feed))]
           (if (nil? feed-stream)
             (do (error "No content for RSS feed returned.")
                 (error (format "Status %d, body %s" (:status feed) (:body feed))))
             (with-open [stream feed-stream]
               (let [cm conn/*connection-manager*
                     _ (show-cm cm "Before feedparser")
                     entries (try (:entries (feedparser/parse-feed stream))
                                  (catch java.lang.IllegalArgumentException e
                                    (error (format "Unable to parse feed: %s" (.getMessage e)))
                                    (error feed-stream)
                                    []))]
                 (info (format "Found %d entries in feed" (count entries)))
                 (doseq [entry entries
                         ;; to repro with this code, define fix-entry-url to be identity
                         :let [url (fix-entry-url entry)]]
                   (try (let [result (http-client/get url {:throw-exceptions false
                                                           :cookie-store cs
                                                           :headers {"User-Agent" (str "i2kconnect/" @i2ksource-version)}})]
                          (if (= 200 (:status result))
                            (info (format "%s: OK" url))
                            (warn (format "%s: Not OK, status %d" url (:status result)))))
                        (catch Exception e
                          (error (format "Unable to fetch entry at %s: %s" url (.getMessage e))))))))))
         (finally
           (show-cm conn/*connection-manager* "In cleanup")))))

I tried this with a feed that our system thinks is bad: http://feeds.feedburner.com/gigalaw
Here's what got printed on the console:

***** Before feedparser
***** Available connections: 0
***** Leased connections: 1
***** Max connections: 4
***** Pending: 0
18-Mar-29 14:16:09 debian-monster ERROR [i2ksource.rss] - Unable to parse feed: Invalid document
18-Mar-29 14:16:09 debian-monster ERROR [i2ksource.rss] - clj_http.core.proxy$java.io.FilterInputStream$ff19274a@34414ffc
18-Mar-29 14:16:09 debian-monster INFO [i2ksource.rss] - Found 0 entries in feed
***** In cleanup
***** Available connections: 1
***** Leased connections: 0
***** Max connections: 4
***** Pending: 0

This is what I'd expect: the with-open scope closes the stream, and what was the leased connection from the pool becomes available. However, if I change the with-open to a plain let, I get exactly the same output, even though nothing is closing the stream when feedparser throws the IllegalArgumentException. I did look into the feedparser source up through com.sun.syndication.io.WireFeedInput, and I can't see any code along that call chain that closes the stream in the case of an exception. (I don't think I can test the actual proxied FilterInputStream that clj-http creates to confirm it's still open, can I?)


@dakrone dakrone commented Apr 1, 2018

This all makes sense, but I'm still worried that I haven't found the case that was causing the pool starvation.

Okay, let's try to get some logging from Apache. I've documented how to hook up the Apache client's HTTP logging at https://github.com/dakrone/clj-http/#logging. Can you give that a try with DEBUG logging turned on? That should give us intimate details about connection reuse, closing, etc. for the connection pool and request objects.
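For anyone following along, a minimal log4j.properties in the spirit of the HttpClient 4.x logging guide might look like this (a sketch: the appender layout is just a suggestion, and the clj-http README's logging section has the authoritative setup):

```properties
# Route everything to the console; turn on HttpClient debug logging.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %-5p (%F:%L) %m%n
# Connection management, pooling, and request execution details.
log4j.logger.org.apache.http=DEBUG
# Uncomment for full wire-level logging (very verbose).
# log4j.logger.org.apache.http.wire=DEBUG
```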
