Add HTTP response caching to the core client

This adds the option to enable Apache's httpclient-cache library for a
request. When multiple requests reuse a client and connection manager,
responses can be cached, so repeated retrievals of the same URL return much
faster.

This caching is transparent to the requestor, and a user can see whether the
response was returned from a cache using the `:cached` key in the response.

Caching is turned off by default and can be turned on by setting `:cache` to
`true`. Cache options can be configured through a `:cache-config` map in the
request option map. See `core/build-cache-config` for all available options.

Resolves #445
dakrone committed Apr 19, 2018
1 parent 2608c30 commit 2843e45023850f63906aa976cbcffbb94204d8f0
Showing with 229 additions and 11 deletions.
  1. +57 −0 README.org
  2. +5 −1 changelog.org
  3. +1 −0 project.clj
  4. +102 −10 src/clj_http/core.clj
  5. +64 −0 test/clj_http/test/core_test.clj
@@ -38,6 +38,7 @@
- [[#exceptions][Exceptions]]
- [[#decompression][Decompression]]
- [[#debugging][Debugging]]
- [[#caching][Caching]]
- [[#authentication][Authentication]]
- [[#basic-auth][Basic Auth]]
- [[#digest-auth][Digest Auth]]
@@ -997,6 +998,62 @@ This provides both the data sent and received on the wire for debugging purposes
I've also provided an example for changing the log level from clojure in
=examples/logging-apache-requests.clj=.

* Caching
:PROPERTIES:
:CUSTOM_ID: h-2c4ee611-ca22-432e-9c33-18040566661e
:END:

clj-http supports Apache's caching client, which "provides an HTTP/1.1-compliant caching
layer to be used with HttpClient--the Java equivalent of a browser cache" (see [[https://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html][the explanation in
the Apache docs]]). To use the cache, both a reusable connection manager *and* an http-client
must be used.

An example of basic usage with the default options:

#+BEGIN_SRC clojure
(let [cm (conn/make-reusable-conn-manager {})
client (:http-client (http/get "http://example.com"
{:connection-manager cm :cache true}))]
(http/get "http://example.com"
{:connection-manager cm :http-client client :cache true})
(http/get "http://example.com"
{:connection-manager cm :http-client client :cache true})
(http/get "http://example.com"
{:connection-manager cm :http-client client :cache true}))
#+END_SRC

You can build your own cache config by providing either a map of caching configuration options, or
by providing a =CacheConfig= object, as seen below:

#+BEGIN_SRC clojure
(let [cm (conn/make-reusable-conn-manager {})
cache-config (core/build-cache-config
{:cache-config {:max-object-size 4096}})
client (:http-client (http/get "http://example.com"
{:connection-manager cm :cache true}))]
(http/get "http://example.com"
;; Use the default cache config settings
{:connection-manager cm :http-client client :cache true})
(http/get "http://example.com"
{:connection-manager cm :http-client client :cache true
;; Provide cache configuration options as a map
:cache-config {:max-object-size 9152
:max-cache-entries 100}})
(http/get "http://example.com"
{:connection-manager cm :http-client client :cache true
;; Provide the cache configuration as a CacheConfig object
:cache-config cache-config}))
#+END_SRC

In the response, clj-http provides the =:cached= key to indicate whether the response was cached,
missed, etc:

- nil :: Caching was not used for this request
- =:CACHE_HIT= :: A response was generated from the cache with no requests sent upstream.
- =:CACHE_MISS= :: The response came from an upstream server.
- =:CACHE_MODULE_RESPONSE= :: The response was generated directly by the caching module.
- =:VALIDATED= :: The response was generated from the cache after validating the entry with the origin server.
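
The =:cached= values listed above can be folded into a simple yes/no check. The following is a
minimal sketch and not part of clj-http: =served-from-cache?= is a hypothetical helper name, and
counting =:VALIDATED= as "served from the cache" is an assumption based on the descriptions in the
list above. Plain maps stand in for real responses so the snippet runs without a server.

#+BEGIN_SRC clojure
(defn served-from-cache?
  "Hypothetical helper (not part of clj-http): returns true when the
  response's :cached value says the body came out of the cache, either
  directly (:CACHE_HIT) or after revalidation with the origin (:VALIDATED)."
  [response]
  (contains? #{:CACHE_HIT :VALIDATED} (:cached response)))

;; Plain maps stand in for real clj-http responses here
(served-from-cache? {:status 200 :cached :CACHE_HIT})  ;; => true
(served-from-cache? {:status 200 :cached :CACHE_MISS}) ;; => false
(served-from-cache? {:status 200})                     ;; => false, caching not used
#+END_SRC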

* Authentication
:PROPERTIES:
:CUSTOM_ID: h-87f38469-36b4-44c6-ae74-0d8f5e80c2ed
@@ -16,7 +16,7 @@ List of user-visible changes that have gone into each release
- Added an option to capture socket data as it is written out -
https://github.com/dakrone/clj-http/pull/440

** 3.9.0
** 3.9.0 (Unreleased)
- Add support for reusable http clients, returning the client in =:http-client= and allowing one to
be specified (with the same setting) - https://github.com/dakrone/clj-http/issues/441
- Cancelling the =Future= returned from an async http request now also aborts the HttpRequest object
@@ -26,6 +26,10 @@ List of user-visible changes that have gone into each release
advanced users who wish to add their own cookie validation, or use Apache's handling instead of
clj-http's. It also allows a user who wants to register a custom spec to reuse the spec without
creating it for every request. Semi-related to https://github.com/dakrone/clj-http/issues/444
- Added support for caching HTTP responses from a server. This can dramatically speed up requests to
the same URL. Filling and invalidating the cache is handled by Apache's httpclient-cache project,
with configuration exposed under the =:cache= and =:cache-config= parameters in the option map.
https://github.com/dakrone/clj-http/issues/445

** 3.8.0
- Reintroduce the =:save-request= and =:debug-body= options
@@ -9,6 +9,7 @@
:exclusions [org.clojure/clojure]
:dependencies [[org.apache.httpcomponents/httpcore "4.4.9"]
[org.apache.httpcomponents/httpclient "4.5.5"]
[org.apache.httpcomponents/httpclient-cache "4.5.5"]
[org.apache.httpcomponents/httpasyncclient "4.1.3"]
[org.apache.httpcomponents/httpmime "4.5.5"]
[commons-codec "1.11"]
@@ -37,6 +37,9 @@
CloseableHttpClient HttpClients
DefaultRedirectStrategy
LaxRedirectStrategy HttpClientBuilder)
(org.apache.http.client.cache HttpCacheContext)
(org.apache.http.impl.client.cache CacheConfig
CachingHttpClientBuilder)
(org.apache.http.impl.cookie DefaultCookieSpecProvider)
(org.apache.http.impl.conn SystemDefaultRoutePlanner
DefaultProxyRoutePlanner)
@@ -223,6 +226,78 @@
(DefaultProxyRoutePlanner. (construct-http-host proxy-host proxy-port))
(SystemDefaultRoutePlanner. (ProxySelector/getDefault)))))

(defn build-cache-config
"Given a request with :cache-config as a map or a CacheConfig object, return a
CacheConfig object, or nil if no cache config is found. If :cache-config is a
map, it checks for the following options:
- :allow-303-caching
- :asynchronous-worker-idle-lifetime-secs
- :asynchronous-workers-core
- :asynchronous-workers-max
- :heuristic-caching-enabled
- :heuristic-coefficient
- :heuristic-default-lifetime
- :max-cache-entries
- :max-object-size
- :max-update-retries
- :never-cache-http10-responses-with-query-string
- :revalidation-queue-size
- :shared-cache
- :weak-etag-on-put-delete-allowed"
[request]
(when-let [cc (:cache-config request)]
(if (instance? CacheConfig cc)
cc
(let [config (CacheConfig/custom)
{:keys [allow-303-caching
asynchronous-worker-idle-lifetime-secs
asynchronous-workers-core
asynchronous-workers-max
heuristic-caching-enabled
heuristic-coefficient
heuristic-default-lifetime
max-cache-entries
max-object-size
max-update-retries
never-cache-http10-responses-with-query-string
revalidation-queue-size
shared-cache
weak-etag-on-put-delete-allowed]} cc]
(when (boolean? allow-303-caching)
(.setAllow303Caching config allow-303-caching))
(when asynchronous-worker-idle-lifetime-secs
(.setAsynchronousWorkerIdleLifetimeSecs
config asynchronous-worker-idle-lifetime-secs))
(when asynchronous-workers-core
(.setAsynchronousWorkersCore config asynchronous-workers-core))
(when asynchronous-workers-max
(.setAsynchronousWorkersMax config asynchronous-workers-max))
(when (boolean? heuristic-caching-enabled)
(.setHeuristicCachingEnabled config heuristic-caching-enabled))
(when heuristic-coefficient
(.setHeuristicCoefficient config heuristic-coefficient))
(when heuristic-default-lifetime
(.setHeuristicDefaultLifetime config heuristic-default-lifetime))
(when max-cache-entries
(.setMaxCacheEntries config max-cache-entries))
(when max-object-size
(.setMaxObjectSize config max-object-size))
(when max-update-retries
(.setMaxUpdateRetries config max-update-retries))
;; I would add this option, but there is a bug in 4.x CacheConfig that
;; it does not actually correctly use the object from the builder.
;; It's fixed in 5.0 however
;; (when (boolean? never-cache-http10-responses-with-query-string)
;; (.setNeverCacheHTTP10ResponsesWithQueryString
;; config never-cache-http10-responses-with-query-string))
(when revalidation-queue-size
(.setRevalidationQueueSize config revalidation-queue-size))
(when (boolean? shared-cache)
(.setSharedCache config shared-cache))
(when (boolean? weak-etag-on-put-delete-allowed)
(.setWeakETagOnPutDeleteAllowed config weak-etag-on-put-delete-allowed))
(.build config)))))

(defn build-http-client
"Builds an Apache `HttpClient` from a clj-http request map. Optional arguments
`http-url` and `proxy-ignore-hosts` are used to specify the host and a list of
@@ -233,9 +308,14 @@
http-builder-fns cookie-spec
cookie-policy-registry]
:as req}
conn-mgr & [http-url proxy-ignore-hosts]]
caching?
conn-mgr
& [http-url proxy-ignore-hosts]]
;; have to let first, otherwise we get a reflection warning on (.build)
(let [^HttpClientBuilder builder (-> (HttpClients/custom)
(let [cache? (opt req :cache)
^HttpClientBuilder builder (-> (if caching?
(CachingHttpClientBuilder/create)
(HttpClients/custom))
(.setConnectionManager conn-mgr)
(.setRedirectStrategy
(get-redirect-strategy req))
@@ -246,6 +326,8 @@
(get-route-planner
proxy-host proxy-port
proxy-ignore-hosts http-url)))]
(when cache?
(.setCacheConfig builder (build-cache-config req)))
(when (or cookie-policy-registry cookie-spec)
(if cookie-policy-registry
;; They have a custom registry they'd like to re-use, so use that
@@ -357,9 +439,11 @@
((make-proxy-method-with-body request-method) http-url)
(make-proxy-method request-method http-url))))

(defn ^HttpClientContext http-context [request-config http-client-context]
(defn ^HttpClientContext http-context [caching? request-config http-client-context]
(let [^HttpClientContext typed-context (or http-client-context
(HttpClientContext/create))]
(if caching?
(HttpCacheContext/create)
(HttpClientContext/create)))]
(doto typed-context
(.setRequestConfig request-config))))

@@ -433,7 +517,10 @@
:major (.getMajor protocol-version)
:minor (.getMinor protocol-version)}
:reason-phrase (.getReasonPhrase status)
:trace-redirects (mapv str (.getRedirectLocations context))}]
:trace-redirects (mapv str (.getRedirectLocations context))
:cached (when (instance? HttpCacheContext context)
(when-let [cache-resp (.getCacheResponseStatus context)]
(-> cache-resp str keyword)))}]
(if (opt req :save-request)
(-> response
(assoc :request req)
@@ -475,6 +562,7 @@
proxy-ignore-hosts proxy-user proxy-pass digest-auth ntlm-auth]
:as req} respond raise]
(let [async? (opt req :async)
cache? (opt req :cache)
scheme (name scheme)
http-url (str scheme "://" server-name
(when server-port (str ":" server-port))
@@ -486,8 +574,8 @@
#{"localhost" "127.0.0.1"})
^RequestConfig request-config (or http-request-config
(request-config req))
^HttpClientContext context (http-context
request-config http-client-context)
^HttpClientContext context
(http-context cache? request-config http-client-context)
^HttpUriRequest http-req (http-request-for
request-method http-url body)]
(when-not (conn/reusable? conn-mgr)
@@ -530,9 +618,10 @@
(.addHeader http-req header-n (str header-v))))
(when (opt req :debug) (print-debug! req http-req))
(if-not async?
(let [^CloseableHttpClient
client (or http-client
(build-http-client req conn-mgr http-url proxy-ignore-hosts))]
(let [^CloseableHttpClient client
(or http-client
(build-http-client req cache?
conn-mgr http-url proxy-ignore-hosts))]
(try
(build-response-map (.execute client http-req context)
req http-req http-url conn-mgr context client)
@@ -542,6 +631,9 @@
(throw t))))
(let [^CloseableHttpAsyncClient client
(build-async-http-client req conn-mgr http-url proxy-ignore-hosts)]
(when cache?
(throw (IllegalArgumentException.
"caching is not yet supported for async clients")))
(.start client)
(.execute client http-req context
(reify org.apache.http.concurrent.FutureCallback
@@ -35,6 +35,9 @@
(condp = [(:request-method req) (:uri req)]
[:get "/get"]
{:status 200 :body "get"}
[:get "/dont-cache"]
{:status 200 :body "nocache"
:headers {"cache-control" "private"}}
[:get "/empty"]
{:status 200 :body nil}
[:get "/empty-gzip"]
@@ -834,3 +837,64 @@
;; Format a list of cookies into a list of headers
(formatCookies [cookies] (java.util.ArrayList.))))})]
(is (= @validated true))))


(deftest t-cache-config
(let [cc (core/build-cache-config
{:cache-config {:allow-303-caching true
:asynchronous-worker-idle-lifetime-secs 10
:asynchronous-workers-core 2
:asynchronous-workers-max 3
:heuristic-caching-enabled true
:heuristic-coefficient 1.5
:heuristic-default-lifetime 12
:max-cache-entries 100
:max-object-size 123
:max-update-retries 3
:revalidation-queue-size 2
:shared-cache false
:weak-etag-on-put-delete-allowed true}})]
(is (= true (.is303CachingEnabled cc)))
(is (= 10 (.getAsynchronousWorkerIdleLifetimeSecs cc)))
(is (= 2 (.getAsynchronousWorkersCore cc)))
(is (= 3 (.getAsynchronousWorkersMax cc)))
(is (= true (.isHeuristicCachingEnabled cc)))
(is (= 1.5 (.getHeuristicCoefficient cc)))
(is (= 12 (.getHeuristicDefaultLifetime cc)))
(is (= 100 (.getMaxCacheEntries cc)))
(is (= 123 (.getMaxObjectSize cc)))
(is (= 3 (.getMaxUpdateRetries cc)))
(is (= 2 (.getRevalidationQueueSize cc)))
(is (= false (.isSharedCache cc)))
(is (= true (.isWeakETagOnPutDeleteAllowed cc)))))

(deftest ^:integration t-client-caching
(run-server)
(let [cm (conn/make-reusable-conn-manager {})
r1 (client/get (localhost "/get")
{:connection-manager cm :cache true})
client (:http-client r1)
r2 (client/get (localhost "/get")
{:connection-manager cm :http-client client :cache true})
r3 (client/get (localhost "/get")
{:connection-manager cm :http-client client :cache true})
r4 (client/get (localhost "/get")
{:connection-manager cm :http-client client :cache true})]
(is (= :CACHE_MISS (:cached r1)))
(is (= :VALIDATED (:cached r2)))
(is (= :VALIDATED (:cached r3)))
(is (= :VALIDATED (:cached r4))))
(let [cm (conn/make-reusable-conn-manager {})
r1 (client/get (localhost "/dont-cache")
{:connection-manager cm :cache true})
client (:http-client r1)
r2 (client/get (localhost "/dont-cache")
{:connection-manager cm :http-client client :cache true})
r3 (client/get (localhost "/dont-cache")
{:connection-manager cm :http-client client :cache true})
r4 (client/get (localhost "/dont-cache")
{:connection-manager cm :http-client client :cache true})]
(is (= :CACHE_MISS (:cached r1)))
(is (= :CACHE_MISS (:cached r2)))
(is (= :CACHE_MISS (:cached r3)))
(is (= :CACHE_MISS (:cached r4)))))
