Caching in Wrest

Section 13 of RFC 2616 (Caching in HTTP) describes in detail how clients are expected to implement caching.

A response should obey the following conditions to be considered cacheable by Wrest:

  • Only responses to GET requests are cached.
  • The response code must be 200, 203, 300, 301, 302, 304 or 307.
  • The Cache-Control header must contain neither the no-cache nor the no-store directive.
  • There must be no Pragma: no-cache header. (This header is only used by HTTP 1.0 servers.)
  • Either Cache-Control: max-age or the Expires header (or both) must be set. (Cache-Control: max-age always takes priority over the Expires header.)
  • If only the Expires header is set, it must not be earlier than the response's Date header. It must also be later than the time at which the client received the response.
  • The date headers (Date, Expires) must be in RFC 1123 format.
  • The Vary header must not be present at all. (The Vary mechanism is used to conditionally control caching, which Wrest does not currently implement. Section 14.44 of RFC 2616 describes the Vary header in detail.)
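
The conditions above can be sketched as a single predicate. This is a minimal illustration, not Wrest's actual implementation: the method name `cacheable?`, the `CACHEABLE_CODES` constant, the lower-cased header Hash and the `received_at` parameter are all assumptions made for the example. Date validation is only applied on the Expires path here.

```ruby
require 'time'

# Status codes listed above as cacheable.
CACHEABLE_CODES = [200, 203, 300, 301, 302, 304, 307].freeze

# Hypothetical helper, not Wrest's API. `headers` is a Hash with lower-cased
# header names; `received_at` is the time the client received the response.
def cacheable?(method, code, headers, received_at = Time.now)
  return false unless method == :get
  return false unless CACHEABLE_CODES.include?(code)

  cache_control = headers['cache-control'].to_s.downcase
  return false if cache_control =~ /no-cache|no-store/
  return false if headers['pragma'].to_s.downcase.include?('no-cache')
  return false if headers.key?('vary')

  # max-age takes priority over Expires.
  return true if cache_control =~ /max-age\s*=\s*\d+/

  # Only Expires is left: both dates must parse as HTTP dates, Expires must
  # not precede Date, and Expires must lie after the time of receipt.
  begin
    expires = Time.httpdate(headers.fetch('expires'))
    date    = Time.httpdate(headers.fetch('date'))
  rescue KeyError, ArgumentError
    return false
  end
  expires >= date && expires > received_at
end
```

Returning early on each failed condition keeps the order of checks the same as the list, which makes it easy to compare the two side by side.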

Whenever a GET request is made through Wrest, it consults the Cache Store for a matching entry. If an entry is found and has not expired, it is returned as the response without making a request to the server.

A cache entry is considered to be fresh (not expired) if:

  • Its freshness lifetime is greater than zero.

    • The freshness lifetime of a cache entry is its Cache-Control: max-age, if max-age is defined. If max-age is not defined, it is the cache entry's Expires header minus the current time. (Note: either max-age or the Expires header is guaranteed to be present for the cache entry, since only such responses are cached at all.)

    AND

  • Its freshness lifetime is greater than the cache entry's age.

    • Age of a cache entry is: Current Date & Time - the cached response's Date header, or the value of the Age header in the cached response, whichever is greater.
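
The two freshness conditions can be sketched as follows. This is an illustrative reading of the rules above, not Wrest's code: the helper names (`max_age`, `freshness_lifetime`, `age`, `fresh?`) and the lower-cased header Hash are assumptions.

```ruby
require 'time'

# Hypothetical helpers. `headers` is a Hash with lower-cased header names.

# Returns the max-age value in seconds, or nil if it is not present.
def max_age(headers)
  match = headers['cache-control'].to_s.match(/max-age\s*=\s*(\d+)/i)
  match && match[1].to_i
end

# max-age if defined, otherwise Expires minus the current time.
def freshness_lifetime(headers, now)
  max_age(headers) || (Time.httpdate(headers['expires']) - now).to_i
end

# The greater of (now - Date) and the Age header (0 when absent).
def age(headers, now)
  from_date = (now - Time.httpdate(headers['date'])).to_i
  [from_date, headers['age'].to_i].max
end

def fresh?(headers, now = Time.now)
  lifetime = freshness_lifetime(headers, now)
  lifetime > 0 && lifetime > age(headers, now)
end
```

For example, an entry with `max-age=120` whose Date header is 60 seconds old is fresh, while the same entry carrying `Age: 300` is not.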

If a cache entry is available but has expired, Wrest checks whether the entry can be validated. A cache entry can be validated if:

  • It has a Last-Modified header, or an ETag header, or both.

If a cache entry can be validated, Wrest sends the actual GET request to the server, along with:

  • If-Modified-Since: (if the header Last-Modified was present in the cache entry), and/or
  • If-None-Match: (if ETag was present in the cache entry)
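
As a rough sketch, the validators stored with an entry map onto conditional request headers like this. The helper names and the lower-cased header Hash are assumptions for illustration, not Wrest's API.

```ruby
# Hypothetical helper: builds the conditional request headers from the
# headers stored with a cache entry (lower-cased names assumed).
def validation_headers(cached_headers)
  conditional = {}
  if cached_headers['last-modified']
    conditional['If-Modified-Since'] = cached_headers['last-modified']
  end
  conditional['If-None-Match'] = cached_headers['etag'] if cached_headers['etag']
  conditional
end

# An entry can be validated if at least one validator is available.
def can_validate?(cached_headers)
  !validation_headers(cached_headers).empty?
end
```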

The server determines whether the response cached at the client is still valid by looking at the values of the If-Modified-Since/If-None-Match headers. It sends a 304 (Not Modified) response without a body, if the response available with the client is still valid.

Upon receiving the 304, Wrest updates the existing cache entry with the headers provided in the 304 (RFC 2616 13.5.3 Combining Headers) and returns the cached response to the client.

If the server determines that the entry cached at the client side is invalid, it sends a full response (usually 200 OK), which Wrest passes to the client after updating the existing cache entry with the new response.

If the cache entry has expired but cannot be validated, Wrest sends a plain, unconditional GET request to the server. The response is passed to the client after updating the existing cache entry with the new response.
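
The lookup, validation and update steps described in this section can be put together roughly as follows. This is a simplified sketch under stated assumptions: `cached_get`, the Hash-based `store`, the `{ code:, headers:, body: }` response shape and the `fresh`/`fetch` callables are all hypothetical, and the cacheability check on new responses is left out for brevity.

```ruby
# Hypothetical sketch of the flow, not Wrest's implementation.
# `store` is a Hash keyed by URL. Entries and responses are Hashes of the
# form { code:, headers:, body: } with lower-cased header names.
# `fresh` decides whether an entry is fresh; `fetch` performs the request.
def cached_get(url, store, fresh:, fetch:)
  entry = store[url]
  return entry if entry && fresh.call(entry)

  # Expired entry: add validators if it has any, else send a plain GET.
  conditional = {}
  if entry
    last_modified = entry[:headers]['last-modified']
    etag = entry[:headers]['etag']
    conditional['If-Modified-Since'] = last_modified if last_modified
    conditional['If-None-Match'] = etag if etag
  end

  response = fetch.call(url, conditional)
  if entry && response[:code] == 304
    # Combining headers (RFC 2616 13.5.3): headers in the 304 replace the
    # stored ones, and the cached body is reused.
    entry[:headers] = entry[:headers].merge(response[:headers])
    return entry
  end

  # Full response: replace the entry and hand the response to the caller.
  store[url] = response
end
```

Passing `fresh` and `fetch` in as callables keeps the sketch independent of any particular HTTP library and makes each branch easy to exercise in isolation.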

Edge Case for HTML documents

   <META HTTP-EQUIV="Pragma" CONTENT="no-cache">

Firefox respects this Pragma directive when it appears inside an HTML document (nsHttpResponseHead.h:NoCache). Wrest cannot, since it does not parse the response body.

A Rough note on how the browsers (Firefox and Chrome) implement caching

Browsers usually cache all responses, including non-cacheable ones, for use in the browser history (Forward and Back buttons) [RFC 2616 13.13 History Lists]. The non-cacheability restriction is usually enforced after fetching a cache entry: if the stored response was not cacheable, it is not used.

A large chunk of caching logic for Firefox 3 is in the file netwerk/protcols/http/nsHttpChannel.cpp inside its source tree.

The browsers are optimistic with respect to caching: if a response does not explicitly specify an expiration mechanism, they use their own heuristics to calculate an expiry time. Wrest, however, is pessimistic: if a response does not specify an explicit cache expiration mechanism, it is not cached at all.

The following is a rough outline that I wrote to understand how the browsers implement caching. It does not necessarily reflect the browsers' behaviour accurately and has been heavily adapted to suit Wrest.

Firefox: nsHttpChannel::CheckCache()

do_fetch if method.head != cache.head

do_fetch if not (method.head == 'GET' || method.head == 'HEAD')

use_cache if the entry is still fresh according to Cache-Control: max-age. Refer to cache_expired?

re_validate if:

  • The Expires: header is a past date OR cache_expired?
  • The cache entry's Cache-Control header has the must-revalidate directive. RFC 2616 14.9.4

doValidation

Add an If-Modified-Since header to the request if the cache entry has a Last-Modified value. Add an If-None-Match header to the request if the cache entry has an ETag.

Send Request.

If a full response is received, update the cache and return the result. If a 304 (Not Modified) is received, return the cached response itself.

Do Not Store in Cache If

cache_expired?

Firefox: nsHttpResponseHead.cpp: ComputeCurrentAge

Chrome: RequiresValidation in http_response_headers.cc

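freshness_time = freshness_lifetime
# A non-positive freshness lifetime means the entry is always expired.
if freshness_time <= 0
  return true
end

# Expired once the entry's age reaches its freshness lifetime.
return freshness_time <= current_age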

current_age

Adapted from Chrome's http_response_headers.cc

# Fall back to the response time when Date is missing, and to 0 when Age is missing.
date_value = headers['Date'] || response_time
age_value = headers['Age'] || 0

apparent_age = response_time - date_value
corrected_received_age = max(apparent_age, age_value)
response_delay = response_time - request_time
corrected_initial_age = corrected_received_age + response_delay
resident_time = Time.now - response_time

# The current age is the last expression's value.
corrected_initial_age + resident_time
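
A runnable Ruby rendering of the calculation above might look like the following. The method signature, the lower-cased header Hash and the use of `Time` objects for `request_time` and `response_time` are assumptions for illustration. The `max(0, ...)` clamp on `apparent_age` follows the age calculation in RFC 2616 13.2.3.

```ruby
require 'time'

# Hypothetical signature. `request_time`, `response_time` and `now` are Time
# objects; `headers` is a Hash with lower-cased header names.
# Returns the current age of the response in seconds.
def current_age(headers, request_time, response_time, now = Time.now)
  date_value = headers['date'] ? Time.httpdate(headers['date']) : response_time
  age_value  = headers['age'].to_i # 0 when the Age header is absent

  apparent_age           = [0, response_time - date_value].max
  corrected_received_age = [apparent_age, age_value].max
  response_delay         = response_time - request_time
  corrected_initial_age  = corrected_received_age + response_delay
  resident_time          = now - response_time

  corrected_initial_age + resident_time
end
```

Taking `now` as a parameter, rather than calling `Time.now` inside the method, keeps the calculation deterministic and easy to check by hand.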

freshness_lifetime

This is a link to Chrome source code where freshness_lifetime is defined.

References

Alternate Cache Implementations

Resourceful - Ruby HTTP client that does caching

Python Httplib2 library