
Use HTTP proxy caching instead of file/disk caching #296

Closed
knowtheory opened this Issue Nov 10, 2015 · 10 comments

@knowtheory
Member

knowtheory commented Nov 10, 2015

Since its outset, DocumentCloud has used Rails's page caching mechanism to store JS and JSON blobs on disk to feed our frontend JavaScript apps.

Page caching is nice for several reasons:

  • It's easy to understand: a file is either cached on disk or it isn't.
  • It's easy to control: if you need to expire the cache, you delete the file.
  • Configuration is simple: as long as the Rails app writes files into a directory structure that matches your Rails routes, NGINX will happily serve files from those directories instead of passing requests through to Rails (roughly the pattern sketched below).
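
For reference, the classic page-caching arrangement looks roughly like this; the cache directory and app server address here are illustrative, not our actual config.

```nginx
location / {
    root /var/www/app/public/cache;  # illustrative cache directory
    # Serve the file Rails page-cached for this URL, if one exists;
    # otherwise fall through to the Rails app.
    try_files $uri $uri.html @rails;
}

location @rails {
    proxy_pass http://127.0.0.1:8080;  # illustrative app server address
}
```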

Page caching also has a variety of limitations:

  • Caching is limited to URIs that map to file paths (e.g., query params aren't respected).
  • Caching is limited to URI paths shorter than the file system's character limit on file names.

The problem

These limitations mean that caching fails in a variety of circumstances, ranging from very long URLs (which can occur when a user embeds a document collection specified by a very long search query) to JSONP resources (whose callback name arrives as a query parameter).

Proposal

We can (and should) switch to using NGINX as a caching proxy in front of the app instead of caching resource URLs to disk.
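
For concreteness, here's a minimal sketch of what that caching proxy might look like. This is a sketch only: the zone name, paths, sizes, and upstream address are illustrative, not our actual config.

```nginx
http {
    # Cache store managed by NGINX itself, rather than files written by Rails.
    proxy_cache_path /var/cache/nginx/app levels=1:2 keys_zone=app_cache:10m
                     max_size=1g inactive=60m;

    upstream rails_app {
        server 127.0.0.1:8080;  # illustrative app server address
    }

    server {
        listen 80;

        location / {
            proxy_pass http://rails_app;
            proxy_cache app_cache;
            # The full request URI (query string included) is hashed into the
            # cache key, so query params and very long URLs are handled
            # uniformly, unlike file-path caching.
            proxy_cache_key "$scheme$request_method$host$request_uri";
            # Fallback TTL for responses that carry no Cache-Control header.
            proxy_cache_valid 200 10s;
            # Expose HIT/MISS/BYPASS for debugging and benchmarking.
            add_header X-Cache-Status $upstream_cache_status;
        }
    }
}
```

Because cache entries are stored under hashed keys rather than literal file paths, both of the page-caching limitations above (query params and file-name length) go away.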

What'll this entail?

This will entail rewriting our NGINX configs to set up the front-end proxy. It will also require reworking a few parts of the app platform:

  • Address prefer_secure and secure_only and the structure of the proxy<->app relationship so that the app knows when/how to redirect users to secure resources.
  • Rework caching itself to be specified with Cache-Control headers (see the sketch after this list).
  • Rework cache expiry to use proxy_cache_bypass in order to update cached resources in place (also sketched below).
  • Benchmark endpoints prior to implementing HTTP caching.
  • Write tests to ensure that caching always respects login status, that cache expiry is respected, and that the system can withstand high volumes of requests.
  • Benchmark endpoints again after HTTP caching is implemented.
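
To make the Cache-Control and proxy_cache_bypass bullets concrete, here's a hedged sketch; the refresh query parameter and its shared token are hypothetical illustrations, not existing settings.

```nginx
location /api/ {
    proxy_pass http://rails_app;
    proxy_cache app_cache;  # zone from the earlier sketch

    # NGINX honors Cache-Control/Expires headers that the Rails app sets per
    # endpoint; proxy_cache_valid applies only when upstream sends neither.
    proxy_cache_valid 200 10s;

    # "Expire in place": when the bypass variable is non-empty and non-zero,
    # NGINX skips the cached copy, re-fetches from Rails, and overwrites the
    # stale entry with the fresh response.
    set $cache_refresh "";
    if ($arg_refresh = "hypothetical-shared-token") {
        set $cache_refresh 1;
    }
    proxy_cache_bypass $cache_refresh;
}
```

The useful property of proxy_cache_bypass, as opposed to deleting cache files, is that the bypassed response is written back into the cache, so a single refresh request re-warms the entry for everyone else.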

Potential risks/drawbacks

  • HTTP caching is not a panacea, and over-caching can produce problematic end-user experiences (e.g., users being served stale resources).

More discussion to follow.

@knowtheory
Member

knowtheory commented Nov 13, 2015

forgot to drop these stats demonstrating the difference between a cached & uncached endpoint: https://gist.github.com/knowtheory/307fcf34acd6a9427787

@reefdog
Contributor

reefdog commented Nov 13, 2015

Issues that are affected/solved by this:

esthervillars added a commit that referenced this issue Mar 8, 2016

Adds jmeter for #296 caching, initial test for endpoints at dci 104
Adds listeners to the testplan in order to output reports and graphs after the test runs.

@reefdog
Contributor

reefdog commented Mar 21, 2016

Nginx reserves key-based cache expiration for the $1,000/year platform, so we're starting with a simple five-second cache on all resources, with no platform expiration. This gets us cache performance for popular resources, but one downside: unpopular resources will always be cold.

We should factor this into our benchmarks and consider how this affects real-world use.

@knowtheory
Member

knowtheory commented Mar 21, 2016

Right, so generally speaking, response times tend to follow a power-law distribution.

Front-end caching doesn't obviate work that has to be done by backend services, whether the database, processing queue, or search server.

What front-end caching does is reduce duplicate requests. For now, this generally matches DocumentCloud's behavior in several important cases (for the purpose of serving web traffic).

@reefdog
Contributor

reefdog commented Mar 21, 2016

Yeah, just wanted us to stay aware, since we're transferring from a cache system that, while overall vastly inferior, was equitable in its performance distribution. Realized our benchmarks should try to take both use cases into consideration. Like BitTorrent, the new cache system will work best when it counts most, but could regress performance of the long tail.

@reefdog
Contributor

reefdog commented Apr 26, 2016

@knowtheory We can close this, yeah?

@reefdog
Contributor

reefdog commented Oct 25, 2016

Ping. We can close, yeah? There's a list of unchecked things above, which is why I ask before just doing it.

@knowtheory
Member

knowtheory commented Oct 25, 2016

Working backwards from the list above:

  • Caching was deployed and helped us withstand the attention focused on the Panama Papers.
  • I made caching configurable per request/endpoint in order to give public search caches a shelf life longer than 10 seconds (the default).
  • I benchmarked NGINX as a reverse proxy cache for all public & anonymous resources with the NGINX configuration in the repo. We used JMeter to test this and saw substantial performance benefits (matching disk caching performance).
  • I set up our NGINX reverse proxy configuration and platform to distinguish between authenticated and unauthenticated users via a cookie (see the cache config and cookie settings for it); a rough sketch of the pattern follows below.
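
For illustration, the cookie-based split looks roughly like this; the cookie name dc_logged_in is a placeholder (the real name lives in the repo's cache config).

```nginx
location / {
    proxy_pass http://rails_app;
    proxy_cache app_cache;

    # Requests carrying the login cookie never read from the cache...
    proxy_cache_bypass $cookie_dc_logged_in;
    # ...and their responses are never written to it, so per-user content
    # can't leak into the shared cache.
    proxy_no_cache $cookie_dc_logged_in;
}
```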

@knowtheory knowtheory closed this Oct 25, 2016

@reefdog
Contributor

reefdog commented Oct 25, 2016

> Caching was deployed and helped us withstand the attention focused on the Panama Papers.

Haha, forgot that we got that out just before those hit. Great timing.

Right, so one thing I'd forgotten is that we only serve cached responses to unauthenticated users. Which is fine; it covers the "keep the servers up when slammed by Panama Papers traffic" intended use case, but I was just getting some unexpected test results until I remembered that.
