
[RFC] Consider caching data per-layer rather than per-request #25

Closed
jerluc opened this issue Dec 31, 2019 · 1 comment · Fixed by #45
Labels: perf (Performance-related), rfc (Request For Comments)

jerluc (Member) commented Dec 31, 2019

Currently, we cache data using the request path as the key and the full response body as the value. This has the nice side effect of being very simple to implement and maintain, but it comes with drawbacks:

  1. When a request to `/_all/{z}/{x}/{y}.mvt` is made, another call to `/layer1/{z}/{x}/{y}.mvt` will miss the cache, because the cache key is based purely on the request path.
  2. When a request to `/_all/{z}/{x}/{y}.mvt` is made and a partial failure occurs, not only does the entire request fail, but none of the successful layer responses are cached, meaning a subsequent call has to recompute the entire response rather than only the failed layers.

To fix these problems, we should consider using something like `{layer}/{z}/{x}/{y}` as a cache key, and caching individual feature collections per layer response. Then, in the above two scenarios:

  1. When a request to `/_all/{z}/{x}/{y}.mvt` is made, all layer responses get cached, and another call to `/layer1/{z}/{x}/{y}.mvt` will hit the cache, because the cache key is based on the layer name.
  2. When a request to `/_all/{z}/{x}/{y}.mvt` is made and a partial failure occurs, the successful layer responses are cached, meaning a subsequent call only has to recompute the failed layers.
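
The per-layer keying proposed above can be sketched as follows. This is a minimal illustration under stated assumptions: `cacheKey` and the in-memory map are hypothetical stand-ins, not tilenol's actual API.

```go
package main

import "fmt"

// cacheKey builds a per-layer cache key of the form {layer}/{z}/{x}/{y},
// so that a layer cached while serving an /_all/... request can be reused
// by a later single-layer request for the same tile coordinates.
func cacheKey(layer string, z, x, y int) string {
	return fmt.Sprintf("%s/%d/%d/%d", layer, z, x, y)
}

func main() {
	// An /_all/3/4/5.mvt request caches each layer under its own key...
	cache := map[string][]byte{}
	for _, layer := range []string{"layer1", "layer2"} {
		cache[cacheKey(layer, 3, 4, 5)] = []byte("...tile data...")
	}
	// ...so a later /layer1/3/4/5.mvt request hits the cache.
	_, hit := cache[cacheKey("layer1", 3, 4, 5)]
	fmt.Println(hit) // true
}
```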
@jerluc jerluc added the enhancement New feature or request label Dec 31, 2019
@jerluc jerluc added perf Performance-related rfc Request For Comments and removed enhancement New feature or request labels Nov 2, 2020
jerluc (Member, Author) commented Feb 3, 2021

As a further improvement to this logic, we may even want to encode a content-based hash of the layer configuration itself as a pseudo-version number in the cache key, so that layer configuration changes are picked up instantly, rather than only when the cache entry expires.
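
The configuration-hashing idea above can be sketched like this, assuming the layer configuration can be serialized deterministically; `configHash` is a hypothetical helper, not tilenol's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash returns a content-based pseudo-version of a layer's effective
// configuration: any change to the serialized config yields a new digest
// (invalidating cache keys that embed it), while identical configs hash
// identically across restarts.
func configHash(serializedConfig []byte) string {
	sum := sha256.Sum256(serializedConfig)
	return hex.EncodeToString(sum[:])
}

func main() {
	v1 := configHash([]byte("source: es\nindex: places\n"))
	v2 := configHash([]byte("source: es\nindex: places_v2\n"))
	fmt.Println(v1 != v2) // true: a config change produces a new version
	fmt.Println(len(v1))  // 64 hex characters
}
```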

@jerluc jerluc self-assigned this Feb 3, 2021
@jerluc jerluc changed the title Consider caching data per-layer rather than per-request [RFC] Consider caching data per-layer rather than per-request Feb 4, 2021
@jerluc jerluc added this to the v1.1.0 milestone Feb 4, 2021
@jerluc jerluc mentioned this issue Feb 10, 2021
jerluc added a commit that referenced this issue Feb 10, 2021
This changeset closes #25 by implementing a simple cache-per-layer approach (as opposed to the current approach of cache-per-URL). At a high level, this is how the new caching approach works:

- When a request comes in for one or more layers, we do the following for each layer:
  - We create a cache key by composing a stringified representation of the layer with a stringified representation of the tile request
    - A layer is stringified by composing its name and a content-based hash (SHA256) of its effective configuration; the idea here is that the hash should only change when its configuration changes, but should be the same across restarts
    - A tile request is stringified by composing the `z`, `x`, and `y` values, along with any additional query parameters
    - The cache key format roughly resembles: `{layer}@{layer-sha256}/{z}/{x}/{y}?{args}`
  - If the current layer is cacheable (configured with `nocache: false`, the default when omitted), we first check whether the layer data is already stored in the cache at the computed key; on a cache miss (or if the layer is not cacheable), we pull the layer data from its backend source
  - Also, if the current layer is cacheable, we then marshal the layer data into the raw gzipped MVT binary format and store it at the computed cache key
  - Lastly, we return the in-memory layer data to the caller
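
The key-construction steps above can be sketched as follows. This is a simplified illustration of the described key format, not tilenol's actual implementation; `tileCacheKey` and its parameters are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// tileCacheKey composes the stringified layer (its name plus a SHA256
// digest of its effective configuration) with the stringified tile request
// (z/x/y plus any extra query parameters), yielding a key shaped like:
//
//	{layer}@{layer-sha256}/{z}/{x}/{y}?{args}
func tileCacheKey(layer string, config []byte, z, x, y int, args url.Values) string {
	sum := sha256.Sum256(config)
	key := fmt.Sprintf("%s@%s/%d/%d/%d", layer, hex.EncodeToString(sum[:]), z, x, y)
	if len(args) > 0 {
		key += "?" + args.Encode()
	}
	return key
}

func main() {
	key := tileCacheKey("layer1", []byte("source: es\n"), 3, 4, 5,
		url.Values{"filter": {"parks"}})
	fmt.Println(key)
}
```

Because the configuration digest sits inside the key, a reconfigured layer naturally misses the old entries, while unchanged layers keep hitting them.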

With these changes in place, we get a few major improvements:

1. **Improved cache hit ratio for multi-layer configurations**: because this now caches per layer vs. per URL/request, we should see moderate performance improvements for mixed-combination tile requests, e.g. a request to something like `/_all/z/x/y.mvt`, followed by `/layer1/z/x/y.mvt`, should hit the `layer1` cache on the second request, since `layer1` gets cached in the first request at a more granular cache key
2. **More fine-grained control for the caching behavior of multi-layer configurations**: there's still room for improvement, but between computing hash keys per layer (vs. per URL), and exposing the optional `nocache` option for each configured layer, this adds an extra level of flexibility
3. **Better path forward for reconfigurations**: previously there was some buggy cache behavior when reconfiguring a layer under the cache-per-URL implementation, as the URL doesn't change even if your layer configuration does; by using the configuration in the cache key (using the configuration hash digest), we can ensure that layer data is freshly retrieved whenever its configuration changes, and that layer cache data is reused when the configuration is the same
4. **Better resilience to cache failure**: previously, if retrieving data from the cache failed for any reason, we would fail the entire request; in this revised implementation, cache retrieval failures are treated the same as a cache miss, which should fix a broad class of potential issues (e.g. temporary network failures, bad data/deserialization issues, etc.)
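
The cache-failure fallback described in point 4 can be sketched as below; the `Cache` interface and `getLayerData` are hypothetical simplifications, not tilenol's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Cache is a minimal cache abstraction; Get returns an error on any
// retrieval failure (temporary network failure, bad data, etc.).
type Cache interface {
	Get(key string) ([]byte, error)
}

// failingCache simulates a cache whose backend is temporarily unreachable.
type failingCache struct{}

func (failingCache) Get(string) ([]byte, error) {
	return nil, errors.New("connection refused")
}

// getLayerData treats any cache retrieval failure the same as a cache miss:
// instead of failing the whole request, it falls back to the backend source.
func getLayerData(c Cache, key string, fromSource func() []byte) []byte {
	if data, err := c.Get(key); err == nil && data != nil {
		return data // cache hit
	}
	// Cache miss OR cache failure: recompute from the backend source.
	return fromSource()
}

func main() {
	data := getLayerData(failingCache{}, "layer1@abc/3/4/5", func() []byte {
		return []byte("fresh tile data")
	})
	fmt.Println(string(data)) // fresh tile data
}
```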

That said, a couple of new questions come out of this:

* The cache configuration is still global (including things like TTL); since we now have additional cache controls per layer, should we instead consider moving the cache configuration as a whole into each layer? Is there some hybrid approach where each layer could instead reference a global cache?
* In this implementation, there is some additional serialization/deserialization overhead that has been introduced between the tilenol server and the backend cache; what is the impact of this overhead and are there alternative ways to approach layer data serialization to avoid this overhead?