[RFC] Consider caching data per-layer rather than per-request #25
Comments
jerluc added the perf (Performance-related) and rfc (Request For Comments) labels and removed the enhancement (New feature or request) label on Nov 2, 2020
As a further improvement to this logic, we may even want to encode a content-based hash of the layer configuration itself as a pseudo version number in the cache key, so that layer configuration changes are picked up immediately, rather than only when a cache entry expires.
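As a rough illustration of that idea in Go (a sketch only, with a hypothetical `Layer` type standing in for the real configuration type), the layer configuration could be serialized deterministically, hashed with SHA-256, and folded into the per-layer cache key so that reconfiguring a layer immediately produces a new key:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Layer is a hypothetical stand-in for a configured layer; the real
// configuration type in the codebase may differ.
type Layer struct {
	Name   string            `json:"name"`
	Source map[string]string `json:"source"`
}

// configDigest hashes the layer's effective configuration so that the digest
// changes only when the configuration changes and stays stable across
// restarts (encoding/json sorts map keys, so serialization is deterministic).
func configDigest(l Layer) (string, error) {
	b, err := json.Marshal(l)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// cacheKey folds the configuration digest into the per-layer cache key as a
// pseudo version number for the given tile coordinates.
func cacheKey(l Layer, z, x, y int) (string, error) {
	digest, err := configDigest(l)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s@%s/%d/%d/%d", l.Name, digest, z, x, y), nil
}

func main() {
	l := Layer{Name: "layer1", Source: map[string]string{"index": "buildings"}}
	key, err := cacheKey(l, 10, 163, 395)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Because the digest depends only on the configuration contents, it stays stable across restarts but changes as soon as the layer is reconfigured.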
jerluc added a commit that referenced this issue on Feb 3, 2021
jerluc changed the title from "Consider caching data per-layer rather than per-request" to "[RFC] Consider caching data per-layer rather than per-request" on Feb 4, 2021
jerluc added a commit that referenced this issue on Feb 10, 2021
Merged: jerluc added a commit that referenced this issue on Feb 10, 2021
This changeset closes #25 by implementing a simple cache-per-layer approach (as opposed to the current approach of cache-per-URL). At a high level, this is how the new caching approach works (a rough sketch of the per-layer flow follows this description):

- When a request comes in for one or more layers, we do the following for each layer:
  - We create a cache key by composing a stringified representation of the layer and the tile request
    - A layer is stringified by composing its name and a content-based hash (SHA256) of its effective configuration; the idea here is that the hash should only change when its configuration changes, but should be the same across restarts
    - A tile request is stringified by composing the `z`, `x`, and `y` values, along with any additional query parameters
    - The cache key format roughly resembles: `{layer}@{layer-sha256}/{z}/{x}/{y}?{args}`
  - If the current layer is cacheable (configured with `nocache: false`, which is the default when omitted), we first check whether the layer data is stored in the cache at the computed key; otherwise, we pull the layer data from its backend source
  - Also, if the current layer is cacheable, we then marshal the layer data into the raw GZIPed MVT binary format and store it at the computed cache key
  - Lastly, we return the in-memory layer data to the caller

With these changes in place, we get a few major improvements:

1. **Improved cache hit ratio for multi-layer configurations**: because this now caches per layer vs. per URL/request, we should see moderate performance improvements for mixed-combination tile requests; e.g. a request to something like `/_all/z/x/y.mvt`, followed by `/layer1/z/x/y.mvt`, should hit the `layer1` cache on the second request, since `layer1` gets cached in the first request at a more granular cache key
2. **More fine-grained control over the caching behavior of multi-layer configurations**: there's still room for improvement, but between computing hash keys per layer (vs. per URL) and exposing the optional `nocache` option for each configured layer, this adds an extra level of flexibility
3. **Better path forward for reconfigurations**: previously there was some buggy cache behavior when reconfiguring a layer under the cache-per-URL implementation, since the URL doesn't change even if the layer configuration does; by including the configuration hash digest in the cache key, we can ensure that layer data is freshly retrieved whenever its configuration changes, and that layer cache data is reused when the configuration is the same
4. **Better resilience to cache failure**: previously, if retrieving data from the cache failed for any reason, we would fail the entire request; in this revised implementation, cache retrieval failures are treated the same as a cache miss, which should fix a broad class of potential issues (e.g. temporary network failures, bad data/deserialization issues, etc.)

That said, a couple of new questions come out of this:

* The cache configuration is still global (including things like TTL); since we now have additional cache controls per layer, should we instead consider moving the cache configuration as a whole into each layer? Is there some hybrid approach where each layer could instead reference a global cache?
* In this implementation, some additional serialization/deserialization overhead has been introduced between the tilenol server and the backend cache; what is the impact of this overhead, and are there alternative ways to approach layer data serialization that avoid it?
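Below is a condensed sketch of that per-layer flow, using hypothetical `Cache` and `Source` interfaces in place of the real tilenol abstractions. The key behaviors it illustrates: a cache read error falls through to the backend source exactly like a miss, and the write-back is best-effort so a cache failure never fails the request.

```go
package cache

import (
	"context"
	"log"
)

// Cache and Source are hypothetical interfaces standing in for tilenol's
// actual cache and layer-backend abstractions.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Put(ctx context.Context, key string, val []byte) error
}

type Source interface {
	GetTile(ctx context.Context, z, x, y int) ([]byte, error)
}

type CachedLayer struct {
	Name    string
	NoCache bool // mirrors the per-layer `nocache` option
	Source  Source
}

// getLayerTile returns the layer's data for (z, x, y), consulting the cache
// only when the layer is cacheable. A cache read failure is logged and
// treated as a miss, and a cache write failure never fails the request.
func getLayerTile(ctx context.Context, cache Cache, layer CachedLayer, key string, z, x, y int) ([]byte, error) {
	if !layer.NoCache {
		data, err := cache.Get(ctx, key)
		if err == nil && data != nil {
			return data, nil // cache hit
		}
		if err != nil {
			log.Printf("cache read for %q failed, treating as miss: %v", key, err)
		}
	}

	// Cache miss (or uncacheable layer): pull from the backend source.
	data, err := layer.Source.GetTile(ctx, z, x, y)
	if err != nil {
		return nil, err
	}

	if !layer.NoCache {
		// Best-effort write-back of the marshaled layer data.
		if err := cache.Put(ctx, key, data); err != nil {
			log.Printf("cache write for %q failed: %v", key, err)
		}
	}
	return data, nil
}
```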
Currently, we cache data using the request path as a key and store the full response body as the value. This has the nice side effect of being very simple to implement and maintain, but comes with its drawbacks:

- When a call to `/_all/{z}/{x}/{y}.mvt` is made, another call to `/layer1/{z}/{x}/{y}.mvt` will miss the cache, because the cache key is based purely on the request path
- When a call to `/_all/{z}/{x}/{y}.mvt` is made and a partial failure occurs, not only does the entire request fail, but none of the successful layer responses are cached, meaning a subsequent call would have to recompute the entire response, rather than only the failed responses

To fix these problems, we should consider using something like `{layer}/{z}/{x}/{y}` as a cache key, and caching individual feature collections per layer response. Then, in the above two scenarios:

- When a call to `/_all/{z}/{x}/{y}.mvt` is made, all layer responses get cached, and another call to `/layer1/{z}/{x}/{y}.mvt` will hit the cache, because the cache key is based on the layer name
- When a call to `/_all/{z}/{x}/{y}.mvt` is made and a partial failure occurs, the successful layer responses are cached, meaning a subsequent call would only have to recompute the failed layers
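As a rough illustration of the proposed keying (a hypothetical helper using the simple `{layer}/{z}/{x}/{y}` format from this issue, not the digest-based format adopted later), a multi-layer `_all` request produces one key per layer, so a later single-layer request can reuse entries the earlier request populated:

```go
package main

import "fmt"

// perLayerKeys builds one cache key per requested layer using the
// {layer}/{z}/{x}/{y} format proposed above (hypothetical helper).
func perLayerKeys(layers []string, z, x, y int) []string {
	keys := make([]string, 0, len(layers))
	for _, name := range layers {
		keys = append(keys, fmt.Sprintf("%s/%d/%d/%d", name, z, x, y))
	}
	return keys
}

func main() {
	// A request to /_all/10/163/395.mvt would populate one entry per layer...
	fmt.Println(perLayerKeys([]string{"layer1", "layer2"}, 10, 163, 395))
	// ...so a later request to /layer1/10/163/395.mvt reuses "layer1/10/163/395".
	fmt.Println(perLayerKeys([]string{"layer1"}, 10, 163, 395))
}
```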