From 00976d7398ab4557c05bf075cbe93c146f698c9a Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich+github@gmail.com>
Date: Thu, 13 Jul 2023 18:42:48 -0700
Subject: [PATCH 01/12] Create advanced-kv-guide.md

Write up tips and tricks that might be relevant to advanced users.
---
 content/workers/learning/advanced-kv-guide.md | 135 ++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 content/workers/learning/advanced-kv-guide.md
diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
new file mode 100644
index 00000000000000..5d6b8c868387c0
--- /dev/null
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -0,0 +1,135 @@
+---
+pcx_content_type: concept
+title: Advanced Workers KV Topics
+weight: 7
+---
+
+# Background
+
+To get the best possible performance out of your usage of Workers KV, this document contains some tips.
+As background, it's best to review [how KV works](/workers/runtime-apis/kv/#writing-data-in-bulk).
+
+# Performance optimizations
+
+## Optimizing .get long tail performance
+
+### Embrace long cacheTtl
+
+To optimize the long-tail performance of infrequently accessed keys, specify a longer cacheTtl value (e.g. 1 day).
+Historically, a blocker for many customers was that this meant that your reads wouldn't see writes for the duration of the cacheTtl.
+However, as described in our [architecture blog post](https://blog.cloudflare.com/faster-workers-kv-architecture/), most customers today
+are using the new architecture where you will see updated values within a minute of the write, regardless of the cacheTtl value.
+
+{{<Aside type="note" header="Security considerations"> }}
+Some customers of Workers KV store authorization tokens. Often time such applications rely on having a strict guarantee on revocation.
+For example, if your service SLA is that a revoked token must be globally revoked within 5 minutes of revocation, your cacheTtl should
+not be longer than 5 minutes. While the write will be noticed within a minute, writes are only noticed due to misses or reads triggering
+a background refresh. If your key is accessed once every 4 minutes and you set a cacheTTL of 10 minutes, it's possible that you will
+exceed your SLA by a few minutes.
+{{</Aside>}}
+
+{{<Aside type="note" header="Availability of long cache TTL noticing writes quickly"> }}
+Certain namespaces part of early closed betas and larger ENT customers are currently excluded. If you want to use the new architecture
+but think you may not be enabled, please contact support.
+{{</Aside>}}
+
+### Reducing cardinality by coalescing keys
+
+If you have a set of related key-value pairs that have a mixed usage pattern (some hot keys and some cold keys), consider
+coalescing them.
+
+#### Merging into a "super" KV entry
+One coalescing technique is to make all the keys and values part of a super key/value object. For example, something like this:
+```
+key1: value1
+key2: value2
+key3: value3
+```
+becomes
+```
+coalesced: {
+  key1: value1,
+  key2: value2,
+  key3: value3,
+}
+```
+
+By coalescing the values, the cold keys benefit from being kept alive in the cache because of acccess patterns of the warmer keys.
+
+This works best if you don't think you'll need to update the values independently of each other which can pose race conditions unless you're
+careful about how you synchronize.
+
+**Pros**: Infrequently accessed keys are kept in the cache.
+**Cons**: The size of the resulting value can easily push your Worker past its memory limits. Safely updating the value requires a [locking mechanism](#concurrent-writers-to-a-single-key) of some kind.
+
+#### Storing in metadata and shared prefix
+
+If you don't want to merge into a single KV entry as described above and your associated values fit within the [metadata limit](/workers/platform/limits/#kv-limits),
+then you can store the values within the metadata instead of the body. If you then name the keys with a shared unique prefix, your list operation will contain
+the value letting you bulk read multiple keys at once through a single, cacheable list operation.
+
+{{ <Aside type="note" header="List performance note"> }}
+List operations are not "write aware". This means that while they are subject to tiering, they only stay cached for up to one minute past when it was last read, even
+at upper tiers. By comparison, get operations are cached at the upper tiers for a service managed duration that is always longer than your cacheTtl. Additionally, the cacheTtl
+lets you extend the duration of a single key lookup at the data center closest to the request.
+{{ </Aside> }}
+
+## Read the values as part of the list
+
+If you have small values that fit within the [metadata limit](/workers/platform/limits/#kv-limits), you can store the value within the metadata instead.
+This makes the value accessible during the list, avoiding the need to do a second I/O round-trip while iterating in case a lookup ends up missing the local cache.
+
+{{ <Aside type="note" header="List performance note"> }}
+See above about the implications of cache duration and list operations.
+{{ </Aside> }}
+
+## Avoid using the GET REST API at the Edge
+
+Today, the REST API for Workers KV goes all the way to central Cloudflare data centers. This is fine for PUT/DELETE requests as
+those are typically not latency sensitive and aren't cacheable so they need to transit to central locations anyway. However,
+using the REST API to read a key or list keys isn't going to perform as well because you always have to do a long
+distance round trip before you hit a cache. Conversely, `.get`/`.getWithMetadata` and `.list` operations within your Worker
+will access the cache closest to where the request originated from.
+
+# Concurrent writers to a single key
+
+Today, we do not offer any native conditional put features. `.put` will always clobber the value and if you have multiple
+writers, there's no guarantee about a winner. This is even more problematic if you have structured data and want to do a
+partial update of a value. A lot of customers have success creating a [Durable Object](/workers/learning/using-durable-objects)
+and making it responsible for all the writes to your KV namespace. This way, you can serialize access for writing the value.
+
+**Caution**: Workers KV is an eventually consistent system. If you try to do a read/modify/write operation where the read is
+coming from KV, you can cause modifications to be lost because there's no guarantee that you will always read the most recent
+value written, even if the write is from the same data center. Additionally, the location where a Durable Object runs can move
+outside of your control.
+
+If guaranteeing that you never lose a mutation is important, consider making a strongly consistent storage system (Durable Object
+storage or R2 with conditional upload) your ground truth that you read/modify/write, and then write the updated value to KV to
+broadcast it out. That way your value is updated in a strongly consistent fashion and once that happens, you publish
+it to KV for reading.
+
+# Noticing updated values within seconds
+
+Currently, reads have a "refreshTtl" of 1 minute. This means that a write is noticed within 1 minute of a read being issued.
+While we aren't yet ready to let customers customize the refreshTtl themselves within the Runtime API, if this is important
+to your use-case, please contact support to change the default for your namespace and we can work with you.
+
+# Benchmarking Workers KV
+
+Benchmarking to predict what your Workers KV performance will look like in production is tricky and nuanced. It's best to try
+to put production load onto the system and then measure real-world performance rather than trying to do a synthetic test.
+Examples of issues that can trip up even internal engineers who know all the technical details:
+
+* A low traffic Worker is more subject to cold starts.
+* Within something we call "MCP"s, we have multiple virtual data centers within a single PoP. Which virtual data center you hit
+is random and currently such data centers have disjoint caches and require even more traffic to keep the isolate for your Worker
+warm.
+* [wrk](https://github.com/wg/wrk) can typically generate substantial enough load from a single machine (thousands of requests
+per second) which should probably be enough to representative and overcome such issues, but it requires careful tuning of
+parameters to achieve max throughput.
+* Synthetic tests are typically hand-written and often fail to reproduce real-world access patterns for keys (if you have multiple keys).
+If you have a recording you can play through of the access patterns, that might work well. A representative recording is difficult
+to capture in practice because of the global nature of Cloudflare Workers.
+
+In essence, Cloudflare's infrastructure gets faster the more traffic you put on it, and synthetic tests often cannot generate
+enough load to simulate that properly.

From 902a415a219a06372020b39f6aac5b0aba3cde7c Mon Sep 17 00:00:00 2001
From: Matt Silverlock <matt@eatsleeprepeat.net>
Date: Fri, 14 Jul 2023 08:47:04 -0400
Subject: [PATCH 02/12] kv: Apply suggestions from code review

---
 content/workers/learning/advanced-kv-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 5d6b8c868387c0..52390ea7fbaaee 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -20,7 +20,7 @@ Historically, a blocker for many customers was that this meant that your reads w
 However, as described in our [architecture blog post](https://blog.cloudflare.com/faster-workers-kv-architecture/), most customers today
 are using the new architecture where you will see updated values within a minute of the write, regardless of the cacheTtl value.
 
-{{<Aside type="note" header="Security considerations"> }}
+{{<Aside type="note" header="Security considerations">}}
 Some customers of Workers KV store authorization tokens. Often time such applications rely on having a strict guarantee on revocation.
 For example, if your service SLA is that a revoked token must be globally revoked within 5 minutes of revocation, your cacheTtl should
 not be longer than 5 minutes. While the write will be noticed within a minute, writes are only noticed due to misses or reads triggering
@@ -28,7 +28,7 @@ a background refresh. If your key is accessed once every 4 minutes and you set a
 exceed your SLA by a few minutes.
 {{</Aside>}}
 
-{{<Aside type="note" header="Availability of long cache TTL noticing writes quickly"> }}
+{{<Aside type="note" header="Availability of long cache TTL noticing writes quickly">}}
 Certain namespaces part of early closed betas and larger ENT customers are currently excluded. If you want to use the new architecture
 but think you may not be enabled, please contact support.
 {{</Aside>}}

From 69b4a75b5c6f8b7e772c19caa735aa52dd174462 Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 08:29:29 -0700
Subject: [PATCH 03/12] Fix typo

---
 content/workers/learning/advanced-kv-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 52390ea7fbaaee..7df428d4664673 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -54,7 +54,7 @@ coalesced: {
 }
 ```
 
-By coalescing the values, the cold keys benefit from being kept alive in the cache because of acccess patterns of the warmer keys.
+By coalescing the values, the cold keys benefit from being kept alive in the cache because of access patterns of the warmer keys.
 
 This works best if you don't think you'll need to update the values independently of each other which can pose race conditions unless you're
 careful about how you synchronize.

From 75694c479ac4dde4ddaab6884369dbb1730bc7de Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 11:29:14 -0700
Subject: [PATCH 04/12] Flush out more topics

---
 content/workers/learning/advanced-kv-guide.md | 58 ++++++++++++++++---
 1 file changed, 49 insertions(+), 9 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 7df428d4664673..ef7b3decf142b2 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -33,6 +33,43 @@ Certain namespaces part of early closed betas and larger ENT customers are curre
 but think you may not be enabled, please contact support.
 {{</Aside>}}
 
+### Avoid hand-rolling Cache in front of Workers KV
+
+Workers KV is optimized to transparently refresh values in the background on your behalf based on actual access patterns to keep the
+values read lively when there's a write. If you put a cache in front of KV, then KV doesn't see any accesses for keys; every time you do
+end up hitting KV will be a cold request. At first glance this doesn't sound too bad and price conscious customers look to this option.
+From a performance perspective though, it will mean that your application will regularly experience long pauses as bunch of requests go
+to access hot keys that aren't in the cache anymore. This can happen either because you've set up your cache so that by the time KV sees
+a request it's outside the cacheTtl window or KV's cache has decided to evict your key because it wasn't getting any traffic.
+
+[BetterKV](https://flareutils.pages.dev/betterkv/) is a popular choice for many of our customers. Other customers choose to handroll. We
+haven't seen any customers who fix the pauses / stampeding herd problem and this is an attempt to provide a suggestion on how to solve that.
+
+* Don't put cache in front of KV.
+  * Pros: the system behaves optimally from a performance perspective.
+  * Cons: Today you get charged for cache reads.
+* Direct some subset of requests satisfied by the cache to KV anyway.
+  * Pros: You will solve the stampeding herd problem for your hottest keys at minimal cost.
+  * Cons: Stampeding herd problem will still be present for cooler keys.
+* Make sure you carefully tune your cache duration and the cacheTTL you use out of KV so that your cache duration is sufficiently less than
+the cacheTTL. For example, cache for 30s, probabilistically
+  * Pros: all keys will mostly avoid the stampeding herd problem
+  * Cons: More complicated to implement correctly.
+
+Recommendation: avoid putting a cache in front of KV & talk to our support staff about pricing. Putting a cache in front of KV also means
+that you can't partake in [noticing writes more quickly](#noticing-updated-values-within-seconds). If you want to go down this road anyway,
+ideally align the cacheTTL and your custom cache such that you refetch from KV before it gets expired from KV's cache. A quick and dirty
+solution may be to sample instead as getting all that logic to work correctly can be tricky. As always, make sure you have good observability
+that can highlight this problem for you so you can see it happening and know what to do about it. The [Workers Analytics Engine](/analytics/analytics-engine)
+is a convenient way to instrument your code to see when you're making KV requests and then watching out for sudden transient spikes for
+requests into KV.
+
+{{<Aside type="note" header="Timing considerations">}}
+`Date` comparisons can be used to get an [approximation](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)
+of how long an I/O operation took, but it's very coarse and likely too inaccurate for measuring KV performance.
+{{</Aside>}}
+
+
 ### Reducing cardinality by coalescing keys
 
 If you have a set of related key-value pairs that have a mixed usage pattern (some hot keys and some cold keys), consider
@@ -68,20 +105,20 @@ If you don't want to merge into a single KV entry as described above and your as
 then you can store the values within the metadata instead of the body. If you then name the keys with a shared unique prefix, your list operation will contain
 the value letting you bulk read multiple keys at once through a single, cacheable list operation.
 
-{{ <Aside type="note" header="List performance note"> }}
+{{<Aside type="note" header="List performance note">}}
 List operations are not "write aware". This means that while they are subject to tiering, they only stay cached for up to one minute past when it was last read, even
 at upper tiers. By comparison, get operations are cached at the upper tiers for a service managed duration that is always longer than your cacheTtl. Additionally, the cacheTtl
 lets you extend the duration of a single key lookup at the data center closest to the request.
-{{ </Aside> }}
+{{</Aside>}}
 
-## Read the values as part of the list
+## Batch reading multiple keys
 
 If you have small values that fit within the [metadata limit](/workers/platform/limits/#kv-limits), you can store the value within the metadata instead.
 This makes the value accessible during the list, avoiding the need to do a second I/O round-trip while iterating in case a lookup ends up missing the local cache.
 
-{{ <Aside type="note" header="List performance note"> }}
+{{<Aside type="note" header="List performance note">}}
 See above about the implications of cache duration and list operations.
-{{ </Aside> }}
+{{</Aside>}}
 
 ## Avoid using the GET REST API at the Edge
 
@@ -120,13 +157,16 @@ Benchmarking to predict what your Workers KV performance will look like in produ
 to put production load onto the system and then measure real-world performance rather than trying to do a synthetic test.
 Examples of issues that can trip up even internal engineers who know all the technical details:
 
-* A low traffic Worker is more subject to cold starts.
+* You don't have [permission](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)
+within the Runtime to get accurate timing measurements. That means you have to know to time externally to the system. At the same time,
+external timings are subject to sources of error that have nothing to do with Workers KV performance, particularly as described below.
+* A low traffic Worker is more subject to cold starts even though in practice cold starts don't exist once production traffic is flowing.
 * Within something we call "MCP"s, we have multiple virtual data centers within a single PoP. Which virtual data center you hit
-is random and currently such data centers have disjoint caches and require even more traffic to keep the isolate for your Worker
-warm.
+is random and today such data centers have disjoint caches and require even more traffic to keep the cache warm regardless of which
+virtual data center you randomly get routed to.
 * [wrk](https://github.com/wg/wrk) can typically generate substantial enough load from a single machine (thousands of requests
 per second) which should probably be enough to representative and overcome such issues, but it requires careful tuning of
-parameters to achieve max throughput.
+parameters to achieve max throughput and you have little to no visibility into Cloudflare's internal network to know if you succeeded.
 * Synthetic tests are typically hand-written and often fail to reproduce real-world access patterns for keys (if you have multiple keys).
 If you have a recording you can play through of the access patterns, that might work well. A representative recording is difficult
 to capture in practice because of the global nature of Cloudflare Workers.

From 59bc4c8b113cfc387631cb8494b347b0695927e0 Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 12:37:16 -0700
Subject: [PATCH 05/12] cleanup wording again

---
 content/workers/learning/advanced-kv-guide.md | 31 ++++++++++++-------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index ef7b3decf142b2..be99d315d52817 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -36,14 +36,22 @@ but think you may not be enabled, please contact support.
 ### Avoid hand-rolling Cache in front of Workers KV
 
 Workers KV is optimized to transparently refresh values in the background on your behalf based on actual access patterns to keep the
-values read lively when there's a write. If you put a cache in front of KV, then KV doesn't see any accesses for keys; every time you do
-end up hitting KV will be a cold request. At first glance this doesn't sound too bad and price conscious customers look to this option.
-From a performance perspective though, it will mean that your application will regularly experience long pauses as bunch of requests go
-to access hot keys that aren't in the cache anymore. This can happen either because you've set up your cache so that by the time KV sees
-a request it's outside the cacheTtl window or KV's cache has decided to evict your key because it wasn't getting any traffic.
+values read lively when there's a write. As a cost-cutting measure, some customers choose to explore putting the Cache API in front
+of the `.get` call so that most requests hit the cache and when the cache expires you double-check with KV.
 
-[BetterKV](https://flareutils.pages.dev/betterkv/) is a popular choice for many of our customers. Other customers choose to handroll. We
-haven't seen any customers who fix the pauses / stampeding herd problem and this is an attempt to provide a suggestion on how to solve that.
+At first glance this doesn't sound too bad, and price-conscious customers look to this option (we hear from customers that like to use [BetterKV](https://flareutils.pages.dev/betterkv/)
+while others hand-roll). From a performance perspective though, the result is surprising: your application
+will regularly experience a stampeding herd of requests that experience cold KV reads. Why is this?
+
+The first problem arises if you've tuned the expiry within your fronting Cache instance to match the `cacheTtl` you use for KV (this is what BetterKV
+does). When you miss your local cache, you'll also miss KV and get a cold read. If you're doing 10k RPS to that
+key, then you'll be missing 10k RPS for the duration it takes to refetch. As described, Workers KV very carefully and intentionally avoids
+this problem by refreshing in the background proactively so that the expiry is always in the future.
+
+Solving this can help a lot, but it's not the entire story. Even if your cache.match is set to expire before your cacheTtl, you'll still have
+another problem. Since KV isn't seeing accesses to that key, Cache will treat KV's cache as cold & prioritize evicting it. In such a scenario,
+even though we're within the cacheTtl, from KV's perspective it sees a sudden stampeding herd of requests that aren't satisfied by your cache.match
+but for which it doesn't have a cache anymore because it was evicted.
 
 * Don't put cache in front of KV.
   * Pros: the system behaves optimally from a performance perspective.
@@ -58,11 +66,10 @@ the cacheTTL. For example, cache for 30s, probabilistically
 
 Recommendation: avoid putting a cache in front of KV & talk to our support staff about pricing. Putting a cache in front of KV also means
 that you can't partake in [noticing writes more quickly](#noticing-updated-values-within-seconds). If you want to go down this road anyway,
-ideally align the cacheTTL and your custom cache such that you refetch from KV before it gets expired from KV's cache. A quick and dirty
-solution may be to sample instead as getting all that logic to work correctly can be tricky. As always, make sure you have good observability
-that can highlight this problem for you so you can see it happening and know what to do about it. The [Workers Analytics Engine](/analytics/analytics-engine)
-is a convenient way to instrument your code to see when you're making KV requests and then watching out for sudden transient spikes for
-requests into KV.
+you'll probably need to apply a mix of doing probabalistic refresh where you direct some percentage of cache hit requests into KV. Note
+however that probabilistic approaches mean that you need to measure and tune because you will have a long-tail that will be missed since
+most customers have a small number of very hot keys that will absorb the probability and a longer tail of keys that won't be refreshed
+probabilistically and thus suffer cold KV fetch. The aggregate RPS of the long tail of keys can easily rival your hottest key RPS.
 
 {{<Aside type="note" header="Timing considerations">}}
 `Date` comparisons can be used to get an [approximation](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)

From 89a6672ac4c938d132619b3466566253762cb1cb Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 14:52:18 -0700
Subject: [PATCH 06/12] Apply feedback / cleanup wording more.

---
 content/workers/learning/advanced-kv-guide.md | 114 ++++++++++++------
 1 file changed, 77 insertions(+), 37 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index be99d315d52817..03cb91dcf84f68 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -4,10 +4,22 @@ title: Advanced Workers KV Topics
 weight: 7
 ---
 
-# Background
+# Best Practices for Workers KV
 
-To get the best possible performance out of your usage of Workers KV, this document contains some tips.
-As background, it's best to review [how KV works](/workers/runtime-apis/kv/#writing-data-in-bulk).
+This guide provides best practices for optimizing Workers KV latency as well as covering advanced tricks
+that our customers sometimes employ for their problem domain, including:
+
+* Reducing TTFB latency through the [`cacheTtl`](/workers/runtime-apis/kv/#cache-ttl) parameter without sacrificing
+consistency latency.
+* Avoiding the use of redundant caching layers
+* Using Workers KV's [bindings API]() instead of the administrative REST API for user-facing workloads.
+* Ensuring correctness when you have concurrent writes to manage.
+* Getting early access to sub-1-minute consistency latency.
+* Understanding the subtleties that make it hard to use synthetic tests of Workers KV as a proxy to
+predict production performance.
+* Improving your observability of KV performance.
+
+As background, it's best to review [how KV works](/workers/learning/how-kv-works.md) before reading this document.
 
 # Performance optimizations
 
@@ -15,22 +27,24 @@ As background, it's best to review [how KV works](/workers/runtime-apis/kv/#writ
 
 ### Embrace long cacheTtl
 
-To optimize the long-tail performance of infrequently accessed keys, specify a longer cacheTtl value (e.g. 1 day).
-Historically, a blocker for many customers was that this meant that your reads wouldn't see writes for the duration of the cacheTtl.
-However, as described in our [architecture blog post](https://blog.cloudflare.com/faster-workers-kv-architecture/), most customers today
-are using the new architecture where you will see updated values within a minute of the write, regardless of the cacheTtl value.
-
-{{<Aside type="note" header="Security considerations">}}
-Some customers of Workers KV store authorization tokens. Often time such applications rely on having a strict guarantee on revocation.
-For example, if your service SLA is that a revoked token must be globally revoked within 5 minutes of revocation, your cacheTtl should
-not be longer than 5 minutes. While the write will be noticed within a minute, writes are only noticed due to misses or reads triggering
-a background refresh. If your key is accessed once every 4 minutes and you set a cacheTTL of 10 minutes, it's possible that you will
-exceed your SLA by a few minutes.
-{{</Aside>}}
+*TLDR*: Set a long `cacheTtl` (e.g. 86400 seconds, which is 1 day).
 
-{{<Aside type="note" header="Availability of long cache TTL noticing writes quickly">}}
-Certain namespaces part of early closed betas and larger ENT customers are currently excluded. If you want to use the new architecture
-but think you may not be enabled, please contact support.
+When reading a value, KV lets you customize the [`cacheTtl`](https://developers.cloudflare.com/workers/runtime-apis/kv/#cache-ttl) parameter.
+Since Cloudflare is a security-first company and KV is sometimes used to store things like authentication tokens, the default `cacheTtl` value
+if not specified is 1 minute. That way if you use KV for cryptographic material, any changes / revocations are noticed globally in a timely fashion.
+When KV encounters a key beyond its `cacheTtl`, this is treated as a miss and requires traversal to the central data center containing the most
+recently written value.
+
+Most Workers KV requests, however, are not for security-sensitive keys. To optimize the long-tail performance of infrequently accessed keys, specify a longer
+`cacheTtl` value (e.g. 86400 seconds to cache for an entire day). Historically, a blocker for many customers was that this meant that your reads wouldn't see writes
+for the duration of the `cacheTtl`. However, as described in our [architecture blog post](https://blog.cloudflare.com/faster-workers-kv-architecture/),
+most customers today are using the new architecture, where you will see updated values within ~1 minute of the write regardless of the `cacheTtl` value. See also the
+[section](#noticing-updated-values-within-seconds) below about how to explore getting sub-minute global consistency.
+
+{{<Aside type="note" header="Decoupled cacheTTL and visible writes availability">}}
+A small fixed list of customers, made up of early closed-beta participants and our largest ENT customers, is currently excluded as we scale up the system. For those customers, a longer
+`cacheTtl` means a `get` can take up to that duration to report the latest value written in the interim. If you want to use the
+new architecture but think you may not be enabled, please contact support. This is a transparent optimization.
 {{</Aside>}}
 
 ### Avoid hand-rolling Cache in front of Workers KV
@@ -51,36 +65,46 @@ this problem by refreshing in the background proactively so that the expiry is a
 Solving this can help a lot, but it's not the entire story. Even if your cache.match is set to expire before your cacheTtl, you'll still have
 another problem. Since KV isn't seeing accesses to that key, Cache will treat KV's cache as cold & prioritize evicting it. In such a scenario,
 even though we're within the cacheTtl, from KV's perspective it sees a sudden stampeding herd of requests that aren't satisfied by your cache.match
-but for which it doesn't have a cache anymore because it was evicted.
+but for which it doesn't have a cache anymore because it was evicted. That being said, the most common cause is likely having a very similar
+cacheTtl in your extra caching layer and KV.
+
+To solve this problem, Cloudflare recommends the following, in order of preference:
 
 * Don't put cache in front of KV.
-  * Pros: the system behaves optimally from a performance perspective.
-  * Cons: Today you get charged for cache reads.
-* Direct some subset of requests satisfied by the cache to KV anyway.
-  * Pros: You will solve the stampeding herd problem for your hottest keys at minimal cost.
-  * Cons: Stampeding herd problem will still be present for cooler keys.
-* Make sure you carefully tune your cache duration and the cacheTTL you use out of KV so that your cache duration is sufficiently less than
-the cacheTTL. For example, cache for 30s, probabilistically
-  * Pros: all keys will mostly avoid the stampeding herd problem
-  * Cons: More complicated to implement correctly.
-
-Recommendation: avoid putting a cache in front of KV & talk to our support staff about pricing. Putting a cache in front of KV also means
-that you can't partake in [noticing writes more quickly](#noticing-updated-values-within-seconds). If you want to go down this road anyway,
-you'll probably need to apply a mix of doing probabalistic refresh where you direct some percentage of cache hit requests into KV. Note
-however that probabilistic approaches mean that you need to measure and tune because you will have a long-tail that will be missed since
-most customers have a small number of very hot keys that will absorb the probability and a longer tail of keys that won't be refreshed
-probabilistically and thus suffer cold KV fetch. The aggregate RPS of the long tail of keys can easily rival your hottest key RPS.
+  * Pros: the system behaves optimally from a performance perspective. You can also leverage [notice writes more quickly than 1 minute](#noticing-updated-values-within-seconds).
+  * Cons: Today you get charged for cache reads. Larger customers should work with support and longer-term we hope to adjust our pricing to obviate the pricing differential.
+* Set a longer `cacheTtl`
+  * Pros: Improved latency across the board.
+  * Cons: You may see prolonged stale reads by 1 extra KV cycle because the extra cache layer isn't letting KV know to refresh the asset (e.g if your extra cache layer caches
+  for 1 minute, it will take you 2 minutes to see a write (a longer `cacheTtl` without an extra cache layer in front of KV doesn't have this problem).
+* Probabilistically direct some subset of cache hits (e.g. 1%) to KV anyway in a `waitUntil`. **NOTE**: The `waitUntil` is important because KV will abort work if you return a response
+  before KV does its work and no refresh will take place.
+  * Pros: KV will see more representative usage patterns and thus ensure that the most recent value is always in the cache.
+  * Cons: You need to manually fine tune the probability and you may not have sufficient observability to see problems. Consider following the steps in [improving observability](#improving-observability) to
+  make sure you can see this problem.
+* Shorten the TTL within your extra caching layer
+  * Pros: All keys will mostly avoid the stampeding herd of cold KV accesses
+  * Cons: Not as effective at improving performance as increasing the cacheTtl. If the key is evicted from KV's cache due to insufficient usage, you will still suffer a stampeding
+  herd of slow requests.
+
+{{<Aside type="note" header="What do cardinality and distribution mean?">}}
+[Cardinality](https://en.wikipedia.org/wiki/Cardinality) is a mathematical concept. Within the context of Workers KV, it means "how many distinct keys
+is my application accessing". The distribution in this context means how the RPS of each key relates to that of the others. A [uniform distribution](https://en.wikipedia.org/wiki/Discrete_uniform_distribution)
+would be one where every Workers KV key being accessed by your application has similar RPS values. Most applications in practice experience some kind of [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution).
+{{</Aside>}}
 
 {{<Aside type="note" header="Timing considerations">}}
 `Date` comparisons can be used to get an [approximation](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)
-of how long an I/O operation took, but it's very coarse and likely too inaccurate for measuring KV performance.
+of how long an I/O operation took, but it's very coarse and likely too inaccurate for measuring KV performance. As such, be very wary of timing metrics you collect within a Worker about how your code is performing.
 {{</Aside>}}
 
+**TLDR**: Adding an extra caching layer in front of KV and getting good performance from it is surprisingly tricky and brittle. The best practice is to let KV manage this complex topic correctly
+for you.
 
 ### Reducing cardinality by coalescing keys
 
 If you have a set of related key-value pairs that have a mixed usage pattern (some hot keys and some cold keys), consider
-coalescing them.
+coalescing so that you have fewer overall keys. Some approaches to accomplishing this are described below.
 
 #### Merging into a "super" KV entry
 One coalescing technique is to make all the keys and values part of a super key/value object. For example, something like this:
@@ -122,6 +146,8 @@ lets you extend the duration of a single key lookup at the data center closest t
 
 If you have small values that fit within the [metadata limit](/workers/platform/limits/#kv-limits), you can store the value within the metadata instead.
 This makes the value accessible during the list, avoiding the need to do a second I/O round-trip while iterating in case a lookup ends up missing the local cache.
+This isn't suitable for every problem domain, since it requires that values fit within the limit and that the keys you are trying to read
+are guaranteed to be lexicographically adjacent.
 
 {{<Aside type="note" header="List performance note">}}
 See above about the implications of cache duration and list operations.
@@ -180,3 +206,17 @@ to capture in practice because of the global nature of Cloudflare Workers.
 
 In essence, Cloudflare's infrastructure gets faster the more traffic you put on them, and synthetic tests often cannot generate
 enough load to simulate that properly.
+
+# Improving observability
+
+We are working on providing customers with deeper insights into how KV performs so that you don't have to write any code. Additionally,
+you will gain insights into your performance that might otherwise be difficult or impossible for you to capture from outside KV. In the meantime,
+we've added a `cacheStatus` field to the response object returned from `list` and `getWithMetadata`. The values defined are as follows:
+
+* `MISS`: The current data center doesn't have this value. The value will be retrieved through upper tiers or from the central data store.
+* `HIT`: The current data center serviced this value.
+* `REVALIDATE`: A `HIT` and Workers KV took this as an opportunity to trigger a background refresh of the value.
+* `STALE`: A `HIT` where Workers KV noticed it's deep within the default 1 minute refresh interval for the asset.
+
+You can then leverage [Workers Analytics Engine](/analytics/analytics-engine/) to record this information and build basic visualizations
+to measure your cache performance.
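A minimal sketch of recording `cacheStatus` with Workers Analytics Engine, as suggested above. The binding names `MY_KV` and `KV_METRICS`, the helper names `toDataPoint` and `getAndRecord`, and the local interface definitions are assumptions made so the example is self-contained. The `cacheStatus` field is modeled on the description in this patch, not on a published type definition.

```typescript
// Minimal local shapes so the sketch is self-contained. In a real Worker the
// bindings come from the runtime; `cacheStatus` is modeled on the text above.
interface KVReadResult {
  value: string | null;
  metadata: unknown;
  cacheStatus?: string | null; // "MISS" | "HIT" | "REVALIDATE" | "STALE"
}

interface KVNamespaceLike {
  getWithMetadata(key: string): Promise<KVReadResult>;
}

interface DataPoint {
  blobs: string[];
  doubles: number[];
}

interface AnalyticsDatasetLike {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical binding names.
interface SketchEnv {
  MY_KV: KVNamespaceLike;
  KV_METRICS: AnalyticsDatasetLike;
}

// Builds one data point per read: status and key as labels, 1 as a count.
function toDataPoint(key: string, cacheStatus: string | null | undefined): DataPoint {
  return { blobs: [cacheStatus ?? "UNKNOWN", key], doubles: [1] };
}

// Reads a key and records which cache status served the read.
async function getAndRecord(env: SketchEnv, key: string): Promise<string | null> {
  const result = await env.MY_KV.getWithMetadata(key);
  env.KV_METRICS.writeDataPoint(toDataPoint(key, result.cacheStatus));
  return result.value;
}
```

Counting the recorded data points grouped by the first blob gives a rough breakdown of hits and misses.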

From 548dfdaf5ae8606fb0a8e084dc74f7a75ee236ac Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 15:15:33 -0700
Subject: [PATCH 07/12] Cleanups

---
 content/workers/learning/advanced-kv-guide.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 03cb91dcf84f68..7e543c4ee8cee4 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -11,9 +11,9 @@ that our customers sometimes employ for their problem domain, including:
 
 * Reducing TTFB latency through the [`cacheTtl`](/workers/runtime-apis/kv/#cache-ttl) parameter without sacrificing
 consistency latency.
-* Avoiding the use of redundant caching layers
-* Using Workers KV's [bindings API]() instead of the administrative REST API for user-facing workloads.
-* Ensuring correctness when you have concurrent writes to manage
+* Avoiding the use of redundant caching layers.
+* Using Workers KV's [bindings API](workers/runtime-apis/kv/) instead of the administrative REST API for user-facing workloads.
+* Ensuring correctness when you have concurrent writes to manage.
 * How to get early access to get sub 1 minute consistency latency.
 * Guidance on subtleties that crop up that make it hard to synthetically test Workers KV performance as a proxy to
 predict production performance.
@@ -104,7 +104,10 @@ for you. Insisting
 ### Reducing cardinality by coalescing keys
 
 If you have a set of related key-value pairs that have a mixed usage pattern (some hot keys and some cold keys), consider
-coalescing so that you have fewer overall keys. Some approaches to accomplishing this are described below.
+coalescing them so that a single cached fetch retrieves all the values, even if you only need one
+of the values. This helps long-tail retrieval because the cooler keys then share access patterns with the hotter
+keys and are thus more likely to be present in the cache. Some approaches to accomplishing this are described below.
+
 
 #### Merging into a "super" KV entry
 One coalescing technique is to make all the keys and values part of a super key/value object. For example, something like this:
@@ -142,6 +145,10 @@ at upper tiers. By comparison, get operations are cached at the upper tiers for
 lets you extend the duration of a single key lookup at the data center closest to the request.
 {{</Aside>}}
 
+Since list operations are not "write aware" as described above, they are only ever cached for 1 minute. They are still subject to [tiered caching](https://blog.cloudflare.com/faster-workers-kv-architecture#a-new-horizontally-scaled-tiered-cache) as described in
+our blog post, so requests within the region and globally are amortized to keep the asset closer to your request. However, you still need to be reading the value about once
+every 30s to make sure it's always present within Cloudflare's caches.
+
 ## Batch reading multiple keys
 
 If you have small values that fit within the [metadata limit](/workers/platform/limits/#kv-limits), you can store the value within the metadata instead.
@@ -168,7 +175,7 @@ writers, there's no guarantee about a winner. This is even more problematic if y
 partial update of a value. A lot of customers have success creating a [Durable Object](/workers/learning/using-durable-objects)
 and making it responsible for all the writes to your KV namespace. This way, you can serialize access for writing the value.
 
-**Caution**: Workers KV is an eventually consistent system. If you try to do a read/modify/write operation where the read is
+*Caution**: Workers KV is an eventually consistent system. If you try to do a read/modify/write operation where the read is
 coming from KV, you can cause modifications to be lost because there's no guarantee that you will always read the most recent
 value written, even if the write is from the same data center. Additionally, where a Durable Object is running moves around
 outside of your control.
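A minimal sketch of the single-writer pattern described above, under stated assumptions: `KVWriter` and `readModifyWrite` are hypothetical names, and the two local interfaces stand in for a Durable Object's `state.storage` and a KV namespace binding. In a real Durable Object the method would be called from the `fetch` handler. The read comes from Durable Object storage rather than KV, so the eventual consistency of KV reads cannot cause lost updates.

```typescript
// Stand-ins for a Durable Object's storage and a KV binding (assumptions).
interface DurableObjectStorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

interface KVNamespaceLike {
  put(key: string, value: string): Promise<void>;
}

// Hypothetical single writer that owns every write to one KV namespace.
class KVWriter {
  private readonly storage: DurableObjectStorageLike;
  private readonly kv: KVNamespaceLike;
  // Tail of the write queue; each update waits for the one before it.
  private tail: Promise<unknown> = Promise.resolve();

  constructor(storage: DurableObjectStorageLike, kv: KVNamespaceLike) {
    this.storage = storage;
    this.kv = kv;
  }

  // Applies `modify` to the current value and publishes the result to KV.
  readModifyWrite(key: string, modify: (current: string | undefined) => string): Promise<string> {
    const run = async (): Promise<string> => {
      // Read the ground truth from Durable Object storage, never from KV.
      const current = await this.storage.get<string>(key);
      const next = modify(current);
      await this.storage.put(key, next);
      // Publish for readers only after the ground truth is updated.
      await this.kv.put(key, next);
      return next;
    };
    const result = this.tail.then(run);
    // Keep the queue alive even if this update fails.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Chaining each update onto the previous one is one way to keep writes in order; it trades throughput for a predictable last value.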

From 3da599a3d2f921cb3408ad376786aaab3e66e406 Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 15:45:42 -0700
Subject: [PATCH 08/12] Update how-kv-works intro

---
 content/workers/learning/how-kv-works.md | 56 +++++++++++++++++++-----
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/content/workers/learning/how-kv-works.md b/content/workers/learning/how-kv-works.md
index 4d505d4042b948..a39af49c844bc9 100644
--- a/content/workers/learning/how-kv-works.md
+++ b/content/workers/learning/how-kv-works.md
@@ -6,34 +6,68 @@ weight: 7
 
 # How KV works
 
-Workers KV is a global, low-latency, key-value data store. It stores data in a small number of centralized data centers, then caches that data in Cloudflare's data centers after access. KV supports exceptionally high read volumes with low latency, making it possible to build highly dynamic APIs and websites that respond as quickly as a cached static file would. While reads are periodically revalidated in the background, requests which are not in cache and need to hit the centralized back end can see high latencies.
+Workers KV is a global, low-latency, key-value data store. It stores data in a small number of centralized data centers,
+then caches that data in Cloudflare's data centers after access. KV supports exceptionally high read volumes with low
+latency, making it possible to build highly dynamic APIs and websites that respond as quickly as a cached static file
+would. While reads are periodically revalidated in the background, requests which are not in cache and need to hit the
+centralized back end can see high latencies.
+
+Workers KV is free to try, with additional usage available as part of the Workers Bundled plan.
+
+Learn more at the [Workers KV API reference](/workers/runtime-apis/kv/) and use the [Advanced Workers KV Topics guide](/workers/learning/advanced-kv-guide)
+to tune and design your application to take best advantage of the available features.
 
 ## Write data to KV and read data from KV
 
-When you write to KV, your data is written to central data stores. It is not sent automatically to every location’s cache.
+When you write to KV, your data is written to central data stores. It is not sent automatically to every location’s
+cache, but regional tiers are notified within seconds to do a purge of that key.
 
 ![Your data is written to central data stores when you write to KV.](/images/workers/kv-write.svg)
 
-Initial reads from a location do not have a cached value. The data must be read from the nearest central data store, resulting in a slower response.
+Initial reads from a location do not have a cached value. The data must be read from the nearest regional tier,
+followed by a central tier, degrading finally to the central store for a truly cold global read. So while the very
+first access is slow globally, subsequent requests are faster, especially if they're concentrated in a single region.
 
 ![Initial reads will miss the cache and go to the nearest central data store first.](/images/workers/kv-slow-read.svg)
 
-Frequent reads from the same location return the cached value without reading from a central data store, resulting in faster response times.
+Frequent reads from the same location return the cached value without reading from anywhere else, resulting in the
+fastest response times. Additionally, Workers KV operates diligently to keep the latest value in the cache by
+opportunistically refreshing from upper tiers and the central data stores in the background. This is done carefully
+so that assets that are being accessed continue to be served from the cache without any stalls.
 
 ![As mentioned above, frequent reads will return a cached value.](/images/workers/kv-fast-read.svg)
 
-Because Workers KV stores data centrally and uses pull-based replication to store data in cache, it is generally good for use cases where you need to write relatively infrequently, but read quickly and frequently. It is optimized for these high-read applications, only reaching its full performance when data is being frequently read. Infrequently read values are pulled from a central store, while more popular values are cached in the data centers they are requested from.
+Because Workers KV stores data centrally and uses hybrid push/pull-based replication to store data in cache, it is
+generally good for use cases where you need to write relatively infrequently, but read quickly and frequently.
+It is optimized for these high-read applications, only reaching its full performance when data is being frequently read.
+Infrequently read values are pulled from other data centers or the central store, while more popular values
+are cached in the data centers they are requested from.
+
+## Performance
+
+A little bit of tuning of your usage of Workers KV can result in significant performance gains. The single most
+impactful way to improve performance is to increase the [`cacheTtl`](/workers/learning/advanced-kv-guide#embrace-long-cachettl)
+parameter up from its default 60s. This and other techniques are described in detail in the [Advanced Workers KV Topics guide](/workers/learning/advanced-kv-guide).
 
 ## Consistency
 
-KV achieves this performance by being eventually-consistent. Changes are usually immediately visible in the Cloudflare global network location at which they are made but may take up to 60 seconds or more to be visible in other global network locations as their cached versions of the data time out. In particular, visibility of changes takes longer in locations which have recently read a previous version of a given key (including reads that indicated the key did not exist, which are also cached locally). Workers KV is not ideal for situations where you need support for atomic operations or where values must be read and written in a single transaction.
+KV achieves this performance through caching, which makes reads eventually consistent with writes. Changes are usually
+immediately visible in the Cloudflare global network location at which they are made, but may take up to 60 seconds or
+more to be visible in other global network locations, either as their cached versions of the data time out or as
+reads there trigger a refresh. Negative lookups indicating that the key doesn't exist are also cached, so the same delay
+applies to noticing that a value has been created as to noticing that a value has changed.
 
-If you need stronger consistency guarantees, consider using [Durable Objects](/workers/learning/using-durable-objects/). One pattern is to send all of your writes for a given KV key through a corresponding instance of a Durable Object, and then read that value from KV in other Workers. This is useful if you need more control over writes, but are satisfied with KV's read characteristics described above.
+Workers KV is not currently ideal for situations where you need support for atomic operations or where values must be
+read and written in a single transaction.
 
-KV does not perform like an in-memory datastore, such as [Redis](https://redis.io). Accessing KV values, even when locally cached, has significantly more latency than reading a value from memory within a Worker script.
+If you need stronger consistency guarantees, consider using [Durable Objects](/workers/learning/using-durable-objects/).
+Alternatively, if you are happy with the read behavior but need finer-grained guarantees about the behavior of concurrent
+writes into KV, that is described in the [advanced topic on concurrent writes](/workers/learning/advanced-kv-guide#concurrent-writers-to-a-single-key).
 
-All values are encrypted at rest with 256-bit AES-GCM, and only decrypted by the process executing your Worker scripts or responding to your API requests.
+We are also working on making changes visible [within seconds](/workers/learning/advanced-kv-guide#noticing-updated-values-within-seconds)
+and hope to eventually make this self-serve.
 
-Workers KV is free to try, with additional usage available as part of the Workers Bundled plan.
+KV does not perform like an in-memory datastore, such as [Redis](https://redis.io). Accessing KV values, even when locally cached, has significantly more latency than reading a value from memory within a Worker script.
 
-Learn more at the [Workers KV API reference](/workers/runtime-apis/kv/).
+## Security
+All values are encrypted at rest with 256-bit AES-GCM, and only decrypted by the process executing your Worker scripts or responding to your API requests.
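A minimal sketch of the `cacheTtl` tuning mentioned in the Performance section of this patch. The helper name `readCached` and the local `KVNamespaceLike` interface are assumptions so the snippet is self-contained; the `cacheTtl` option on `get` is the parameter the text refers to, and one day (86400 seconds) is the example value used in the advanced guide.

```typescript
// Stand-in for a Workers KV namespace binding (assumption for this sketch).
interface KVNamespaceLike {
  get(key: string, options?: { cacheTtl?: number }): Promise<string | null>;
}

// One day, in seconds.
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Reads a key while asking KV to keep it cached at this location for up to a day.
async function readCached(kv: KVNamespaceLike, key: string): Promise<string | null> {
  return kv.get(key, { cacheTtl: ONE_DAY_SECONDS });
}
```

Keys that must reflect revocations quickly, such as authorization tokens, are better served by a shorter value.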

From f8d4ef78c6dbb833319d8dc6ecfb1834a927b3ba Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 14 Jul 2023 15:50:29 -0700
Subject: [PATCH 09/12] Fix build

---
 content/workers/learning/advanced-kv-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 7e543c4ee8cee4..3ee5ebd4843c35 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -12,7 +12,7 @@ that our customers sometimes employ for their problem domain, including:
 * Reducing TTFB latency through the [`cacheTtl`](/workers/runtime-apis/kv/#cache-ttl) parameter without sacrificing
 consistency latency.
 * Avoiding the use of redundant caching layers.
-* Using Workers KV's [bindings API](workers/runtime-apis/kv/) instead of the administrative REST API for user-facing workloads.
+* Using Workers KV's [bindings API](/workers/runtime-apis/kv/) instead of the administrative REST API for user-facing workloads.
 * Ensuring correctness when you have concurrent writes to manage.
 * How to get early access to get sub 1 minute consistency latency.
 * Guidance on subtleties that crop up that make it hard to synthetically test Workers KV performance as a proxy to

From eac69d61ab035a7a034123d4deee18fa8158e44f Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 28 Jul 2023 09:15:44 -0700
Subject: [PATCH 10/12] Remove consistency note as requested

---
 content/workers/learning/advanced-kv-guide.md | 103 +++++++++---------
 content/workers/learning/how-kv-works.md      |   6 +-
 2 files changed, 51 insertions(+), 58 deletions(-)
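A minimal sketch of the recommendation in advanced-kv-guide.md to probabilistically direct some cache hits to KV inside a `waitUntil`. The names `REFRESH_PROBABILITY`, `shouldRefresh`, and `onCacheHit`, and the two local interfaces, are assumptions for illustration; the 1% value is the example figure from the guide and needs tuning against real traffic.

```typescript
// Stand-ins for a KV binding and the Worker execution context (assumptions).
interface KVNamespaceLike {
  get(key: string): Promise<string | null>;
}

interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical tuning knob: fraction of cache hits that also touch KV.
const REFRESH_PROBABILITY = 0.01;

// Decides whether this cache hit should also be sent to KV.
// `random` is injectable so the decision can be checked deterministically.
function shouldRefresh(probability: number, random: () => number = Math.random): boolean {
  return random() < probability;
}

// Call on a hit in your own cache layer, before returning the response.
// Returns true when a background KV read was scheduled.
function onCacheHit(
  kv: KVNamespaceLike,
  ctx: ExecutionContextLike,
  key: string,
  random: () => number = Math.random,
): boolean {
  if (!shouldRefresh(REFRESH_PROBABILITY, random)) {
    return false;
  }
  // waitUntil keeps the read alive after the response is returned;
  // errors are swallowed because the result is not used.
  ctx.waitUntil(kv.get(key).catch(() => null));
  return true;
}
```

Because hot keys absorb most of the sampled reads, cooler keys may still go cold, so it helps to measure the effect rather than assume it.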

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 3ee5ebd4843c35..58aa793ef932dc 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -6,20 +6,19 @@ weight: 7
 
 # Best Practices for Workers KV
 
-This guide provides best practices for optimizing Workers KV latency as well as covering advanced tricks 
+This guide provides best practices for optimizing Workers KV latency as well as covering advanced tricks
 that our customers sometimes employ for their problem domain, including:
 
-* Reducing TTFB latency through the [`cacheTtl`](/workers/runtime-apis/kv/#cache-ttl) parameter without sacrificing
-consistency latency.
-* Avoiding the use of redundant caching layers.
-* Using Workers KV's [bindings API](/workers/runtime-apis/kv/) instead of the administrative REST API for user-facing workloads.
-* Ensuring correctness when you have concurrent writes to manage.
-* How to get early access to get sub 1 minute consistency latency.
-* Guidance on subtleties that crop up that make it hard to synthetically test Workers KV performance as a proxy to
-predict production performance.
-* Improving your observability of KV performance.
+- Reducing TTFB latency through the [`cacheTtl`](/workers/runtime-apis/kv/#cache-ttl) parameter without sacrificing
+  consistency latency.
+- Avoiding the use of redundant caching layers.
+- Using Workers KV's [bindings API](/workers/runtime-apis/kv/) instead of the administrative REST API for user-facing workloads.
+- Ensuring correctness when you have concurrent writes to manage.
+- Guidance on subtleties that crop up that make it hard to synthetically test Workers KV performance as a proxy to
+  predict production performance.
+- Improving your observability of KV performance.
 
-As background, it's best to review [how KV works](/workers/learning/how-kv-works.md) before reading this document.
+As background, it's best to review [how KV works](/workers/learning/how-kv-works) before reading this document.
 
 # Performance optimizations
 
@@ -27,7 +26,7 @@ As background, it's best to review [how KV works](/workers/learning/how-kv-works
 
 ### Embrace long cacheTtl
 
-*TLDR*: Set a long cacheTtl (e.g. 86400 to represent 1 day).
+_TLDR_: Set a long cacheTtl (e.g. 86400 to represent 1 day).
 
 When reading a value, KV let's you customize the [`cacheTtl`](https://developers.cloudflare.com/workers/runtime-apis/kv/#cache-ttl) parameter.
 Since Cloudflare is a security-first company and KV is sometimes used to store things like authentication tokens, the default `cacheTtl` value
@@ -38,8 +37,7 @@ recently written value.
 Most Workers KV requests however are not for security-sensitive keys. To optimize the long-tail performance of infrequently accessed keys, specify a longer
 `cacheTtl` value (e.g. 86400 to request the entire day). Historically, a blocker for many customers was that this meant that your reads wouldn't see writes
 for the duration of the cacheTtl. However, as described in our [architecture blog post](https://blog.cloudflare.com/faster-workers-kv-architecture/),
-most customers today are using the new architecture - you will see updated values within ~1 minute of the write, regardless of the `cacheTtl` value. See also the
-[section](#noticing-updated-values-within-seconds) below about how to explore getting sub minute global consistency.
+most customers today are using the new architecture - you will see updated values within ~1 minute of the write, regardless of the `cacheTtl` value.
 
 {{<Aside type="note" header="Decoupled cacheTTL and visible writes availability">}}
 A small fixed list of customers which are comprised of early closed betas and our largest ENT customers are currently excluded as we scale up the system. A longer
@@ -70,22 +68,22 @@ cacheTtl in your extra caching layer and KV.
 
 The recommendations to solve this problem in the order Cloudflare recommends applying them:
 
-* Don't put cache in front of KV.
-  * Pros: the system behaves optimally from a performance perspective. You can also leverage [notice writes more quickly than 1 minute](#noticing-updated-values-within-seconds).
-  * Cons: Today you get charged for cache reads. Larger customers should work with support and longer-term we hope to adjust our pricing to obviate the pricing differential.
-* Set a longer `cacheTtl`
-  * Pros: Improved latency across the board.
-  * Cons: You may see prolonged stale reads by 1 extra KV cycle because the extra cache layer isn't letting KV know to refresh the asset (e.g if your extra cache layer caches
-  for 1 minute, it will take you 2 minutes to see a write (a longer `cacheTtl` without an extra cache layer in front of KV doesn't have this problem).
-* Probabilistically direct some subset of cache hits (e.g. 1%) to KV anyway in a `waitUntil`. **NOTE**: The `waitUntil` is important because KV will abort work if you return a response
+- Don't put cache in front of KV.
+  - Pros: the system behaves optimally from a performance perspective.
+  - Cons: Today you get charged for cache reads. Larger customers should work with support and longer-term we hope to adjust our pricing to obviate the pricing differential.
+- Set a longer `cacheTtl`
+  - Pros: Improved latency across the board.
+  - Cons: Stale reads may be prolonged by 1 extra KV cycle because the extra cache layer isn't letting KV know to refresh the asset (e.g. if your extra cache layer caches
+    for 1 minute, it will take you 2 minutes to see a write). A longer `cacheTtl` without an extra cache layer in front of KV doesn't have this problem.
+- Probabilistically direct some subset of cache hits (e.g. 1%) to KV anyway in a `waitUntil`. **NOTE**: The `waitUntil` is important because KV will abort work if you return a response
   before KV does its work and no refresh will take place.
-  * Pros: KV will see more representative usage patterns and thus ensure that the most recent value is always in the cache.
-  * Cons: You need to manually fine tune the probability and you may not have sufficient observability to see problems. Consider following the steps in [improving observability](#improving-observability) to
-  make sure you can see this problem.
-* Shorten the TTL within your extra caching layer
-  * Pros: All keys will mostly avoid the stampeding herd of cold KV accesses
-  * Cons: Not as effective at improving performance as increasing the cacheTtl. If the key is evicted from KV's cache due to insufficient usage, you will still suffer a stampeding
-  herd of slow requests.
+  - Pros: KV will see more representative usage patterns and thus ensure that the most recent value is always in the cache.
+  - Cons: You need to manually fine-tune the probability and you may not have sufficient observability to see problems. Consider following the steps in [improving observability](#improving-observability) to
+    make sure you can see this problem.
+- Shorten the TTL within your extra caching layer
+  - Pros: All keys will mostly avoid the stampeding herd of cold KV accesses.
+  - Cons: Not as effective at improving performance as increasing the cacheTtl. If the key is evicted from KV's cache due to insufficient usage, you will still suffer a stampeding
+    herd of slow requests.
 
 {{<Aside type="note" header="What do cardinality and distribution mean?">}}
 [Cardinality](https://en.wikipedia.org/wiki/Cardinality) is a mathematical concept. Within the context of Workers KV, it means "how many distinct keys
@@ -108,15 +106,18 @@ coalescing the need to fetch them somehow so that a single cached fetch retrieve
 of the values. This helps long-tail retrieval because the cooler keys then share access patterns with the hotter
 keys and are thus more likely to be present in the cache. Some approaches to accomplishing this are described below.
 
-
 #### Merging into a "super" KV entry
+
 One coalescing technique is to make all the keys and values part of a super key/value object. For example, something like this:
+
 ```
 key1: value1
 key2: value2
 key3: value3
 ```
+
 becomes
+
 ```
 coalesced: {
   key1: value1,
@@ -175,7 +176,7 @@ writers, there's no guarantee about a winner. This is even more problematic if y
 partial update of a value. A lot of customers have success creating a [Durable Object](/workers/learning/using-durable-objects)
 and making it responsible for all the writes to your KV namespace. This way, you can serialize access for writing the value.
 
-*Caution**: Workers KV is an eventually consistent system. If you try to do a read/modify/write operation where the read is
+**Caution**: Workers KV is an eventually consistent system. If you try to do a read/modify/write operation where the read is
 coming from KV, you can cause modifications to be lost because there's no guarantee that you will always read the most recent
 value written, even if the write is from the same data center. Additionally, where a Durable Object is running moves around
 outside of your control.
@@ -185,31 +186,25 @@ ground truth that you read/modify/write and then write the updated value to KV t
 R2 with conditional upload). That way your value is updated in a strongly consistent fashion and once that happens, you publish
 it to KV for reading.
 
-# Noticing updated values within seconds
-
-Currently, reads have a "refreshTtl" of 1 minute. This means that a write is noticed within 1 minute of a read being issued.
-While we aren't yet ready to let customers customize the refreshTtl themselves within the Runtime API, if this is important
-to your use-case, please contact support to change the default for your namespace and we can work with you.
-
 # Benchmarking Workers KV
 
 Benchmarking to predict what your Workers KV performance will look like in production is tricky and nuanced. It's best to try
 to put production load onto the system and then measure real-world performance rather than trying to do a synthetic test.
 Examples of issues that can trip up even internal engineers who know all the technical details:
 
-* You don't have [permission](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)
-within the Runtime to get accurate timing measurements. That means you have to know to time externally to the system. At the same time,
-external timings are subject to sources of error that have nothing to do with Workers KV performance, particularly as described below.
-* A low traffic Worker is more subject to cold starts even though in practice cold starts don't exist once production traffic is flowing.
-* Within something we call "MCP"s, we have multiple virtual data centers within a single PoP. Which virtual data center you hit
-is random and today such data centers have disjoint caches and require even more traffic to keep the cache warm regardless of which
-virtual data center you randomly get routed to.
-* [wrk](https://github.com/wg/wrk) can typically generate substantial enough load from a single machine (thousands of requests
-per second) which should probably be enough to representative and overcome such issues, but it requires careful tuning of
-parameters to achieve max throughput and you have little to no visibility into Cloudflare's internal network to know if you succeeded.
-* Synthetic tests are typically hand-written and often fail to reproduce real-world access patterns for keys (if you have multiple keys).
-If you have a recording you can play through of the access patterns, that might work well. A representative recording is difficult
-to capture in practice because of the global nature of Cloudflare Workers.
+- You don't have [permission](https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/#step1disallowtimersandmultithreading)
+  within the Runtime to get accurate timing measurements. That means you have to know to time externally to the system. At the same time,
+  external timings are subject to sources of error that have nothing to do with Workers KV performance, particularly as described below.
+- A low traffic Worker is more subject to cold starts even though in practice cold starts don't exist once production traffic is flowing.
+- Within something we call "MCP"s, we have multiple virtual data centers within a single PoP. Which virtual data center you hit
+  is random and today such data centers have disjoint caches and require even more traffic to keep the cache warm regardless of which
+  virtual data center you randomly get routed to.
+- [wrk](https://github.com/wg/wrk) can typically generate substantial enough load from a single machine (thousands of requests
+  per second), which should probably be enough to be representative and overcome such issues, but it requires careful tuning of
+  parameters to achieve max throughput and you have little to no visibility into Cloudflare's internal network to know if you succeeded.
+- Synthetic tests are typically hand-written and often fail to reproduce real-world access patterns for keys (if you have multiple keys).
+  If you have a recording you can play through of the access patterns, that might work well. A representative recording is difficult
+  to capture in practice because of the global nature of Cloudflare Workers.
 
 In essence, Cloudflare's infrastructure gets faster the more traffic you put on them, and synthetic tests often cannot generate
 enough load to simulate that properly.
@@ -220,10 +215,10 @@ We are working on providing customers with deeper insights into how KV performs
 you will gain insights into your performance that might otherwise be difficult or impossible for you to capture from outside KV. In the meantime,
 we've added a `cacheStatus` field to the response object returned from `list` and `getWithMetadata`. The values defined are as follows:
 
-* `MISS`: The current data center doesn't have this value. The value will be retrieved through upper tiers or from the central data store.
-* `HIT`: The current data center serviced this value.
-* `REVALIDATE`: A `HIT` and Workers KV took this as an opportunity to trigger a background refresh of the value.
-* `STALE`: A `HIT` where Workers KV noticed it's deep within the default 1 minute refresh interval for the asset.
+- `MISS`: The current data center doesn't have this value. The value will be retrieved through upper tiers or from the central data store.
+- `HIT`: The current data center serviced this value.
+- `REVALIDATE`: A `HIT` and Workers KV took this as an opportunity to trigger a background refresh of the value.
+- `STALE`: A `HIT` where Workers KV noticed it's deep within the default 1 minute refresh interval for the asset.
 
 You can then leverage [Workers Analytics Engine](/analytics/analytics-engine/) to record this information and build basic visualizations
 to measure your cache performance.
diff --git a/content/workers/learning/how-kv-works.md b/content/workers/learning/how-kv-works.md
index f129ac69bc4f1a..778615d3f19a5b 100644
--- a/content/workers/learning/how-kv-works.md
+++ b/content/workers/learning/how-kv-works.md
@@ -6,7 +6,7 @@ title: How KV works
 # How KV works
 
 Workers KV is a global, low-latency, key-value data store. It stores data in a small number of centralized data centers,
-then caches that data in Cloudflare's data centers after access.  KV supports exceptionally high read volumes with low
+then caches that data in Cloudflare's data centers after access. KV supports exceptionally high read volumes with low
 latency, making it possible to build highly dynamic APIs and websites that respond as quickly as a cached static file
 would. While reads are periodically revalidated in the background, requests which are not in cache and need to hit the
 centralized back end can see high latencies.
@@ -63,10 +63,8 @@ If you need stronger consistency guarantees, consider using [Durable Objects](/w
 Alternatively, if you are happy with the read behavior but need finer-grained guarantees about the behavior of concurrent
 writes into KV, that is described in the [advanced topic on concurrent writes](/workers/learning/advanced-kv-guide#concurrent-writers-to-a-single key).
 
-We also are working on making changes possible to visible [within seconds](/workers/learning/advanced-kv-guide#noticing-updated-values-within-seconds)
-and hope to eventually make this self-serve.
-
 KV does not perform like an in-memory datastore, such as [Redis](https://redis.io). Accessing KV values, even when locally cached, has significantly more latency than reading a value from memory within a Worker script.
 
 ## Security
+
 All values are encrypted at rest with 256-bit AES-GCM, and only decrypted by the process executing your Worker scripts or responding to your API requests.

From 6df43607d5316c78565993cb2c56db0dda05d9f6 Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 28 Jul 2023 09:23:12 -0700
Subject: [PATCH 11/12] language police

---
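Note (not part of the commit message): the "Storing in metadata and shared prefix" paragraph touched below says small values can live in key metadata so that a `list` returns them without a second round-trip. A minimal sketch of that read path follows, assuming the documented `list` result shape (`keys`, `list_complete`, `cursor`). The `KVLike` interface and both function names are hypothetical stand-ins so the example is self-contained; they are not part of the guide.

```typescript
// Hypothetical local stand-ins for the slice of the KV binding used here.
// In a real Worker the binding type would come from @cloudflare/workers-types.
interface ListKey { name: string; metadata?: unknown }
interface ListResult { keys: ListKey[]; list_complete: boolean; cursor?: string | undefined }
interface KVLike {
  list(options: { prefix: string; cursor?: string | undefined }): Promise<ListResult>;
}

// Pull (key name, metadata) pairs out of one page of list results,
// skipping keys that were written without metadata.
function metadataValues(keys: ListKey[]): [string, unknown][] {
  const out: [string, unknown][] = [];
  for (const key of keys) {
    if (key.metadata !== undefined) out.push([key.name, key.metadata]);
  }
  return out;
}

// Walk every page under `prefix`, following the cursor until the listing
// is complete. No per-key get() is issued: values come from metadata only.
async function readValuesFromMetadata(kv: KVLike, prefix: string): Promise<Map<string, unknown>> {
  const values = new Map<string, unknown>();
  let cursor: string | undefined;
  do {
    const page = await kv.list({ prefix, cursor });
    for (const [name, value] of metadataValues(page.keys)) values.set(name, value);
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor !== undefined);
  return values;
}
```

This only helps when the values fit within the metadata limit and the keys share a prefix, which is the same restriction the paragraph already states.
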
 content/workers/learning/advanced-kv-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index 58aa793ef932dc..d69b70b878faa8 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -132,7 +132,7 @@ This works best if you don't think you'll need to update the values independentl
 careful about how you synchronize.
 
 **Pros**: Infrequently accessed keys are kept in the cache.
-**Cons**: Size of the resultant value can easily push your worker out of it's memory limits. Safely updating the value requires a [locking mechanism](#concurrent-writers) of some kind.
+**Cons**: Size of the resultant value can push your worker out of its memory limits. Safely updating the value requires a [locking mechanism](#concurrent-writers) of some kind.
 
 #### Storing in metadata and shared prefix
 
@@ -154,7 +154,7 @@ every 30s to make sure it's always present within Cloudflare's caches.
 
 If you have small values that fit within the [metadata limit](/workers/platform/limits/#kv-limits), you can store the value within the metadata instead.
 This makes the value accessible during the list, avoiding the need to do a second I/O round-trip while iterating in case a lookup ends up missing the local cache.
-This isn't necessarily suitable for all problem domains obviously as it requires that values fit within the limit and that the set of keys you are trying to read
+This isn't necessarily suitable for all problem domains as it requires that values fit within the limit and that the set of keys you are trying to read
 are guaranteed to be lexicographically next to each other.
 
 {{<Aside type="note" header="List performance note">}}

From 4bb761c2e610834167a6d9d0000a608e40a1bf7f Mon Sep 17 00:00:00 2001
From: Vitali Lovich <vlovich@cloudflare.com>
Date: Fri, 28 Jul 2023 10:52:49 -0700
Subject: [PATCH 12/12] replace PoP with data center

---
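Note (not part of the commit message): the benchmarking section touched below argues that external timing is unreliable, and the guide suggests recording `cacheStatus` to Workers Analytics Engine instead. A rough sketch of what that recording might look like follows. `DatasetLike` is a hypothetical stand-in for the Analytics Engine dataset binding, the function names and the `"UNKNOWN"` fallback are illustrative, and the assumption that `cacheStatus` may be absent is not confirmed by the guide.

```typescript
// The four values the guide defines for cacheStatus.
type CacheStatus = "MISS" | "HIT" | "REVALIDATE" | "STALE";

// Hypothetical subset of the Analytics Engine dataset binding used here.
interface DataPoint { blobs?: string[]; doubles?: number[]; indexes?: string[] }
interface DatasetLike { writeDataPoint(point: DataPoint): void }

// Write one data point per KV read: the namespace label is the index,
// the cache status is a blob, and the double is a count of 1 to sum over.
// Assumption: cacheStatus may be missing, in which case "UNKNOWN" is stored.
function recordCacheStatus(
  dataset: DatasetLike,
  namespaceLabel: string,
  status: CacheStatus | null | undefined,
): void {
  dataset.writeDataPoint({
    indexes: [namespaceLabel],
    blobs: [status ?? "UNKNOWN"],
    doubles: [1],
  });
}

// Fraction of reads served by the current data center. Per the guide,
// REVALIDATE and STALE are both kinds of HIT, so all three count as local.
function locallyServedRatio(statuses: readonly string[]): number {
  if (statuses.length === 0) return 0;
  const local = statuses.filter(
    (s) => s === "HIT" || s === "REVALIDATE" || s === "STALE",
  ).length;
  return local / statuses.length;
}
```

In a Worker, `recordCacheStatus` would be called with the `cacheStatus` field from a `getWithMetadata` or `list` response. The ratio helper is only for offline sanity checks on exported data.
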
 content/workers/learning/advanced-kv-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/workers/learning/advanced-kv-guide.md b/content/workers/learning/advanced-kv-guide.md
index d69b70b878faa8..612e4505b14b9a 100644
--- a/content/workers/learning/advanced-kv-guide.md
+++ b/content/workers/learning/advanced-kv-guide.md
@@ -196,7 +196,7 @@ Examples of issues that can trip up even internal engineers who know all the tec
   within the Runtime to get accurate timing measurements. That means you have to know to time externally to the system. At the same time,
   external timings are subject to sources of error that have nothing to do with Workers KV performance, particularly as described below.
 - A low traffic Worker is more subject to cold starts even though in practice cold starts don't exist once production traffic is flowing.
-- Within something we call "MCP"s, we have multiple virtual data centers within a single PoP. Which virtual data center you hit
+- Within something we call "MCP"s, we have multiple virtual data centers within a single data center. Which virtual data center you hit
   is random and today such data centers have disjoint caches and require even more traffic to keep the cache warm regardless of which
   virtual data center you randomly get routed to.
 - [wrk](https://github.com/wg/wrk) can typically generate substantial enough load from a single machine (thousands of requests