
⚠ Breaking change proposal: removing CacheKeyPrefix, opinions needed #33

Closed · jodydonetti opened this issue Nov 21, 2021 · 7 comments
Labels: help wanted (Extra attention is needed), question (Further information is requested)

jodydonetti (Collaborator) commented Nov 21, 2021

Hi there FusionCache users, I need some help from you all.


Recently, while designing some new features and working on the backplane (#11), I realized that, in terms of design, the FusionCacheOptions.CacheKeyPrefix feature is probably not that good to have inside FusionCache itself.

Let me explain.

Brief History

The idea to introduce the CacheKeyPrefix option came up because sometimes we want to use the same distributed cache instance (eg. Redis) for multiple different sets of data at the same time.

For example in one of my projects I'm setting a prefix based on the environment type, like "d:" for development, "s:" for staging and "p:" for production, so I can use the same Redis instance and save some bucks.

With this little "trick" I can have the same "logical" piece of data in the cache, like the product with id 123 (which would normally have a cache key of "product/123"), for 2 different environments, without one inadvertently overwriting the other (because with the CacheKeyPrefix we end up with 2 different cache keys: "d:product/123" and "p:product/123").

Because of this I initially thought that having such a small feature would be a nice addition, so I went on and implemented it.

From "the outside" you would be able to specify your cache keys, and work with every method specifying your "logical" cache keys (eg: "product/123") knowing that on "the inside" the little magic will happen (eg: turn it to "p:product/123").

Why remove it?

The thing is, cache key generation is NOT FusionCache's concern, but your own: it is outside the scope and responsibility of FusionCache, and it should be only yours.

The moment FusionCache modifies a cache key it receives, it starts messing around with a critical piece of data (the cache key itself), and this may have unintended consequences.

For example, right now every single method in FusionCache receives a cache key and, before doing anything else, modifies it by applying the specified CacheKeyPrefix (if any), then moves on, forgetting about the original cache key altogether and working only on the modified one.

This means that everything that happens inside of FusionCache (and, in turn, in the memory and distributed cache) will be tied to the modified cache keys, including logging, events and whatnot.

A practical example

Right now I'm working on the Backplane (#11): suppose we have 3 nodes (N1, N2 and N3), suppose we use the "logical" cache key of "product/123" and a CacheKeyPrefix of "foo:".

Then this would happen:

  1. I call Set("product/123", product) on N1
  2. internally the cache key gets turned into "foo:product/123"
  3. the data is saved in the memory cache on N1 with the key "foo:product/123"
  4. the data is saved in the distributed cache with the key "foo:product/123"
  5. an event is raised for a set operation on the key "foo:product/123"
  6. the backplane sends an eviction notification for the key "foo:product/123"
  7. nodes N2 and N3 receive the notification for the key "foo:product/123", and try to evict the entry for that key from their local caches
  8. to do so, a public FusionCache method is called (like Set, Remove, etc) which, in turn, re-applies the prefix, producing a cache key of "foo:foo:product/123" (see the double "foo:" there? a concrete sketch follows below)
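
To make step 8 concrete, here's a minimal sketch of the pattern (this is NOT FusionCache's actual internals, just an illustration of the problem described above):

// Minimal sketch of the double-prefix problem: not FusionCache's actual code.
static string ApplyPrefix(string key, string? prefix)
  => prefix is null ? key : prefix + key;

var prefix = "foo:";

// steps 1-2: a normal Set call applies the prefix once
var key1 = ApplyPrefix("product/123", prefix);  // "foo:product/123"

// steps 7-8: the notification carries the already-prefixed key, and the
// receiving node's public method applies the prefix again
var key2 = ApplyPrefix(key1, prefix);           // "foo:foo:product/123"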

This is a nice case of Leaky Abstraction we've got here, amirite?

Now, of course I could "simply" un-apply the prefix from the cache key before sending the notification (point 6) with some substring magic or something, or I could "keep" both the original cache key and the modified one, but both options would mean extra work at runtime (cpu/memory), would smell really bad and, as said, would not be a good thing design-wise.

So, what are the possible alternatives?

Alternatives

The fact is that every distributed cache out there (and every impl of IDistributedCache) is already natively able to handle different "sets" of data in the same service instance, in a more idiomatic and native way:

  • in the Redis world there's DefaultDatabase
  • in the SqlServer world there are SchemaName and TableName
  • in the CosmosDB world there are DatabaseName and ContainerName
  • in the MongoDB world (various impls) there are DatabaseName and CollectionName
  • etc...

Each of these is surely the best way to handle such a scenario in its specific service.
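
For example, with the standard Microsoft Redis implementation of IDistributedCache the split can be done at registration time via DefaultDatabase. A sketch (assuming the Microsoft.Extensions.Caching.StackExchangeRedis package, a local Redis instance, and the same Env helper used in the snippets later in this issue):

services.AddStackExchangeRedisCache(options =>
{
  // eg: logical database 2 for production, 0 otherwise, instead of a key prefix
  options.ConfigurationOptions = new StackExchange.Redis.ConfigurationOptions
  {
    EndPoints = { "localhost:6379" },
    DefaultDatabase = Env.IsProduction() ? 2 : 0
  };
});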

How it will be removed

I'd like to remove support for the feature altogether, since:

  • as far as I know basically nobody is using it (but please let me know about this!)
  • FusionCache is still in v0.X and, per semantic versioning, it's ok to make breaking changes freely; even so, I'd like to point out that as of today I've only made 1 of them, and in the very early days after the very first release
  • there are alternatives that keep it working in the same way (see below)
  • keeping it would mean extra work at runtime (cpu/memory)
  • it smells really bad and, as said, it's not a good thing design-wise

Now, to be more precise, I'd like to do it NOT by removing the CacheKeyPrefix prop itself, otherwise dependent code would no longer compile and people wouldn't know why, but by decorating the prop with [Obsolete] and the error param set to true: this would still make dependent code fail to compile, BUT with the ability to specify a descriptive message to explain things and suggest how to handle the removal, maybe even with a link to an online page with a thorough explanation and some examples.
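
In practice it would look something like this (a sketch of the idea, not necessarily the exact final message or link):

public class FusionCacheOptions
{
  // phase 1: warning only, so dependent code keeps compiling
  //[Obsolete("Please stop using this, see https://... for how to migrate.")]

  // phase 2: error set to true, so dependent code fails to compile,
  // but with a descriptive message instead of a puzzling "member not found"
  [Obsolete("CacheKeyPrefix has been removed: prepend the prefix yourself, see https://... for details.", true)]
  public string? CacheKeyPrefix { get; set; }
}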

How to keep our current cache keys

If you are actually using this option (let me know!) and you really like your cache keys with the prefix they currently have, here's an easy way to keep them as they are.

Before

I can imagine that, probably in a Startup.cs file you will have something like this:

services.AddFusionCache(options =>
{
  [...]
  if (Env.IsProduction())
    options.CacheKeyPrefix = "prod:";
});

Later on, you use FusionCache like this:

cache.GetOrSet<Product>($"product/{id}", ...);

After

What you would have to do is declare some public static prop like this:

public static class MyCache {
  public static string? Prefix { get; set; }
}

change the startup code to something like this:

if (Env.IsProduction())
  MyCache.Prefix = "prod:";
services.AddFusionCache(options =>
{
  [...]
});

and finally, when you later use FusionCache, change the call to something like this:

cache.GetOrSet<Product>($"{MyCache.Prefix}product/{id}", ...);

If instead you just want to keep data apart without using the prefix, simply set a Database, DatabaseName, CollectionName or ContainerName based on the technology you are using (Redis, MongoDB, SqlServer, etc...) as mentioned in the Alternatives section above.

Ok but is anybody actually using the CacheKeyPrefix option?

Right now I'm the only one I know of who is using this obscure little feature.

I've talked with about a dozen people who are using FusionCache, and none of them are using it.

But since the downloads for the main package alone just crossed 17K (thanks all 🎉) I'm here asking you all how you feel about it.

What can you do?

If you agree with this change, simply vote with a 👍.

If you disagree, please comment with the reasoning behind it so I can understand more about your use case.

Thanks to anyone who will participate!

@jodydonetti jodydonetti self-assigned this Nov 21, 2021
@jodydonetti jodydonetti added the help wanted Extra attention is needed label Nov 21, 2021
@jodydonetti jodydonetti pinned this issue Nov 21, 2021
@ZiggyCreatures ZiggyCreatures deleted a comment from alb-xss Nov 22, 2021
jodydonetti (Collaborator, Author) commented:

🔔 Update (2021-11-23)

I've released v0.1.9, where the prop is now marked with the [Obsolete] attribute but without the error flag set to true, so as to start notifying FusionCache users about the change without breaking compilation.

Next step: mark it as error.

@jodydonetti jodydonetti added the question Further information is requested label Nov 25, 2021
jodydonetti (Collaborator, Author) commented:

Closing, since nobody complained about this.

@jodydonetti jodydonetti unpinned this issue Dec 1, 2021
jasenf commented Dec 1, 2021

Hi there - we aren't using your library yet; we've been waiting for a little more maturity and for time to potentially swap out our current library. I missed this thread.

One of the major things we use is cache partitioning, typically done using cache prefixes. Any multi-tenant application that uses caching services often needs to delete entire groups of cache entries, but only those relevant to a specific tenant.

If those cache entries are shared (eg. a "users" table) then we end up having to invalidate all cache entries across all tenants, but if they are partitioned we can isolate invalidation to a specific tenant. Rather than prefixes as in the implementation I am seeing above, it would be much easier if we just had some approach that either let us set the current active partition (_myCache.SetActivePartition(x);) or, better yet, enhancements to the base CRUD actions on the cache that let us pass a partition parameter.

As I mentioned, this is very important for large scale multi-tenant SaaS products, so it may not be relevant to most of your users. But the ability to set, get and delete entries from a partition would be wonderful.

jodydonetti (Collaborator, Author) commented:

Hi @jasenf, if I got what you are saying right, you are talking about what is commonly known as "cache regions", a feature that some caching solutions do in fact have.

If that is the case, I agree that logically it would be a nice addition, and if I end up adding such a feature, I also agree that the best way would be to be explicit and add a high level "cache region" concept, most probably an additional string cacheRegion param in each CRUD method or something like that, like you suggested.

But there's a catch (at least for me, at least for now).

I think it's important to always consider what the underlying infrastructure (the various caching providers, like Redis, Memcached, etc) can in fact handle, otherwise we may enter "leaky abstractions" territory, where an API surface area makes promises that in reality it cannot fulfill, at least not always.

In practice this means that either the underlying cache provider directly supports cache regions, or we would need to simulate those in other ways: the typical way to have cache regions is, like you said, via prefixes.

But let's take the most common one, Redis, as an example: in Redis there's no way that I know of to "delete all entries via a prefix", so what you have to do instead is:

  1. make a query for all the keys matching a certain prefix
  2. loop over each key and, for each key, execute a delete
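
With StackExchange.Redis, for example, that pattern looks roughly like this (a sketch; server.Keys issues SCAN commands under the hood):

// A sketch of the typical "delete by prefix" pattern with StackExchange.Redis.
// Each KeyDelete is a separate round trip: that's the "SELECT N+1" problem below.
using StackExchange.Redis;

var connection = ConnectionMultiplexer.Connect("localhost:6379");
var server = connection.GetServer("localhost", 6379);
var db = connection.GetDatabase();

// 1. query for all the keys matching a certain prefix (SCAN under the hood)
// 2. loop over the keys and delete them one by one
foreach (var key in server.Keys(pattern: "myprefix*"))
{
  db.KeyDelete(key);
}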

This is the typical implementation you will encounter in basically every cache abstraction out there, and the problems with this approach are:

  1. the first part (querying for the keys) does not scale
  2. the second part is a textbook case of the "SELECT N+1" problem which, again, does not scale

Both of these typically do not create problems locally, or in production with few values, but "for large scale multi-tenant SaaS products" like you mentioned this would start to break apart, maybe leaving "zombies" around (because the operation would time out during the loop+delete part) or throwing exceptions when querying for the keys, because there are too many of them.

Of course there can be cache implementations that do not have this problem: I can imagine the ones based on an RDBMS (MySql, SqlServer, etc) can simply execute something like:

DELETE FROM Table WHERE Key LIKE 'myprefix%'

but this means that very few cache implementations would actually support cache regions, and it would then fragment the ecosystem.

All of this is, at least, my own personal experience with cache regions: I would like to know yours to see if I'm missing something.

Thanks!

@jodydonetti jodydonetti reopened this Dec 4, 2021
jodydonetti (Collaborator, Author) commented:

Reopened to better discuss with @jasenf.

jasenf commented Dec 4, 2021

We don't use Redis for storage (as you are finding out via our backplane conversation :-) so having in-memory regions using either key-prefixing (which is what we ended up doing) or built-in support from the cache provider (methods that accept the requested region/partition on each call, with a cache provider that simply stores each region in a different dictionary) would work fine.
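
If it helps, here's a rough sketch of the "one dictionary per region" idea (hypothetical types and names, not an actual FusionCache API):

using System.Collections.Concurrent;

// Hypothetical sketch: each region gets its own dictionary, so
// invalidating a whole region is just dropping its dictionary.
public class RegionedMemoryCache
{
  private readonly ConcurrentDictionary<string, ConcurrentDictionary<string, object>> _regions = new();

  public void Set(string region, string key, object value)
    => _regions.GetOrAdd(region, _ => new ConcurrentDictionary<string, object>())[key] = value;

  public bool TryGet(string region, string key, out object? value)
  {
    value = null;
    return _regions.TryGetValue(region, out var r) && r.TryGetValue(key, out value);
  }

  // invalidating a whole region is O(1), no key scanning needed
  public void ClearRegion(string region)
    => _regions.TryRemove(region, out _);
}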

It's not unusual for some features not to be supported by certain underlying providers. If there is no efficient way to implement partitioning in Redis, so be it. But an in-memory one was all we needed, and it works fairly well. We run our web servers across over 10 different instances and handle thousands of requests per minute. I know our solution of locally partitioned memory caches + messaging-only backplane isn't for everyone, but it seems like the most graceful one we have been able to come up with in scenarios with both high read and high update volume.
thanks!

jodydonetti (Collaborator, Author) commented:

I see, thanks for the explanation.

So we would have to be explicit about supporting (or not) a certain feature (like regions in this case) only in some situations or with some cache implementations, either via a bool flag or via a marker interface or something like that.
Of course it would follow that if a method is called with a region specified and the current cache implementation does not support it, an exception like NotSupportedException would be thrown, so that the user knows about it upfront.
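
For example (purely hypothetical names like ISupportsCacheRegions and _innerCache, just to illustrate the shape of it):

// Purely hypothetical sketch of the "explicit support" idea: a marker
// interface + a NotSupportedException for implementations without regions.
public interface ISupportsCacheRegions
{
  void RemoveRegion(string region);
}

public void RemoveRegion(string region)
{
  if (_innerCache is not ISupportsCacheRegions regioned)
    throw new NotSupportedException("The current cache implementation does not support cache regions.");

  regioned.RemoveRegion(region);
}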

Ok, I have to say it starts to make sense.

I have yet to understand the memory + backplane without dist. cache part in the other issue: I've read your explanation (thanks!) but I'm still trying to wrap my head around some details, and trying to see if there's a way to unify the different use cases into a good design that does not confuse users but works gracefully in all scenarios.

Again, thanks for taking the time to bring your experience here!

I'm closing this issue, since its subject (the key prefix) is not the best approach anyway, as we discussed, but I will open a new one specifically about introducing cache regions as a high level concept, to discuss the hypothetical design and see if I can come up with something good.
