
Use MemoryCache instead of WeakReference dictionary #101

Open
Bykiev opened this issue Mar 22, 2024 · 11 comments

Comments

@Bykiev (Contributor) commented Mar 22, 2024

Right now WeakReference is used for caching purposes, but it's not a good fit: the references are short-lived and you can't control when they'll be flushed. I think it would be better to replace it with MemoryCache.

@david-brink-talogy, what do you think about it?

@david-brink-talogy (Collaborator)

Sounds like a good idea. System.Runtime.Caching can be used on netstandard2.0 so it would still work for the full framework. I don't see net45 listed, but perhaps that's because it is no longer supported. You'd probably want to use the singleton instance MemoryCache.Default.
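
For illustration, a rough sketch of what that could look like (the helper name, key scheme, and expiration value below are placeholders, not a concrete proposal):

using System;
using System.Runtime.Caching;
using NCalc;
using NCalc.Domain;

// Hypothetical sketch only: cache parsed LogicalExpressions in MemoryCache.Default
// instead of the WeakReference dictionary. The expiration value is an arbitrary example.
internal static class MemoryCacheSketch
{
    public static LogicalExpression GetOrParse(string expression)
    {
        if (MemoryCache.Default.Get(expression) is LogicalExpression cached)
            return cached;

        // Compile(expression, true) bypasses the built-in cache and just parses the text.
        var parsed = Expression.Compile(expression, true);

        // A sliding expiration keeps frequently used expressions warm and lets idle ones expire.
        MemoryCache.Default.Set(expression, parsed, new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(15)
        });

        return parsed;
    }
}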

@Bykiev (Contributor, Author) commented Mar 22, 2024

According to the docs, .NET Framework 4+ is supported.

Would you like to make a PR?

@sklose (Owner) commented Mar 22, 2024

I think the idea behind using WeakReference was that it caches optimistically. If the prepared expression is still alive outside the library, it gets reused. If it has been garbage collected, it gets recreated. This has minimal effect on memory consumption by NCalc itself (the WeakReferences don't really require much memory). I am not sure it's a good idea to add a full cache to the library. It can have a significant impact on the memory profile. I am wondering if a feature like that is better suited to live outside the library, as the application code probably knows best which caching strategy is optimal for its use case.
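
For context, the optimistic WeakReference caching described above is roughly shaped like this (a simplified sketch, not the library's actual code):

using System;
using System.Collections.Concurrent;
using NCalc;
using NCalc.Domain;

// Simplified sketch of the optimistic WeakReference approach: the parsed tree is
// reused only while something outside the library still keeps it alive.
internal static class WeakReferenceCacheSketch
{
    private static readonly ConcurrentDictionary<string, WeakReference> _compiledExpressions =
        new ConcurrentDictionary<string, WeakReference>();

    public static LogicalExpression GetOrCompile(string expression)
    {
        if (_compiledExpressions.TryGetValue(expression, out var weakRef) &&
            weakRef.Target is LogicalExpression alive)
        {
            return alive; // still referenced somewhere, reuse it
        }

        var parsed = Expression.Compile(expression, true);
        _compiledExpressions[expression] = new WeakReference(parsed);
        return parsed;
    }
}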

@Bykiev (Contributor, Author) commented Mar 22, 2024

The current implementation doesn't look good: every time an expression is compiled, it cleans the cache. Also, the lifetime of such a cache is short, since it is flushed on every GC.
I don't see any easy way to implement the caching outside the library. Do you have a code sample?

Update: here is the sample from https://github.com/ncalc/ncalc:

This example uses Newtonsoft.Json.

Serializing

var compiled = Expression.Compile(expression, true);
var serialized = JsonConvert.SerializeObject(compiled, new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.All // We need this to allow serializing abstract classes
});

Deserializing

var deserialized = JsonConvert.DeserializeObject<LogicalExpression>(serialized, new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.All
});

Expression.CacheEnabled = false; // We cannot use NCalc's built-in cache at the same time.
var exp = new Expression(deserialized);
exp.Parameters = new Dictionary<string, object> {
    {"waterlevel", inputValue}
};

var evaluated = exp.Evaluate();

@david-brink-talogy (Collaborator)

If the prepared expression is still alive outside the library then it's going to reuse it.

Fair point, but if you had a reference to the LogicalExpression, wouldn't you just use it rather than asking NCalc again? It's worth highlighting that whatever settings are chosen for CacheItemPolicy would be opinionated, and even with new arguments it still might not satisfy all use cases.

Over here, I questioned the potential overhead of pruning the cache on every single compilation. If you have a use case where you're maintaining strong references to many expressions, or there is so much memory that GC doesn't happen very often, you could spend a lot of time iterating over the cache for every new compilation.

What do you think about tracking the number of compilations and cleaning the cache at some predefined interval? It could also be done via a scheduled Timer.

// prune only every 1000 compilations instead of on each one
compilations++;

if (compilations % 1000 == 0)
{
  CleanCache();
}
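
A scheduled Timer variant might look roughly like this (a sketch only; the interval is arbitrary and CleanCache() stands in for the existing pruning logic):

using System;
using System.Threading;

// Hypothetical sketch: prune dead cache entries on a fixed schedule instead of on
// every compilation.
internal static class ScheduledCacheCleanup
{
    private static readonly Timer _cleanupTimer =
        new Timer(_ => CleanCache(), null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));

    private static void CleanCache()
    {
        // iterate the cache and remove entries whose WeakReference target has been collected
    }
}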

@Bykiev (Contributor, Author) commented Mar 22, 2024

What do you think about tracking the number of compilations and cleaning the cache at some predefined interval? It could also be done via a scheduled Timer.

We could also introduce a sort of background service to track and clean the cache, but MemoryCache is still more customizable.

@sklose (Owner) commented Mar 22, 2024

Fair point, but if you had a reference to the LogicalExpression, wouldn't you just use it rather than asking NCalc again?

Yes, that would be the ideal way of caching and pretty much what I meant when I said full caching should probably live outside the library. The caching we have right now is just an optimization for heavy loads where an application keeps asking for the same expressions over and over again.

I've used this in the past in a web application where we'd compute some user/session-specific values with expressions and there were many independent requests in parallel. You get a nice speedup by not re-parsing the expressions all the time, and at the same time you can still rely on the GC to eventually recycle them, without having to worry about an extra caching layer.

I probably wouldn't want to take NCalc further than this heavy-load optimization in terms of caching and defer the rest to the user.

What do you think about tracking the number of compilations and cleaning the cache at some predefined interval? It could also be done via a scheduled Timer.

I do agree that pruning the cache on every new expression parsing step is probably excessive. Could we use IDisposable for the cleanup? We could eagerly remove an expression from the cache when it's GC'ed.

@Bykiev (Contributor, Author) commented Mar 22, 2024

I do agree that pruning the cache on every new expression parsing step is probably excessive. Could we use IDisposable for the cleanup? We could eagerly remove an expression from the cache when it's GC'ed.

Introducing IDisposable seems to be a breaking change. Are we ready for that?

@david-brink-talogy (Collaborator) commented Mar 22, 2024

I'd agree that IDisposable seems like a breaking change in terms of the expected contract, as it implies a using block.
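
That is, consumers would then reasonably expect to write something like this (a purely hypothetical snippet; NCalc's Expression does not implement IDisposable today):

// If Expression implemented IDisposable, consumers would expect to wrap it in using:
using (var expression = new Expression("waterlevel * 2"))
{
    expression.Parameters = new Dictionary<string, object> { { "waterlevel", 4 } };
    var result = expression.Evaluate();
}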

Would something like this work?

internal class CacheEntry
{
  private readonly string _expression;
  private readonly IDictionary<string, WeakReference<CacheEntry>> _cache;

  public LogicalExpression LogicalExpression { get; private set; }

  public CacheEntry(string expression, LogicalExpression logicalExpression, IDictionary<string, WeakReference<CacheEntry>> cache)
  {
    _expression = expression;
    LogicalExpression = logicalExpression;
    _cache = cache;
  }

  ~CacheEntry()  // finalizer: drop this entry from the cache once it is collected
  {
    _cache.Remove(_expression);
  }
}

The cache would then be defined as follows. I assume that once all strong references to the LogicalExpression are lost, the CacheEntry can be collected, calling the finalizer and thus cleaning up the cache. I'm pretty sure the GC collects finalizable objects last, which may limit how aggressively the LogicalExpression is collected.

private static readonly ConcurrentDictionary<string, WeakReference<CacheEntry>> _compiledExpressions =
    new ConcurrentDictionary<string, WeakReference<CacheEntry>>();
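
For completeness, a lookup against that dictionary might go roughly like this (a sketch only, reusing the CacheEntry type above):

// Hypothetical lookup helper for the WeakReference<CacheEntry> dictionary above.
public static LogicalExpression GetOrCompile(string expression)
{
    if (_compiledExpressions.TryGetValue(expression, out var weakRef) &&
        weakRef.TryGetTarget(out var entry))
    {
        return entry.LogicalExpression; // entry still alive, reuse the parsed tree
    }

    var parsed = Expression.Compile(expression, true);
    _compiledExpressions[expression] = new WeakReference<CacheEntry>(
        new CacheEntry(expression, parsed, _compiledExpressions));
    return parsed;
}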

@sklose (Owner) commented Mar 22, 2024

I don't think this is going to work: the finalizer would trigger when the CacheEntry is garbage collected, which is not tied at all to the lifetime of the underlying LogicalExpression (the CacheEntry will likely get collected much sooner).

We could just do this

class Expression
{
  ~Expression() {
     // remove weak reference from cache here
  }
}

We don't need the full IDisposable pattern with the interface and using block. Just the finalizer portion is enough.

@david-brink-talogy (Collaborator)

I went down the rabbit hole and found this relevant thread.

One other thing I found mentioned is that the finalizer approach could be problematic if the cache dictionary is collected at the same time as the Expression. Unfortunately, this is almost identical to what I recommended.
