
Fix Issue 11644 - EvictingStrategy.LRU for std.functional.memoize #5826

Open
wants to merge 2 commits into master

Conversation

jercaianu
Contributor

I implemented the O(1) algorithm for "get" and "set" on the cache, similar to what is described here [1].

Thanks,
Alex
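
(For context, a minimal sketch of the classic O(1) LRU scheme the PR is describing: an associative array for lookup plus a doubly linked list ordered by recency. The names below are illustrative, not the PR's actual code.)

```d
// Sketch of an O(1) LRU cache: the associative array gives O(1) lookup,
// and the doubly linked list ordered by recency gives O(1) promotion and
// eviction. Illustrative only; not the PR's code.
struct LRUCache(K, V)
{
    private static struct Node
    {
        K key;
        V value;
        Node* prev, next;
    }

    private Node*[K] map;   // key -> list node
    private Node* head;     // most recently used
    private Node* tail;     // least recently used
    private size_t capacity;

    this(size_t capacity) { this.capacity = capacity; }

    // O(1) "get": find the node and move it to the front.
    bool get(K key, out V value)
    {
        auto p = key in map;
        if (p is null)
            return false;
        moveToFront(*p);
        value = (*p).value;
        return true;
    }

    // O(1) "set": insert or update; evict the tail when full.
    void put(K key, V value)
    {
        if (auto p = key in map)
        {
            (*p).value = value;
            moveToFront(*p);
            return;
        }
        if (map.length >= capacity && tail !is null)
        {
            map.remove(tail.key);   // drop the least recently used entry
            unlink(tail);
        }
        auto n = new Node(key, value);
        pushFront(n);
        map[key] = n;
    }

    private void moveToFront(Node* n)
    {
        if (n is head) return;
        unlink(n);
        pushFront(n);
    }

    private void unlink(Node* n)
    {
        if (n.prev) n.prev.next = n.next; else head = n.next;
        if (n.next) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    private void pushFront(Node* n)
    {
        n.prev = null;
        n.next = head;
        if (head) head.prev = n;
        head = n;
        if (tail is null) tail = n;
    }
}
```

Usage would be along the lines of `auto cache = LRUCache!(string, int)(256); cache.put("answer", 42);`.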

@dlang-bot
Contributor

Thanks for your pull request, @jercaianu! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

11644: EvictingStrategy.LRU for std.functional.memoize

@MartinNowak
Member

MartinNowak commented Oct 31, 2017

Thanks for your PR, looks interesting.

The usage of a doubly linked list and classes bothers me a little from a performance perspective.
Could you use indices and a lazily pre-allocated array instead?

Also, most of the issue has already been addressed by the random replacement in the cuckoo hashing. Do you have an example that has severe performance issues with that approach?
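
(To make the suggestion concrete, a hypothetical sketch of that layout: all nodes live in one lazily allocated contiguous array and link to each other by index rather than by pointer, so there is no per-entry class allocation. Names and structure are assumptions, not a proposed Phobos API.)

```d
// Hypothetical index-based node storage: one contiguous array, allocated
// lazily on first use; prev/next are uint indices instead of pointers.
struct IndexedNodes(K, V, uint maxSize)
{
    enum uint nil = uint.max;   // sentinel "null" index

    static struct Node
    {
        K key;
        V value;
        uint prev = nil;        // index into `nodes`, not a pointer
        uint next = nil;
    }

    Node[] nodes;               // allocated once, on first insertion
    uint[K] lookup;             // key -> slot index
    uint head = nil;
    uint tail = nil;
    uint used;

    // Hand out the next free slot, allocating the whole array lazily.
    uint acquireSlot()
    {
        if (nodes.length == 0)
            nodes = new Node[](maxSize);
        assert(used < maxSize, "when full, the caller recycles the tail slot");
        return used++;
    }
}
```

Linking and unlinking would then manipulate `uint` indices exactly as the pointer version manipulates `Node*`, but the whole cache stays in a single allocation.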

@andralex
Member

Where is the link to [1]?

@andralex left a comment (Member)

This is a great instance of "academia vs. real world" clash :)

```diff
@@ -1068,6 +1068,146 @@ template memoize(alias fun, uint maxSize)
 }
 }
+
+/// ditto
+template memoize(alias fun, uint maxSize, bool lru = true)
```

I searched this page and found no use of lru.

```d
import std.traits : ReturnType;
import std.typecons : Tuple;

class Node
```

A class with dynamic allocation, linked into a doubly linked list, is textbook-correct but certainly out of character for Phobos and for high-performance applications. Per @MartinNowak, we'd be looking at a properly managed contiguous array.


```d
this(Tuple!(Parameters!fun) args = Tuple!(Parameters!fun).init,
     ReturnType!fun res = ReturnType!fun.init)
{
```

Hmmm, the defaults here are surprising - most functions will NOT return the default result for the default parameter values. Where are the defaults used?

```d
static Node head;
static Node tail;
static Node[Tuple!Args] cache;
static int capacity = maxSize;
```

size_t probably

@andralex
Member

By the way, LRU sux; we should use LIRS: https://en.wikipedia.org/wiki/LIRS_caching_algorithm

@MartinNowak
Member

MartinNowak commented Nov 1, 2017

Reviewing and properly benchmarking the current approach (#2591) would be very welcome.
If we find this to be inadequate for certain usages, we could look at different replacement policies.

I'm not too certain we really need a fancy eviction policy here: since we have a uniformly distributed hash (MD5), random replacement should be fairly OKish, and just using a slightly bigger cache will fix performance problems.
Of course a better eviction policy can further optimize for cache size vs. hit ratio.

If we go that route we should replace the current cuckoo hash implementation.

Hopefully we can find a general purpose replacement policy. Making it selectable via a template parameter does increase the complexity of memoize even further. An adaptive cache size would be nice as well (but hard), as it's often difficult to come up with a good number ahead of time.
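
(For concreteness, a rough sketch of the random-replacement idea under a uniform hash: a fixed-size table where a new entry simply overwrites whatever occupies its slot. This is a simplified single-probe variant for illustration; the actual Phobos implementation referenced above uses a cuckoo-style layout, and the names here are not Phobos' code.)

```d
import std.traits : Parameters, ReturnType;
import std.typecons : Tuple, tuple;

// Simplified memoize with implicit random-ish replacement: with a uniformly
// distributed hash, overwriting the slot a key hashes to evicts what is,
// statistically, a near-random entry. Sketch only; not Phobos' actual code.
ReturnType!fun memoizeOverwrite(alias fun, uint maxSize)(Parameters!fun args)
{
    alias Key = Tuple!(Parameters!fun);
    static struct Slot { Key key; ReturnType!fun value; bool filled; }
    static Slot[maxSize] table;

    auto key = tuple(args);
    immutable idx = hashOf(key) % maxSize;
    if (table[idx].filled && table[idx].key == key)
        return table[idx].value;            // hit

    auto result = fun(args);
    table[idx] = Slot(key, result, true);   // miss: overwrite the occupant
    return result;
}
```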

@ben-manes

LIRS is very complicated to implement correctly. Most implementations miss key details that degrade the hit rate, leak memory because the paper neglects to bound the ghost entries (which the authors' C code does bound), and may exhibit jitter due to slow stack pruning if bounded inefficiently. However, it does have an excellent hit rate across many workloads and can be speedy.

After doing a deep dive into approaches for my caching library, I chose TinyLFU with some minor adaptations. It is very easy to implement, time/space efficient, and beats LIRS in many real-world workloads. As an admission policy, it can be paired with any eviction policy (such as an array-based approach rather than linked lists, e.g. sampled LRU).
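
(For readers unfamiliar with TinyLFU, a minimal sketch of the admission idea, loosely based on the TinyLFU paper rather than on Caffeine's code: a count-min sketch keeps approximate access frequencies, and on eviction the incoming candidate only displaces the victim if it has been seen more often. Sizes and hash mixing below are illustrative assumptions.)

```d
import std.algorithm.comparison : min;

// Count-min sketch for approximate access frequencies (illustrative sizes).
struct CountMinSketch
{
    enum rows = 4;
    enum width = 1024;              // power of two for cheap masking
    ubyte[width][rows] counts;

    // Derive a per-row slot from one hash; the mixing here is illustrative.
    private size_t slot(size_t hash, size_t row) const
    {
        return (hash ^ (hash >> (8 * row + 1))) & (width - 1);
    }

    // Record one access: bump the counter in every row (saturating).
    void record(size_t hash)
    {
        foreach (row; 0 .. rows)
        {
            immutable i = slot(hash, row);
            if (counts[row][i] < ubyte.max)
                ++counts[row][i];
        }
    }

    // Estimate frequency as the minimum across rows (classic count-min).
    uint estimate(size_t hash) const
    {
        uint best = uint.max;
        foreach (row; 0 .. rows)
            best = min(best, counts[row][slot(hash, row)]);
        return best;
    }
}

// TinyLFU admission: keep the victim unless the candidate is more popular.
bool admit(ref const CountMinSketch sketch, size_t candidateHash, size_t victimHash)
{
    return sketch.estimate(candidateHash) > sketch.estimate(victimHash);
}
```

A full implementation also periodically halves all counters so the frequency estimates age over time, which this sketch omits.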

@jercaianu
Contributor Author

@ben-manes @MartinNowak
Thanks for the comments!
I have left this issue on the backburner, since I've been working more on allocators.
I will definitely take a look at TinyLFU.

Alex
