feat: added LFU caching eviction with constant time operation #42

Open
wants to merge 1 commit into master
Conversation

@mrchocha (Contributor) commented Oct 19, 2022

Context

  • Added LFU cache eviction using a doubly-linked list and a hash table, with constant-time operations (see the sketch below)

Issue Link: #10
Reference Link: https://arpitbhayani.me/blogs/lfu
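
A minimal Go sketch of the approach described above, for reference: a hash table for key lookups plus one doubly-linked list per frequency, with the minimum frequency tracked so GET, SET, and eviction are all O(1). The type and method names are illustrative only, not the actual code in this PR.

```go
package lfu

import "container/list"

// entry is one cached key/value pair plus its current access frequency.
type entry struct {
	key, value string
	freq       int
}

// Cache keeps a hash table for lookups and one doubly-linked list per
// frequency; minFreq tracks the smallest populated frequency so the
// eviction victim is found in constant time.
type Cache struct {
	capacity int
	minFreq  int
	items    map[string]*list.Element // key -> element in freqList[freq]
	freqList map[int]*list.List       // freq -> entries with that frequency
}

func New(capacity int) *Cache {
	return &Cache{
		capacity: capacity,
		items:    make(map[string]*list.Element),
		freqList: make(map[int]*list.List),
	}
}

// Get looks a key up and bumps its frequency, both in O(1).
func (c *Cache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.touch(el)
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key, evicting the least frequently used
// entry (least recently used within a tie) when the cache is full.
func (c *Cache) Set(key, value string) {
	if c.capacity == 0 {
		return
	}
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.touch(el)
		return
	}
	if len(c.items) >= c.capacity {
		c.evict()
	}
	e := &entry{key: key, value: value, freq: 1}
	c.items[key] = c.listFor(1).PushFront(e)
	c.minFreq = 1
}

// touch moves an entry from its current frequency list to freq+1.
func (c *Cache) touch(el *list.Element) {
	e := el.Value.(*entry)
	old := c.freqList[e.freq]
	old.Remove(el)
	if old.Len() == 0 && c.minFreq == e.freq {
		c.minFreq++
	}
	e.freq++
	c.items[e.key] = c.listFor(e.freq).PushFront(e)
}

// evict drops the tail of the minimum-frequency list.
func (c *Cache) evict() {
	lst := c.freqList[c.minFreq]
	victim := lst.Back()
	lst.Remove(victim)
	delete(c.items, victim.Value.(*entry).key)
}

// listFor returns the list for a frequency, creating it if needed.
func (c *Cache) listFor(freq int) *list.List {
	if c.freqList[freq] == nil {
		c.freqList[freq] = list.New()
	}
	return c.freqList[freq]
}
```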

Screenshots

  1. SET operation
  2. GET operation and re-balancing
  3. DEL operation

@mrchocha changed the title from "feat: added LFU caching with constant time CRUD operation" to "feat: added LFU caching eviction with constant time CRUD operation" on Oct 19, 2022
@mrchocha changed the title from "feat: added LFU caching eviction with constant time CRUD operation" to "feat: added LFU caching eviction with constant time operation" on Oct 19, 2022
@rohanverma94 (Contributor)

Looks good enough, but it would be better if some degree of concurrency safety were in place.

@arpitbbhayani (Contributor)

Our database is in-memory, and this implementation is very heavy on memory, requiring a lot of auxiliary data structures. We should try to reduce that, even if it means giving up exact correctness.

@arpitbbhayani (Contributor)

Plus, this is not CPU-cache efficient. Keeping additional copies of keys is also inefficient.

@rohanverma94 (Contributor) commented Oct 19, 2022

@arpitbbhayani I agree, this implementation might be good for a cache (a small set of values) but not for an in-memory database (a large set of values), as it increases data movement (allocations) and puts pressure on the GC, and "stop-the-world" pauses on the CPU would arise. Have you found a suitable approximation algorithm for LFU?

An approximate LFU algorithm is simple enough; please take a look at this, @mrchocha:
https://tokers.github.io/posts/lru-and-lfu-in-redis-memory-eviction/

@mrchocha (Contributor, Author) commented Oct 20, 2022

> Our database is in-memory, and this implementation is very heavy on memory, requiring a lot of auxiliary data structures. We should try to reduce that, even if it means giving up exact correctness.

Hey, I know it adds some new data structures, but I'm not sure how I can reduce that. Any suggestions?
CC: @arpitbbhayani @rohanverma94

@mrchocha (Contributor, Author)

> Plus, this is not CPU-cache efficient. Keeping additional copies of keys is also inefficient.

Yeah, I thought it would be good if I could build it in a plug-and-play manner. Otherwise, I would have to change the store object, and then every object would carry a couple of extra variables that might not be used by other eviction policies.

@arpitbbhayani (Contributor)

Read about approximate counting (the Morris counter) and piggyback on the LRU field we already have in our object struct.

Research more on how it can be done while being more space-efficient. Coding the solution is the easiest part.

I would highly recommend you explore a bunch of approaches and papers to come up with the best solution.

Key Pointers

  • Eviction can be approximated
  • Solution should be extremely space efficient

Happy to discuss.
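
As a rough illustration of the Morris-counter idea above, here is a minimal Go sketch of an 8-bit approximate access counter; the function names are illustrative and nothing here is taken from the DiceDB object struct.

```go
package lfu

import (
	"math"
	"math/rand"
)

// morrisIncr bumps an 8-bit approximate counter. A stored value c
// stands for roughly 2^c real accesses, so a single byte covers an
// enormous range of actual frequencies.
func morrisIncr(c uint8) uint8 {
	if c == math.MaxUint8 {
		return c // saturate instead of wrapping around
	}
	// Classic Morris rule: increment with probability 1/2^c.
	if rand.Float64() < 1/math.Exp2(float64(c)) {
		return c + 1
	}
	return c
}

// morrisEstimate maps the stored exponent back to an approximate
// access count, about 2^c - 1.
func morrisEstimate(c uint8) float64 {
	return math.Exp2(float64(c)) - 1
}
```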

@mrchocha (Contributor, Author)

Yeah, I got some ideas from the articles about how it can be made memory-efficient; correct me if I am wrong.
We need to use two variables:

  1. decay time (uint8)
  2. counter (uint8)

Here the counter will be logarithmic (Morris-style), and since it is 8 bits, we will have at most 256 nodes in the frequency list. We need to decrease the counter once current time - last access time > decay time.

Since we have ePoolSizeMax int = 16, we can create the configuration while creating the eviction pool, such that the frequency list discards nodes from the tail whenever it reaches the limit.

Redis, for example, uses this formula for the counter increment:
1/(old_value*lfu_log_factor+1)
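
A minimal Go sketch of the counter described above, assuming illustrative configuration names (lfuLogFactor, lfuDecayMinutes) that are not taken from Redis or DiceDB:

```go
package lfu

import (
	"math/rand"
	"time"
)

// Illustrative configuration values, not actual Redis or DiceDB settings.
const (
	lfuLogFactor    = 10 // higher factor -> slower counter growth
	lfuDecayMinutes = 1  // decay period in minutes
)

// logIncr bumps the 8-bit counter with probability
// 1/(old_value*lfu_log_factor+1), the formula quoted above.
func logIncr(counter uint8) uint8 {
	if counter == 255 {
		return counter // saturate at the 8-bit maximum
	}
	p := 1.0 / (float64(counter)*lfuLogFactor + 1)
	if rand.Float64() < p {
		counter++
	}
	return counter
}

// decay lowers the counter by one for every full decay period since the
// last access, so idle keys gradually become eviction candidates again.
func decay(counter uint8, lastAccess time.Time) uint8 {
	periods := int(time.Since(lastAccess).Minutes()) / lfuDecayMinutes
	if periods <= 0 {
		return counter
	}
	if periods >= int(counter) {
		return 0
	}
	return counter - uint8(periods)
}
```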

@arpitbbhayani (Contributor)

@mrchocha there are a lot of gaps in your understanding. Would recommend spending some time understanding the approach well. Also, this is not the only way to do it. Research more on this.

Would really recommend you read more papers and seek other approaches. It is okay if it takes time. Let's find a novel approach that would make Dice DB super-efficient.

@mrchocha (Contributor, Author)

Sure.

@arpitbbhayani (Contributor)

Let's keep dumping approaches in #44 and use that space to converge on a solid approach.

@yashs360 (Contributor)

@mrchocha Thanks for the contribution. There are conflicts in this PR at the moment.
The codebase has moved forward a lot. Please rebase and resubmit if you are still keen to fix this. We will be cleaning up old PRs if they are inactive.
