MB-61029: Caching Vec To DocID Map #231
Conversation
Likith101 commented on Apr 10, 2024
- Generalized some of the cache functions
- The cache now includes the vec to docID mapping, as well as a structure that helps with the vecs-excluded calculation and the vecs-excluded structure itself
- Added back the cache mutexes, because cache reads will also write back some of the structures depending on the except bitmap
@Likith101 See my in-line comments; in summary, I don't think we should be caching anything other than vecToDocIDMap.
Force-pushed from 97f15cb to 774b93d
@Likith101 Let's also rebase your changes over the base branch; some conflicts have cropped up.
I'll rebase once the base branch is merged.
We've merged the base branch, so we'll review after the rebase (you can hit edit above to change the base branch to
@Likith101 I feel the base branch you're using here has deviated quite a bit, and since you have a commit that does something and then goes back on it with a following commit, resolving is not quite straightforward. My recommendation for you here is ..
- Generalised some of the cache function names to be inclusive of the map
- Added the map to the cache, which will behave the same as the index
- Except-bitmap logic is not part of the cache, and the vecs excluded is calculated outside of the map
Here are some of the performance improvements I noticed locally when testing these changes.
Based on these results, the cache is cleared roughly 2.5 seconds after a single query is hit. A burst of queries means more time is needed to clear the cache, but since the decay is exponential, it will not stay loaded for very long (roughly 100 queries within 1 second means the cache stays for 6-7 seconds). The tests used random queries on a sift 1M index.
Looking good to me now.
@Thejas-bhat would you make one pass here as well please.
On second thought - both @Likith101 & @Thejas-bhat should comment on any concerns.
Thanks @Likith101. Looks good.
Best to let @Thejas-bhat put in his review as well, to make sure all his comments/concerns have been addressed.
- Increase ref counts within locking (read or write) to avoid any possibility of raciness. This includes invoking cacheEntry.load().
- Also refactors getInvalidVecs to getVecIDsToExclude.
Force-pushed from f756ea7 to 42fb9dd
Includes:
* eeb2336 Likith B | MB-61029: Caching Vec To DocID Map (blevesearch/zapx#231)
* b2384fc Rahul Rampure | minor optimizations and bug fixes (blevesearch/zapx#233)
* b56abea Thejas-bhat | MB-61029: Deferring the closing of vector index (blevesearch/zapx#226)