MB-61029: Caching Vec To DocID Map #231

Merged 6 commits on Apr 18, 2024

Conversation

Likith101
Contributor

  • Generalized some of the cache functions.
  • The cache now includes the vec-to-docID mapping, a helper structure for computing the excluded vecs, and the excluded-vecs structure itself.
  • Added back the cache mutexes, because cache reads can also write back some of these structures depending on the except bitmap.

Member

@abhinavdangeti abhinavdangeti left a comment

@Likith101 See my in-line comments. In summary, I don't think we should be caching anything other than vecToDocIDMap.

@abhinavdangeti
Member

@Likith101 Let's also rebase your changes over the base branch; some conflicts have cropped up.

@Likith101
Contributor Author

@Likith101 Let's also rebase your changes over the base branch; some conflicts have cropped up.

I'll rebase once the base branch is merged.

@abhinavdangeti
Member

We've merged the base branch, so I'll review after the rebase. You can hit edit above to change the base branch to master; make sure only your diff appears in this PR.

@Likith101 Likith101 changed the base branch from delayedClosing to master April 15, 2024 11:33
@abhinavdangeti
Member

@Likith101 I feel the base branch you're using here has diverged a lot, and since you have a commit that does something and a following commit that goes back on it, resolving the conflicts is not straightforward.

My recommendation here is:

  • start off fresh with your local branch vecToDocMap at the tip of master
  • make all the changes you need again over it
  • force push your code as a single commit to this branch on origin

 - Generalised some of the cache function names to be inclusive of the map
 - Added the map to the cache, which will behave the same as the index
 - Except bitmap logic is not part of the cache, and the excluded vecs are calculated outside of the map
@Likith101
Contributor Author

Here are some of the performance improvements that I noticed on my local machine when testing these changes.

| Time Between Queries | Average Latency with No Cache | Only Vector Index Cached | Both Index and Map Cached |
|----------------------|-------------------------------|--------------------------|---------------------------|
| 10ms                 | 559.47ms                      | 69.74ms                  | 29.85ms                   |
| 25ms                 | 524.53ms                      | 66.21ms                  | 32.15ms                   |
| 50ms                 | 525.32ms                      | 65.93ms                  | 37.58ms                   |
| 75ms                 | 522.78ms                      | 66.86ms                  | 40.72ms                   |
| 100ms                | 523.66ms                      | 68.32ms                  | 45.74ms                   |
| 200ms                | 524.63ms                      | 84.30ms                  | 45.82ms                   |
| 250ms                | 537.42ms                      | 95.01ms                  | 47.12ms                   |
| 300ms                | 523.27ms                      | 97.04ms                  | 43.64ms                   |
| 500ms                | 523.09ms                      | 98.45ms                  | 48.17ms                   |
| 750ms                | 528.64ms                      | 97.59ms                  | 46.83ms                   |
| 1000ms               | 549.17ms                      | 97.93ms                  | 48.17ms                   |
| 1500ms               | 564.24ms                      | 286.55ms                 | 270.62ms                  |
| 2000ms               | 562.52ms                      | 315.56ms                 | 289.28ms                  |
| 2500ms               | 564.03ms                      | 624.43ms                 | 527.14ms                  |
| 3000ms               | 558.75ms                      | 556.64ms                 | 550.42ms                  |
| 5000ms               | 563.99ms                      | 561.57ms                 | 566.26ms                  |

Based on these results, the cache is cleared roughly 2.5 seconds after a single query. A burst of queries means more time is needed before the cache is cleared, but since the decay is exponential, it will not stay loaded for very long (roughly 100 queries within 1 second keep the cache loaded for 6-7 seconds). The tests used random queries on a SIFT 1M index.
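
The exponential decay mentioned above can be sketched as an exponentially weighted moving average of hits that is folded once per tick and evicts the entry once it falls below a threshold. Everything in this sketch is an assumption for illustration: the type `decayTracker`, the values of `alpha` and `threshold`, and the tick model are hypothetical and are not taken from the zapx cache code, so the tick counts it prints will not match the timings measured above.

```go
package main

import "fmt"

// decayTracker is a hypothetical exponentially weighted moving average
// of cache hits (not the zapx type).
type decayTracker struct {
	alpha  float64 // weight of the newest sample, assumed in (0, 1)
	avg    float64 // smoothed hit count
	sample float64 // hits recorded since the last tick
}

// hit records one cache access.
func (d *decayTracker) hit() { d.sample++ }

// tick folds the current sample into the average and resets the sample.
func (d *decayTracker) tick() {
	d.avg = d.alpha*d.sample + (1-d.alpha)*d.avg
	d.sample = 0
}

// ticksUntilEvicted records n hits in a single tick, then counts ticks
// (including that first one) until the average falls below threshold.
// It assumes threshold > 0; otherwise the loop would never end.
func ticksUntilEvicted(n int, alpha, threshold float64) int {
	d := &decayTracker{alpha: alpha}
	for i := 0; i < n; i++ {
		d.hit()
	}
	d.tick()
	ticks := 1
	for d.avg >= threshold {
		d.tick() // idle tick: sample is 0, so avg shrinks by (1 - alpha)
		ticks++
	}
	return ticks
}

func main() {
	const alpha, threshold = 0.4, 0.2 // illustrative values only
	fmt.Println("single query:", ticksUntilEvicted(1, alpha, threshold))
	fmt.Println("burst of 100:", ticksUntilEvicted(100, alpha, threshold))
}
```

Under this model a burst lengthens the time to eviction only logarithmically in the number of hits, which is consistent with the observation that 100 queries extend the lifetime by seconds rather than minutes.
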

abhinavdangeti previously approved these changes Apr 16, 2024
Member

@abhinavdangeti abhinavdangeti left a comment

Looking good to me now.
@Thejas-bhat would you make one pass here as well, please?

abhinavdangeti previously approved these changes Apr 16, 2024
Member

@abhinavdangeti abhinavdangeti left a comment

On second thought - both @Likith101 & @Thejas-bhat should comment on any concerns.

abhinavdangeti previously approved these changes Apr 16, 2024
abhinavdangeti previously approved these changes Apr 17, 2024
Member

@abhinavdangeti abhinavdangeti left a comment

Thanks @Likith101. Looks good.
Best let @Thejas-bhat put in his review as well to make sure all his comments/concerns have been addressed.

+ Increase ref counts within locking (read or write) to avoid any
  possibility of raciness. This includes invoking cacheEntry.load().
+ Also refactors getInvalidVecs to getVecIDsToExclude.
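
The commit message above says ref counts are increased while a lock (read or write) is held. A minimal sketch of why that matters follows, assuming a cache keyed by field ID with one entry per field; the types `cache` and `entry` and the methods `acquire`, `release`, and `evictIdle` are hypothetical names for illustration, not the functions in faiss_vector_cache.go.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry is a hypothetical cache entry guarded by a reference count.
type entry struct {
	refs       int64 // accessed atomically
	vecToDocID map[int64]uint32
}

// cache is a hypothetical field-keyed cache.
type cache struct {
	mu      sync.RWMutex
	entries map[uint16]*entry
}

// acquire returns the entry for id, loading it on a miss. The ref count
// is bumped before the lock is released, so evictIdle (which needs the
// write lock) can never see refs == 0 for an entry a reader has already
// looked up but not yet counted.
func (c *cache) acquire(id uint16, load func() map[int64]uint32) *entry {
	c.mu.RLock()
	if e, ok := c.entries[id]; ok {
		atomic.AddInt64(&e.refs, 1) // atomic: several readers may hold RLock
		c.mu.RUnlock()
		return e
	}
	c.mu.RUnlock()

	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[id] // re-check: another goroutine may have loaded it
	if !ok {
		e = &entry{vecToDocID: load()} // load happens under the write lock
		c.entries[id] = e
	}
	atomic.AddInt64(&e.refs, 1)
	return e
}

// release drops one reference taken by acquire.
func (e *entry) release() { atomic.AddInt64(&e.refs, -1) }

// evictIdle removes entries with no outstanding references and reports
// how many were removed.
func (c *cache) evictIdle() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for id, e := range c.entries {
		if atomic.LoadInt64(&e.refs) == 0 {
			delete(c.entries, id)
			removed++
		}
	}
	return removed
}

func main() {
	c := &cache{entries: map[uint16]*entry{}}
	e := c.acquire(7, func() map[int64]uint32 { return map[int64]uint32{1: 42} })
	fmt.Println("evicted while in use:", c.evictIdle())
	e.release()
	fmt.Println("evicted after release:", c.evictIdle())
}
```

If the increment happened after the unlock instead, an eviction could run in the gap and drop an entry that a query is about to use.
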
@abhinavdangeti abhinavdangeti merged commit eeb2336 into master Apr 18, 2024
6 checks passed
@abhinavdangeti abhinavdangeti deleted the vecToDocMap branch April 18, 2024 13:59
abhinavdangeti added a commit to blevesearch/bleve that referenced this pull request Apr 18, 2024
Includes:
* eeb2336 Likith B | MB-61029: Caching Vec To DocID Map (blevesearch/zapx#231)
* b2384fc Rahul Rampure | minor optimizations and bug fixes (blevesearch/zapx#233)
* b56abea Thejas-bhat | MB-61029: Deferring the closing of vector index (blevesearch/zapx#226)