MB-61029: Caching Vec To DocID Map #231

Likith101 · 2024-04-10T06:39:24Z

Generalized some of the cache functions.
The cache now holds the vec-to-docID mapping, a helper structure for computing the excluded vectors, and the vec-excluded structure itself.
Added back the cache mutexes, because cache reads can also write back some of these structures depending on the except bitmap.

abhinavdangeti

@Likith101 See my in-line comments. In summary, I don't think we should be caching anything other than vecToDocIDMap.

abhinavdangeti · 2024-04-11T20:17:17Z

@Likith101 Let's also rebase your changes over the base branch, some conflicts have cropped up.

Likith101 · 2024-04-12T09:17:09Z

> @Likith101 Let's also rebase your changes over the base branch, some conflicts have cropped up.

I'll rebase once the base branch is merged.

abhinavdangeti · 2024-04-12T17:07:16Z

We've merged the base branch; I'll review after the rebase. You can hit edit above to change the base branch to master. Make sure only this PR's diff appears here.

abhinavdangeti · 2024-04-15T16:46:38Z

@Likith101 I feel the base branch you're using here has deviated a lot, and since you have a commit that does something and a following commit that goes back on it, resolving the conflicts is not straightforward.

My recommendation for you here is:

- start off fresh with your local branch vecToDocMap at the tip of master
- make all the changes you need again over it
- force push your code as a single commit to this branch on origin

- Generalised some of the cache function names to be inclusive of the map
- Added the map to the cache which will behave the same as the index
- Except bitmap logic is not part of the cache and the vecs excluded is calculated outside of the map
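As a rough illustration of the last point, the sketch below computes the vector IDs to exclude from a cached vec-to-docID map and a per-query except set, without storing anything back in the cache. It is only a sketch under stated assumptions: `docSet` is a hypothetical stand-in for the except bitmap, and `vecIDsToExclude` is an illustrative name and signature, not the function in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// docSet is a hypothetical stand-in for the except bitmap: the set of
// document IDs that must not appear in the results.
type docSet map[uint32]struct{}

// vecIDsToExclude walks the cached vec-to-docID map and collects every
// vector ID whose document is in the except set. The map is only read,
// so the result is per-query state kept outside the cache.
func vecIDsToExclude(vecToDocID map[int64]uint32, except docSet) []int64 {
	if len(except) == 0 {
		return nil // nothing excluded, so no work to do
	}
	var out []int64
	for vecID, docID := range vecToDocID {
		if _, excluded := except[docID]; excluded {
			out = append(out, vecID)
		}
	}
	// Map iteration order is random in Go; sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	vecToDocID := map[int64]uint32{100: 1, 101: 2, 102: 2, 103: 3}
	except := docSet{2: {}}
	fmt.Println(vecIDsToExclude(vecToDocID, except)) // prints [101 102]
}
```

Because the function never writes to the shared map, concurrent queries with different except sets can use the same cached map without a write lock around this step.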

Likith101 · 2024-04-16T06:39:16Z

Here are some of the performance improvements I noticed when testing these changes locally.

Time Between Queries	Average Latency with No Cache	Only Vector Index Cached	Both Index and Map Cached
10ms	559.47ms	69.74ms	29.85ms
25ms	524.53ms	66.21ms	32.15ms
50ms	525.32ms	65.93ms	37.58ms
75ms	522.78ms	66.86ms	40.72ms
100ms	523.66ms	68.32ms	45.74ms
200ms	524.63ms	84.30ms	45.82ms
250ms	537.42ms	95.01ms	47.12ms
300ms	523.27ms	97.04ms	43.64ms
500ms	523.09ms	98.45ms	48.17ms
750ms	528.64ms	97.59ms	46.83ms
1000ms	549.17ms	97.93ms	48.17ms
1500ms	564.24ms	286.55ms	270.62ms
2000ms	562.52ms	315.56ms	289.28ms
2500ms	564.03ms	624.43ms	527.14ms
3000ms	558.75ms	556.64ms	550.42ms
5000ms	563.99ms	561.57ms	566.26ms

Based on these results, the cache is cleared roughly 2.5 seconds after a single query. A burst of queries extends the time before the cache is cleared, but because the decay is exponential, the cache does not stay loaded for long: roughly 100 queries within 1 second keep it loaded for 6-7 seconds. The tests ran random queries against a SIFT 1M index.
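To show why exponential decay keeps a burst from pinning the cache for long, here is a small Go simulation. It assumes a monitor that samples the hit count once per tick, folds it into an exponentially weighted moving average, and evicts once the average drops below a threshold. The function name, `alpha`, `threshold`, and the tick model are all illustrative assumptions and are not calibrated to the timings in the table above.

```go
package main

import "fmt"

// ticksUntilEvicted simulates a hypothetical cache monitor. All hits
// arrive before the first tick; on every tick the sampled hit count is
// folded into an exponentially weighted moving average (EWMA). The entry
// is evicted on the first tick where the average falls below threshold.
// It returns -1 for parameters that would never terminate.
func ticksUntilEvicted(hits int, alpha, threshold float64) int {
	if alpha <= 0 || alpha > 1 || threshold <= 0 {
		return -1
	}
	avg := 0.0
	sample := float64(hits)
	ticks := 0
	for {
		avg = alpha*sample + (1-alpha)*avg
		sample = 0 // no further queries after the initial burst
		ticks++
		if avg < threshold {
			return ticks
		}
	}
}

func main() {
	// alpha and threshold are made-up values for illustration only.
	const alpha, threshold = 0.4, 0.1
	for _, hits := range []int{1, 10, 100} {
		fmt.Printf("hits=%d ticks=%d\n", hits, ticksUntilEvicted(hits, alpha, threshold))
	}
}
```

Under this model the lifetime grows with the logarithm of the burst size, which is consistent in spirit with 100 queries extending the lifetime only a few times over rather than a hundredfold.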

abhinavdangeti

Looking good to me now.
@Thejas-bhat would you make one pass here as well, please?

abhinavdangeti

On second thought - both @Likith101 & @Thejas-bhat should comment on any concerns.

abhinavdangeti

Thanks @Likith101 . Looks good.
Best let @Thejas-bhat put in his review as well to make sure all his comments/concerns have been addressed.

+ Increase ref counts within locking (read or write) to avoid any possibility of raciness. This includes invoking cacheEntry.load().
+ Also refactors getInvalidVecs to getVecIDsToExclude.
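The first point can be sketched as follows. This is a minimal illustration, not the code from this PR: the `cache`, `entry`, `acquire`, and `evictIdle` names are hypothetical. The idea is that the reference count is bumped while the read or write lock is still held, so an eviction pass, which takes the write lock, can never observe a zero count for an entry that a reader has just looked up.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry is a hypothetical cached item protected by a reference count.
type entry struct {
	refs     int64
	vecDocID map[int64]uint32
}

func (e *entry) incRef() { atomic.AddInt64(&e.refs, 1) }
func (e *entry) decRef() { atomic.AddInt64(&e.refs, -1) }

type cache struct {
	m       sync.RWMutex
	entries map[uint16]*entry
}

func newCache() *cache { return &cache{entries: make(map[uint16]*entry)} }

// acquire returns the entry for fieldID with its ref count already
// incremented. The increment happens before the lock is released, on
// both the read path and the write (load) path.
func (c *cache) acquire(fieldID uint16, load func() map[int64]uint32) *entry {
	c.m.RLock()
	if e, ok := c.entries[fieldID]; ok {
		e.incRef() // still under the read lock
		c.m.RUnlock()
		return e
	}
	c.m.RUnlock()

	c.m.Lock()
	defer c.m.Unlock()
	e, ok := c.entries[fieldID] // re-check: another goroutine may have loaded it
	if !ok {
		e = &entry{vecDocID: load()}
		c.entries[fieldID] = e
	}
	e.incRef() // still under the write lock
	return e
}

// evictIdle drops entries that nobody holds and reports how many it removed.
func (c *cache) evictIdle() int {
	c.m.Lock()
	defer c.m.Unlock()
	removed := 0
	for id, e := range c.entries {
		if atomic.LoadInt64(&e.refs) == 0 {
			delete(c.entries, id)
			removed++
		}
	}
	return removed
}

func main() {
	c := newCache()
	e := c.acquire(1, func() map[int64]uint32 { return map[int64]uint32{100: 1} })
	fmt.Println("evicted while held:", c.evictIdle())
	e.decRef()
	fmt.Println("evicted after release:", c.evictIdle())
}
```

If the increment were done after `RUnlock`, an eviction could run in the gap, see a zero count, and release an entry the reader is about to use.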

Includes:
* eeb2336 Likith B | MB-61029: Caching Vec To DocID Map (blevesearch/zapx#231)
* b2384fc Rahul Rampure | minor optimizations and bug fixes (blevesearch/zapx#233)
* b56abea Thejas-bhat | MB-61029: Deferring the closing of vector index (blevesearch/zapx#226)

Likith101 requested review from abhinavdangeti, Thejas-bhat, metonymic-smokey, CascadingRadium and moshaad7 April 10, 2024 06:39

abhinavdangeti requested changes Apr 10, 2024

Thejas-bhat force-pushed the delayedClosing branch from 97f15cb to 774b93d Compare April 11, 2024 06:36

Likith101 force-pushed the vecToDocMap branch from 8d76951 to 018d975 Compare April 12, 2024 08:58

Likith101 changed the base branch from delayedClosing to master April 15, 2024 11:33

Likith101 force-pushed the vecToDocMap branch from 018d975 to ba89580 Compare April 16, 2024 06:20

abhinavdangeti reviewed Apr 16, 2024

Proper lock context fixes + naming + optimizations

f43df2c

abhinavdangeti previously approved these changes Apr 16, 2024

Code organizing

11a2ad6

abhinavdangeti dismissed their stale review via 11a2ad6 April 16, 2024 18:24

abhinavdangeti previously approved these changes Apr 16, 2024

Consistent cache lookups

7913386

abhinavdangeti dismissed their stale review via 7913386 April 16, 2024 18:36

abhinavdangeti previously approved these changes Apr 16, 2024

Thejas-bhat reviewed Apr 17, 2024

Function to Calculate VecIDsToExclude

5e3882f

Likith101 dismissed abhinavdangeti’s stale review via 5e3882f April 17, 2024 05:50

abhinavdangeti previously approved these changes Apr 17, 2024

abhinavdangeti dismissed their stale review via f756ea7 April 17, 2024 19:52

Consistent calls to getInvalidVecs(..)* -> getVecIDsToExclude(..)

42fb9dd

abhinavdangeti force-pushed the vecToDocMap branch from f756ea7 to 42fb9dd Compare April 17, 2024 19:53

abhinavdangeti approved these changes Apr 17, 2024

Thejas-bhat approved these changes Apr 18, 2024

abhinavdangeti merged commit eeb2336 into master Apr 18, 2024
6 checks passed

abhinavdangeti deleted the vecToDocMap branch April 18, 2024 13:59

abhinavdangeti mentioned this pull request Apr 18, 2024

Upgrade blevesearch/zapx/v16 blevesearch/bleve#2016

Merged
