Secondary index cache #54101
base: master
Conversation
This is an automated comment for commit 1dbe1d0 with a description of existing statuses. It's updated for the latest CI run. ❌
ClickHouse already has a skip index cache; how will this be different?
Hm, right, maybe we should remove the uncompressed cache for indexes if this new cache works out. The idea is to avoid deserializing the index for each query. If the index allows doing the lookup faster than a full scan, then deserializing the whole index in O(n) time on each query partially defeats the purpose. (Alternatively, we could say that the index should be able to work without deserialization, directly from a SeekableReadBuffer. But that would take considerable extra work to support, in particular for usearch, at least currently. But maybe we'll do it anyway, to avoid caching the whole index if we only access small parts of it.)
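To make the idea above concrete, here is a minimal sketch (all names invented for illustration, not ClickHouse's actual classes) of a cache keyed by (data part, index name, granule number) that stores already-deserialized granules, so a repeated query pays the deserialization cost only on a miss:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <tuple>

// Stand-in for a deserialized secondary index granule.
struct IndexGranule { std::string payload; };

class SecondaryIndexCacheSketch
{
public:
    using Key = std::tuple<std::string /*part*/, std::string /*index*/, size_t /*mark*/>;

    // getOrSet: return the cached granule, or build it with load() on a miss.
    // load() must return a raw IndexGranule* which the cache takes ownership of.
    template <typename Load>
    std::shared_ptr<IndexGranule> getOrSet(const Key & key, Load && load)
    {
        std::lock_guard lock(mutex);  // one global mutex, as discussed later in this thread
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;  // hit: no deserialization needed
        auto granule = std::shared_ptr<IndexGranule>(load());
        cache.emplace(key, granule);
        return granule;
    }

    size_t size() const
    {
        std::lock_guard lock(mutex);
        return cache.size();
    }

private:
    std::map<Key, std::shared_ptr<IndexGranule>> cache;
    mutable std::mutex mutex;
};
```

The real cache would also need eviction by memory size; this sketch only shows the getOrSet shape that avoids repeated deserialization.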
Force-pushed from 3134711 to 29d593b (Compare)
Now this PR undoes #55683. It seems impossible to have both: if granules are inserted into the cache, they can't be reused.
Overall, LGTM, please see the inline comments.
Do I correctly understand that there are currently no tests (functional and integration) for the change?
granule = index->createIndexGranule();
auto load_func = [&] {
    initStreamIfNeeded();
    if (stream_mark != mark)
What happens if mark is less than stream_mark? It wasn't the case before the change, but now the interface of MergeTreeIndexReader allows it. Should there be an assert?
Not sure what you mean; the previous interface MergeTreeIndexReader::seek() allowed moving backwards too, and it should work fine AFAICT. (Idk whether callers currently do that.)
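A tiny sketch of the behavior under discussion (invented names, not the actual MergeTreeIndexReader): the reader tracks which mark its stream is positioned at and re-seeks whenever the requested mark differs, which handles backward moves the same way as forward ones:

```cpp
#include <cassert>
#include <cstddef>

// Fake stream that just records its position and how often it was asked to seek.
struct FakeStream
{
    size_t position = 0;
    size_t seek_count = 0;
    void seekToMark(size_t mark) { position = mark; ++seek_count; }
};

class IndexReaderSketch
{
public:
    // Reads the granule at `mark`; seeks only when the stream isn't already there.
    // Works for mark < stream_mark (backward) as well as mark > stream_mark.
    size_t readGranule(size_t mark)
    {
        if (stream_mark != mark)
            stream.seekToMark(mark);
        stream_mark = mark + 1;  // reading advances the stream past this granule
        return mark;
    }

    FakeStream stream;
    size_t stream_mark = 0;
};
```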
@@ -71,7 +71,11 @@ namespace DB
     M(String, index_mark_cache_policy, DEFAULT_INDEX_MARK_CACHE_POLICY, "Secondary index mark cache policy name.", 0) \
     M(UInt64, index_mark_cache_size, DEFAULT_INDEX_MARK_CACHE_MAX_SIZE, "Size of cache for secondary index marks. Zero means disabled.", 0) \
     M(Double, index_mark_cache_size_ratio, DEFAULT_INDEX_MARK_CACHE_SIZE_RATIO, "The size of the protected queue in the secondary index mark cache relative to the cache's total size.", 0) \
-    M(UInt64, mmap_cache_size, DEFAULT_MMAP_CACHE_MAX_SIZE, "A cache for mmapped files.", 0) \
+    M(UInt64, mmap_cache_size, DEFAULT_MMAP_CACHE_MAX_SIZE, "Maximum number of files to keep in the mmapped file cache.", 0) \
+    M(String, secondary_index_cache_policy, DEFAULT_SECONDARY_INDEX_CACHE_POLICY, "Index mark cache policy name.", 0) \
Can there be a use case for the primary and secondary index caches using different eviction policies? Just curious :)
¯\_(ツ)_/¯ , I just added the same few settings as the caches above. IIRC the primary index doesn't have a cache; it's just always kept in memory as long as the data part is loaded. For the other caches above, I guess the only way this would come up is if someone notices (e.g. sees it in a profiler) that one of the caches isn't working well and tries different settings for just that cache.
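For reference, if the new setting follows the same config.xml convention as the existing server-level cache sizes (an assumption — the PR diff here only shows ServerSettings.h, and the values below are purely illustrative), enabling it might look like:

```xml
<clickhouse>
    <!-- existing cache settings -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <mark_cache_size>5368709120</mark_cache_size>
    <!-- new cache from this PR; name taken from the PR's changelog entry -->
    <secondary_index_cache_size>5368709120</secondary_index_cache_size>
</clickhouse>
```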
We already have a mark cache and an uncompressed cache for secondary indexes. What does this cache do differently? (ClickHouse/src/Core/ServerSettings.h, lines 68 to 73 in add2435)
@jrdi, the index is still deserialized, and that can take a lot of time. Although this scenario is limited, vector search is often used on minuscule datasets (under one billion records).
FWIW, here are some numbers; in short, it gives a significant boost for ANN indexes. I've tested on a table with 33'114'778 rows with embeddings (arrays of 768 elements):
But the funny thing is that it actually may slow down regular indexes. Input:

CREATE TABLE data
(
`part` Int32,
`key` Int32,
`v1` Int32,
`s` SimpleAggregateFunction(sum, Int64),
INDEX v1_index v1 TYPE minmax GRANULARITY 1
)
ENGINE = AggregatingMergeTree
PARTITION BY part
ORDER BY key
SETTINGS index_granularity = 1;
insert into data (part, key, v1) select number%100, number, number from numbers(1e6);

Before:

EXPLAIN PIPELINE
SELECT *
FROM data
FINAL
WHERE v1 > 1
SETTINGS max_threads = 2, max_final_threads = 2, force_data_skipping_indices = 'v1_index', use_skip_indexes_if_final = 0;
-- 16 rows in set. Elapsed: 0.396 sec.
EXPLAIN PIPELINE
SELECT *
FROM data
FINAL
WHERE v1 > 1
SETTINGS max_threads = 2, max_final_threads = 2, force_data_skipping_indices = 'v1_index', use_skip_indexes_if_final = 1;
-- 16 rows in set. Elapsed: 0.756 sec.

After:

EXPLAIN PIPELINE
SELECT *
FROM data
FINAL
WHERE v1 > 1
SETTINGS max_threads = 2, max_final_threads = 2, force_data_skipping_indices = 'v1_index', use_skip_indexes_if_final = 0;
-- 16 rows in set. Elapsed: 0.307 sec.
EXPLAIN PIPELINE
SELECT *
FROM data
FINAL
WHERE v1 > 1
SETTINGS max_threads = 2, max_final_threads = 2, force_data_skipping_indices = 'v1_index', use_skip_indexes_if_final = 1;
-- 16 rows in set. Elapsed: 1.496 sec.

So now it is 2x slower; perf points to the cache (Perf flamegraph screenshots attached).
We really need this 515x speedup in vector search 🥹
Hey @azat, awesome numbers! Could you share a few more details on how you did this benchmark? I'm trying to set up a table using ANN indexes (already tried both annoy and usearch) but have problems – the index doesn't speed up my select queries. Even more – queries perform better without the index (though still not very well, which is why I'm looking for ways to optimize). Another strange thing is that the index size is almost equal to the embedding column size (even with compression). My table has a relatively simple structure, and I've already tried different parameters for the index, including granularity and compression. Query execution time for this table with ~3M rows can be around 3-5 seconds, which is too slow for such a data volume as far as I know. The embedding dimension I use is 1536 (I also tried reducing it but haven't noticed much of a difference).
Could you please share, if possible, the CH config and table structure for your benchmark so I can compare on my side? Probably I'm doing something wrong with CH settings or data structuring, but I'm out of ideas at the moment. Thank you!
@mcproger, are you using binaries from this PR?
@azat no, not yet. I was just looking for ways to optimize my setup on v23.8.7.24 and found this discussion. I wonder whether I'm doing something wrong with my current setup, or whether this cache fix will solve the query performance and there's no point in trying to optimize further at the moment.
Yes, you need this PR to have this improvement.
Hm, yeah, it's plausible that 1M cache lookups add 800 ms; they do ~2 unordered_map lookups under a global mutex. Not sure what to do about it, other than just saying it's worth it. I at least tried replacing unordered_map with absl::flat_hash_map, but it had little effect.
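One possible mitigation for the global-mutex contention (purely a sketch of a standard technique, not something this PR does): shard the cache by key hash, so concurrent lookups mostly contend on different mutexes. Whether this helps here would need measuring; it reduces contention but not the per-lookup hashing cost itself.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Sharded map: each key is routed by hash to one of NUM_SHARDS independent
// (mutex, unordered_map) pairs instead of a single globally-locked map.
class ShardedCacheSketch
{
    static constexpr size_t NUM_SHARDS = 16;

    struct Shard
    {
        std::mutex mutex;
        std::unordered_map<std::string, int> map;
    };
    std::array<Shard, NUM_SHARDS> shards;

    Shard & shardFor(const std::string & key)
    {
        return shards[std::hash<std::string>{}(key) % NUM_SHARDS];
    }

public:
    void put(const std::string & key, int value)
    {
        auto & shard = shardFor(key);
        std::lock_guard lock(shard.mutex);  // locks only this shard
        shard.map[key] = value;
    }

    bool get(const std::string & key, int & out)
    {
        auto & shard = shardFor(key);
        std::lock_guard lock(shard.mutex);
        auto it = shard.map.find(key);
        if (it == shard.map.end())
            return false;
        out = it->second;
        return true;
    }
};
```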
…the slow EXPLAIN query from the comment 28% faster
Yes. Added one now. Should be ready for review again.
Hmm, how do we deal with this perf regression then? https://s3.amazonaws.com/clickhouse-test-reports/54101/1dbe1d0f24ffce7329f333f48118c74be6b2fcc8/performance_comparison_%5B1_4%5D/report.html
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added a cache for deserialized secondary index granules. This should make repeated queries that use a secondary index faster, if the index for the whole table fits in the cache. The size of the new cache is controlled by the server setting secondary_index_cache_size.

Pretty straightforward. A lot of copypasta from UncompressedCache.
Should help a lot with usearch ANN indexes, when they all fit in memory. Making it fast when the index doesn't fit in memory would take more work, maybe even changing usearch's serialization format to have more locality.