
[WIP] Add table-level ANN index with DiskANN backend#103675

Draft
fastio wants to merge 4 commits into ClickHouse:master from fastio:feat-knn-step-4

Conversation

fastio (Contributor) commented Apr 28, 2026

Resolves #85766

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add an experimental table-level Approximate Nearest Neighbor (ANN) index backed by DiskANN, with new SQL DDL syntax, query planner integration, and supporting SYSTEM commands.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

This change introduces an experimental table-level ANN index for high-dimensional vector search, using DiskANN as the underlying graph index.

Highlights:

  • New ANN index type for MergeTree tables, managed at table level (not per part) via an ANNIndexManager and group-based storage layout (ANNIndexGroup, ANNGroupCoverage, ANNGroupStorageDiskFull).
  • DiskANN integration through a Rust FFI wrapper (rust/workspace/diskann-clickhouse) plus C++ adapters (DiskANNIndexBuilder, DiskANNIndexSearcherAdapter) behind common IANNIndexBuilder / IANNIndexSearcher interfaces.
  • Background BuildANNIndexTask integrated with BackgroundJobsAssignee; vector data persisted via VectorStreamWriter and per-part row-id mapping (PartRowIdMap*).
  • Query planner pass useANNSearch that rewrites eligible nearest-neighbor queries to use the index, with a new tableANNCoverage function for diagnostics (see the usage sketch below).
  • New MergeTree and Server settings, ProfileEvents, CurrentMetrics, AccessType, and a SYSTEM command for managing ANN indexes.
  • Stateless tests (04102-04111) covering DDL, query path, EXPLAIN, merge routing, metric/source validation, prefilter selectivity, and empty/small-table edge cases, plus extensive gtest unit tests for each component.

The feature is experimental and gated behind dedicated settings; existing vector similarity index behavior is unchanged.
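
As a rough usage sketch of the query path described above (illustrative only: the exact DDL syntax added by this PR is not reproduced here, and the argument list of tableANNCoverage is an assumption; the planner-pass and function names come from the highlights):

```sql
-- Plain MergeTree table holding the vectors; the new ANN index DDL from
-- this PR is not shown here, so this sketch omits the index clause.
CREATE TABLE vectors
(
    id UInt64,
    embedding Array(Float32)
)
ENGINE = MergeTree
ORDER BY id;

-- Queries of this shape are candidates for the useANNSearch planner pass,
-- which rewrites eligible nearest-neighbor queries to use the index.
-- The short array literal stands in for a full query vector.
SELECT id
FROM vectors
ORDER BY L2Distance(embedding, [0.1, 0.2])
LIMIT 10;

-- Coverage diagnostics via the new tableANNCoverage function; the
-- (database, table) argument list is a guess for illustration.
SELECT tableANNCoverage('default', 'vectors');
```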

Refs: #103671

TODO

• ✅ DiskANN FFI and MergeTree table-level ANN index foundation
• ✅ Group persistence, coverage tracking, background build, invalidation, and GC
• ✅ Query-plan rewrite with mixed indexed/unindexed part reads
• ✅ Basic observability and tests

• ☐ Fix multi-group top-K correctness
• ☐ Implement ANNIndexGroup merge/compaction
• ☐ Add real rescoring/reranking
• ☐ Support multi-replica setups: ReplicatedMergeTree, parallel replicas, lifecycle consistency
• ☐ Wire unused settings, add docs, and stabilize build/gtest/stateless CI

SIFT-1M ANN Benchmark Results

Setup

  • Machine: 64 cores, 128 GB memory.
  • Dataset: sift-128-euclidean (1M base, 10k query, 128-d L2)
  • Index: table-level ann (DiskANN/Vamana), build_cfg=paper (max_degree / build_search_list_size / alpha per the DiskANN paper)
  • Single group (single_group), fixed beam_width=8, search_io_limit=500
  • 3 repetitions per cell, median reported; hash_seed pinned → recall is fully deterministic (identical across all 3 runs)
  • 1000 queries per cell, K=10, 200-query warm-up
  • Concurrency ∈ {1, 32}
  • Single build: 335 s (~5.6 min), ann_groups=1

Recall@10 vs Search-List-Size Sweep

| search_list_size | Recall@10 | QPS (conc=1) | QPS (conc=32) | p50 ms (c=1) | p99 ms (c=1) | p99 ms (c=32) |
|---:|---:|---:|---:|---:|---:|---:|
| 10 | 0.5471 | 108 | 219 | ~9.2 | 10.6 | 153.7 |
| 30 | 0.8080 | 99 | 192 | ~10.1 | 12.2 | 270.0 |
| 50 | 0.8967 | 91 | 174 | ~10.8 | 14.2 | 330.0 |
| 100 | 0.9621 | 79 | 137 | ~12.6 | 14.2 | 240.0 |
| 200 | 0.9886 | 62 | 91 | ~16.0 | 17.7 | 360.0 |

GIST-1M ANN Benchmark Results

Setup

  • Machine: 64 cores, 128 GB memory.
  • Dataset: gist-960-euclidean (1M base, 1k query, 960-d L2)
  • Index: table-level ann (DiskANN/Vamana), build_cfg=gist (DiskANN-paper-style params, tuned for high-dim)
  • Single group (single_group), fixed beam_width=8, search_io_limit=500
  • 3 repetitions per cell, median reported; hash_seed pinned → recall is fully deterministic across all 3 runs
  • 1000 queries per cell, K=10, 200-query warm-up
  • Concurrency ∈ {1, 16}
  • Single build: 1754 s (~29.2 min), ann_groups=1

Recall@10 vs Search-List-Size Sweep

| search_list_size | Recall@10 | QPS (conc=1) | QPS (conc=16) | p50 ms (c=1) | p99 ms (c=1) | p99 ms (c=16) |
|---:|---:|---:|---:|---:|---:|---:|
| 20 | 0.4617 | 42.7 | 55.1 | ~23.4 | 26.6 | 315.1 |
| 50 | 0.6600 | 39.9 | 50.1 | ~25.0 | 31.5 | 336.3 |
| 100 | 0.7993 | 35.5 | 43.5 | ~28.2 | 33.2 | 388.1 |
| 200 | 0.8963 | 29.0 | 34.6 | ~34.5 | 41.3 | 486.7 |
| 400 | 0.9583 | 21.0 | 24.2 | ~47.6 | 55.1 | 692.8 |

fastio added 3 commits April 15, 2026 22:04

  • Introduce `rust/workspace/diskann-clickhouse` providing a C ABI over the DiskANN library: index build, in-memory and on-disk search, padded SIMD queries, and the FFI header consumed by the C++ side.
  • Group-based index with copy-on-write manager, plan optimization and per-part distance dispatch, ProfileEvents/metric kernel, gtests and `0_stateless` tests (04102-04111).
fastio marked this pull request as draft April 28, 2026 12:58
fastio changed the title Add table-level ANN index with DiskANN backend → [WIP] Add table-level ANN index with DiskANN backend Apr 28, 2026
alexey-milovidov added the can be tested label Apr 28, 2026
clickhouse-gh bot commented Apr 28, 2026

Workflow [PR], commit [e26ebee]

Summary:

| job / test | status | info |
|---|---|---|
| Style check | FAIL | |
| whitespace_check | FAIL | cidb |
| cpp | FAIL | cidb |
| various | FAIL | cidb |
| Fast test | FAIL | |
| Build ClickHouse | FAIL | |
| Build (arm_tidy) | FAIL | |
| Build ClickHouse | FAIL | cidb |
| Docs check | DROPPED | |
| Fast test (arm_darwin) | DROPPED | |
| Build (amd_debug) | DROPPED | |
| Build (amd_asan_ubsan) | DROPPED | |
| Build (amd_tsan) | DROPPED | |
| Build (amd_msan) | DROPPED | |
| Build (amd_binary) | DROPPED | |

AI Review

Summary

This PR adds an experimental table-level ANN index backed by DiskANN, including DDL, planner integration, background build/search lifecycle, and tests. I did not find high-confidence correctness, safety, or compatibility problems that require changes before merge.

ClickHouse Rules

  • Deletion logging
  • Serialization versioning
  • Core-area scrutiny
  • No test removal
  • Experimental gate
  • No magic constants
  • Backward compatibility
  • SettingsChangesHistory.cpp
  • PR metadata quality
  • Safe rollout
  • Compilation time
  • No large/binary files

Final Verdict

  • Status: ✅ Approve

clickhouse-gh bot added the pr-experimental and submodule changed labels Apr 28, 2026
rschu1ze (Member) commented

@fastio Thanks for this large PR. May I ask what your motivation was to make the index per-table and not per-part?

fastio (Contributor, Author) commented Apr 29, 2026

@fastio Thanks for this large PR. May I ask what your motivation was to make the index per-table and not per-part?

Hi @rschu1ze, thanks for the question — happy to share the reasoning.

First, the main reason is that ANN search is a global top-K ranking problem, not a local part-pruning predicate. ORDER BY distance LIMIT K requires a global nearest-neighbor order across all active parts, so a per-part index would still need fan-out searches and a global merge — with no principled way to decide how many candidates to over-fetch from each part (too few collapses recall; too many makes the per-part index pointless). That is why the ANN index is modeled as a table-level search structure, while part coverage remains an internal lifecycle concern.

Second, you are right: this PR is too large to review as a single mergeable change. My goal is to use it to discuss and validate the overall design direction first. If the direction makes sense, I will split it into a sequence of smaller PRs with clear boundaries, so each step can be reviewed independently.

Happy to dig into any specific aspect if useful.

CurtizJ (Member) commented May 1, 2026

@fastio

Could you please explain how you synchronize data in the global index and in the main table and how you maintain consistency between them?

rschu1ze (Member) commented May 3, 2026

@fastio We (@shankar-iyer and I) discussed the vector similarity index 2.0 last week. There are a few concerns with this PR.

  • The industry (examples: BigQuery, Turbopuffer, Elasticsearch, StarRocks) is moving towards SPANN, a.k.a. IVF vector indexes. This has a reason: compared to DiskANN, which came earlier and stores the Vamana graph on disk, SPANN performs only sequential reads, is simpler to implement, and has better trade-offs (based on the information in the SPANN paper). We therefore agreed to implement SPANN; there is already an issue for this: Add SPANN memory-disk hybrid vector similarity index #102146.

  • Global indexes do not fit ClickHouse's architecture well.

    • Problem 1: They must be kept in-sync with the underlying parts.
    • Problem 2: They introduce a need for a stable row id which ClickHouse currently doesn't have.

First, the main reason is that ANN search is a global top-K ranking problem, not a local part-pruning predicate. ORDER BY distance LIMIT K requires a global nearest-neighbor order across all active parts, so a per-part index would still need fan-out searches and a global merge — with no principled way to decide how many candidates to over-fetch from each part (too few collapses recall; too many makes the per-part index pointless). That is why the ANN index is modeled as a table-level search structure, while part coverage remains an internal lifecycle concern.

This issue does exist, but it is less bad than it seems, not least because parts grow quite large by default (150 GB). Fan-out due to per-part searches only reduces performance; it never reduces recall. Note that one could theoretically reduce the former with a new setting that considers only N% of the parts for search.

Even if all my concerns are invalid, I'd prefer SingleStore's mechanism of building covering vector indexes in addition to the original per-segment indexes, rather than replacing them as in this PR (see sec. 4.2 here). This still doesn't fit the LSM architecture, but it is a little less disruptive.

fastio (Contributor, Author) commented May 6, 2026

@fastio

Could you please explain how you synchronize data in the global index and in the main table and how you maintain consistency between them?

@CurtizJ, apologies for the delayed response — I was on vacation for the past five days. Thanks for raising this — it's the right question to settle before the implementation lands.

TL;DR. The main table and the global index are kept eventually consistent. Query correctness does not depend on the index being caught up: it is preserved by partitioning active parts into indexed and unindexed sets at query time and rerank-merging the two paths.

Consistency model

For a query at time t, let

  • A_t = active parts visible to the main table snapshot
  • I_t = parts covered by the index snapshot

We compute:

  • Indexed = A_t ∩ I_t → ANN search, approximate top-K + distance
  • Unindexed = A_t \ I_t → brute-force distance over the rows
  • result = rerank(Indexed ∪ Unindexed)
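
A conceptual SQL analogy for this rerank-merge (purely illustrative: both source names below are hypothetical, and the real merge happens inside the query plan, not via UNION ALL):

```sql
-- Hypothetical illustration of the indexed/unindexed split. The first
-- branch stands for the approximate ANN candidates from covered parts
-- (A_t ∩ I_t); the second for exact brute-force distances over the
-- uncovered parts (A_t \ I_t). A global rerank yields the final top-K.
SELECT id, dist
FROM
(
    SELECT id, dist FROM ann_candidates_indexed_parts
    UNION ALL
    SELECT id, dist FROM brute_force_unindexed_parts
)
ORDER BY dist
LIMIT 10;
```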

How synchronization happens

  • The write path is unchanged. There is zero intrusion into the MergeTree write path.
  • A background task periodically scans the main table's active parts, picks up parts not yet indexed, and builds index data for them.
  • A compaction mechanism rebuilds and merges existing index data.
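
Because coverage converges in the background, it is observable from SQL. A hedged example using the tableANNCoverage diagnostic mentioned in the PR description (its exact signature is an assumption):

```sql
-- Right after new parts are inserted, coverage can be below 100%;
-- queries stay correct because uncovered parts fall back to brute force.
SELECT tableANNCoverage('default', 'vectors');

-- Once the background BuildANNIndexTask has picked up the new parts,
-- the same call reports full coverage again.
```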

Happy to dig deeper into any of these.

fastio (Contributor, Author) commented May 6, 2026

@fastio We (@shankar-iyer and I) discussed the vector similarity index 2.0 last week. There are a few concerns with this PR.

  • The industry (examples: BigQuery, Turbopuffer, Elasticsearch, StarRocks) is moving towards SPANN, a.k.a. IVF vector indexes. This has a reason: compared to DiskANN, which came earlier and stores the Vamana graph on disk, SPANN performs only sequential reads, is simpler to implement, and has better trade-offs (based on the information in the SPANN paper). We therefore agreed to implement SPANN; there is already an issue for this: Add SPANN memory-disk hybrid vector similarity index #102146.

  • Global indexes do not fit ClickHouse's architecture well.

    • Problem 1: They must be kept in-sync with the underlying parts.
    • Problem 2: They introduce a need for a stable row id which ClickHouse currently doesn't have.

First, the main reason is that ANN search is a global top-K ranking problem, not a local part-pruning predicate. ORDER BY distance LIMIT K requires a global nearest-neighbor order across all active parts, so a per-part index would still need fan-out searches and a global merge — with no principled way to decide how many candidates to over-fetch from each part (too few collapses recall; too many makes the per-part index pointless). That is why the ANN index is modeled as a table-level search structure, while part coverage remains an internal lifecycle concern.

This issue does exist, but it is less bad than it seems, not least because parts grow quite large by default (150 GB). Fan-out due to per-part searches only reduces performance; it never reduces recall. Note that one could theoretically reduce the former with a new setting that considers only N% of the parts for search.

Even if all my concerns are invalid, I'd prefer SingleStore's mechanism of building covering vector indexes in addition to the original per-segment indexes, rather than replacing them as in this PR (see sec. 4.2 here). This still doesn't fit the LSM architecture, but it is a little less disruptive.


@rschu1ze, Thank you for the very helpful feedback.

On DiskANN: the current PR is a skeleton in which the ANN algorithm is pluggable behind IMaterializedIndexAlgorithm / MaterializedIndexAlgorithmFactory, so swapping in SPANN later should be straightforward — the DiskANN parts here are placeholders for evaluation rather than a hard commitment to the algorithm.

On the topology: the SingleStore-style approach of building a cross-part covering index in addition to per-part indexes (sec. 4.2) is a direction well worth thinking through.


Labels

can be tested · pr-experimental · submodule changed


Development

Successfully merging this pull request may close these issues: Implement Vamana for ANN Vector Search (much faster than HNSW).
