@shankar-iyer
Member

Summary

A basic starter dataset for vector search in ClickHouse. More will follow; I will update the parent page after all docs have been pushed.

Checklist

@shankar-iyer shankar-iyer requested a review from a team as a code owner August 20, 2025 09:13
@vercel

vercel bot commented Aug 20, 2025

@shankar-iyer is attempting to deploy a commit to the ClickHouse Team on Vercel.

A member of the Team first needs to authorize it.

@shankar-iyer
Member Author

@rschu1ze

Member

@Blargian Blargian left a comment

LGTM

@Blargian
Member

@shankar-iyer we also have an examples repository. One idea is to put the Python code there and cross-link to it from this page. I'll add support for code blocks that display code from external files in the near future.

keywords: ['semantic search', 'vector similarity', 'approximate nearest neighbours', 'embeddings']
---

The [dbpedia dataset](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M) contains 1 million articles from Wikipedia and their vector embeddings generated using the `text-embedding-3-large` model from OpenAI.
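As an illustration of what these embeddings support, an exact (brute-force) nearest-neighbour search might look like the sketch below. It assumes the data is loaded into a table `dbpedia` whose embedding column is `vector`, the names used in the index statements later in this PR. Taking the query vector from an existing row is purely for illustration; a real query would embed the search text with the same OpenAI model.

```sql
-- Exact k-NN search: computes the cosine distance for every row and
-- returns the 10 closest rows. Table/column names assumed from this PR.
WITH (SELECT vector FROM dbpedia LIMIT 1) AS query_vector
SELECT
    * EXCEPT (vector),                     -- all columns except the embedding
    cosineDistance(vector, query_vector) AS distance
FROM dbpedia
ORDER BY distance ASC
LIMIT 10;
```

Without an index this scans every row. The `vector_similarity` index added later in this PR targets this `ORDER BY <distance> LIMIT k` query shape; whether it is applied to a particular query can be checked with `EXPLAIN indexes = 1`.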
Member

Member Author

Done

```sql
ALTER TABLE dbpedia ADD INDEX vector_index vector TYPE vector_similarity('hnsw', 'cosineDistance', 1536, 'bf16', 64, 512);

ALTER TABLE dbpedia MATERIALIZE INDEX vector_index;
```
Member

Can we add `SETTINGS mutations_sync = 2` to make sure the index is materialized synchronously?
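For reference, here is a sketch of the two statements with the suggested setting applied and each `vector_similarity` argument annotated. The argument descriptions follow the ClickHouse documentation for the `vector_similarity` index as understood here and should be checked against the server version in use.

```sql
-- Argument order: vector_similarity('hnsw', <distance_function>, <dimensions>,
--   <quantization>, <hnsw_max_connections_per_layer>,
--   <hnsw_candidate_list_size_for_construction>)
ALTER TABLE dbpedia
    ADD INDEX vector_index vector
    TYPE vector_similarity(
        'hnsw',           -- index algorithm
        'cosineDistance', -- distance function queries must use to hit the index
        1536,             -- embedding dimensions (1536 for this dataset)
        'bf16',           -- quantization of the vectors stored in the index
        64,               -- HNSW max connections per layer
        512               -- HNSW candidate list size during construction
    );

-- mutations_sync = 2 makes the statement wait until the mutation has finished
-- on all replicas (1 would wait only on the server executing it).
ALTER TABLE dbpedia MATERIALIZE INDEX vector_index SETTINGS mutations_sync = 2;
```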

Member Author

Done

@rschu1ze
Member

@shankar-iyer Very nice example!

@rschu1ze
Member

@shankar-iyer Once this PR is in, please also link the new page from the ANN docs page, similar to this PR.

@vercel

vercel bot commented Aug 20, 2025

The latest updates on your projects:

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| clickhouse-docs | Ready | Aug 20, 2025 3:52pm |
| clickhouse-docs-ru | Ignored (skipped) | Aug 20, 2025 3:52pm |
| clickhouse-docs-zh | Ignored (skipped) | Aug 20, 2025 3:52pm |

@Blargian Blargian merged commit 1b48773 into ClickHouse:main Aug 20, 2025
14 checks passed