diff --git a/examples/text_embedding/README.md b/examples/text_embedding/README.md
index 1e999882..2dd1dbb8 100644
--- a/examples/text_embedding/README.md
+++ b/examples/text_embedding/README.md
@@ -4,20 +4,20 @@
In this example, we will build index flow from text embedding from local markdown files, and query the index.
-We appreicate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
+We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
-## Steps:
+## Steps
🌱 A detailed step by step tutorial can be found here: [Get Started Documentation](https://cocoindex.io/docs/getting_started/quickstart)
-### Indexing Flow:
+### Indexing Flow
-1. We will ingest from a list of local files.
-2. For each file, perform chunking (Recursive Split) and then embeddings.
+1. We will ingest a list of local files.
+2. For each file, split it into chunks recursively, then embed each chunk.
3. We will save the embeddings and the metadata in Postgres with PGVector.
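
Step 2 can be pictured with a minimal, stdlib-only sketch of recursive splitting. This is an illustration, not CocoIndex's built-in splitter. The function name `recursive_split`, the separator order, and the character-based `chunk_size` are assumptions chosen for the example.

```python
def recursive_split(text: str, chunk_size: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters (hypothetical helper).

    Tries the coarsest separator first (blank lines) and falls back to finer
    separators (newlines, spaces, then single characters) only for pieces that
    are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard-cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            # Greedily merge neighboring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


doc = "# Title\n\nFirst paragraph is short.\n\n" + "word " * 60
chunks = recursive_split(doc, chunk_size=80)
for chunk in chunks:
    print(repr(chunk))
```

Splitting on the coarsest boundary first keeps a paragraph in one piece whenever it fits. Only oversized paragraphs get broken at line or word boundaries.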
-### Query:
-We will match against user-provided text by a SQL query, reusing the embedding operation in the indexing flow.
+### Query
+We will match user-provided text against the index with a SQL query, reusing the embedding operation from the indexing flow.
## Prerequisite
diff --git a/examples/text_embedding_qdrant/README.md b/examples/text_embedding_qdrant/README.md
index 5e2ea059..3f91dc95 100644
--- a/examples/text_embedding_qdrant/README.md
+++ b/examples/text_embedding_qdrant/README.md
@@ -1,69 +1,87 @@
-## Description
+# Build text embedding and semantic search 🔍 with Qdrant
+
+CocoIndex supports Qdrant natively ([documentation](https://cocoindex.io/docs/ops/storages#qdrant)). In this example, we will build an index flow that embeds text from local markdown files and then query the index, using **Qdrant** as the vector database.
+
+We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
+
+## Steps
+### Indexing Flow
+
+1. We will ingest a list of local files.
+2. For each file, split it into chunks recursively, then embed each chunk.
+3. We will save the embeddings and the metadata in Qdrant.
+
+### Query
+We use the Qdrant client to query the index, reusing the embedding operation from the indexing flow.
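
Conceptually, the query ranks stored points by cosine similarity between the query embedding and each point's `text_embedding` vector and returns the top `limit` hits. The collection is created with `"distance": "Cosine"`. The stdlib-only sketch below performs an exact, brute-force version of that ranking. It is not how Qdrant works internally, since Qdrant normally uses an approximate HNSW index. The `Point` class and `search` function are hypothetical stand-ins, not the `qdrant_client` API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    # Hypothetical stand-in for a stored Qdrant point: named vectors plus a payload.
    id: int
    vectors: dict[str, list[float]]
    payload: dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points: list[Point], vector_name: str, query_vector: list[float],
           limit: int = 10) -> list[tuple[float, Point]]:
    """Exact top-k search by cosine similarity on one named vector."""
    scored = [(cosine_similarity(p.vectors[vector_name], query_vector), p) for p in points]
    # Sort by score only, so ties never try to compare Point objects.
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:limit]


points = [
    Point(1, {"text_embedding": [1.0, 0.0]}, {"filename": "a.md", "text": "alpha"}),
    Point(2, {"text_embedding": [0.6, 0.8]}, {"filename": "b.md", "text": "beta"}),
    Point(3, {"text_embedding": [0.0, 1.0]}, {"filename": "c.md", "text": "gamma"}),
]
results = search(points, "text_embedding", [1.0, 0.0], limit=2)
for score, point in results:
    print(f"[{score:.3f}] {point.payload['filename']}")
    print(f"    {point.payload['text']}")
```

The printed format mirrors the `[score] filename` lines that `main.py` prints for each hit.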
-Example to build a vector index in Qdrant based on local files.
## Pre-requisites
-- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. Although the target store is Qdrant, CocoIndex uses Postgres to track the data lineage for incremental processing.
- Run Qdrant.
-```bash
-docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
-```
+ ```bash
+ docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
+ ```
- [Create a collection](https://qdrant.tech/documentation/concepts/vectors/#named-vectors) to export the embeddings to.
-```bash
-curl -X PUT \
- 'http://localhost:6333/collections/cocoindex' \
- --header 'Content-Type: application/json' \
- --data-raw '{
- "vectors": {
- "text_embedding": {
- "size": 384,
- "distance": "Cosine"
- }
- }
-}'
-```
-
-You can view the collections and data with the Qdrant dashboard at .
+ ```bash
+ curl -X PUT \
+   'http://localhost:6333/collections/cocoindex' \
+   --header 'Content-Type: application/json' \
+   --data-raw '{
+     "vectors": {
+       "text_embedding": {
+         "size": 384,
+         "distance": "Cosine"
+       }
+     }
+   }'
+ ```
+
+ You can view the collections and data with the Qdrant dashboard at .
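
If you prefer to create the collection from Python rather than with curl, you can build the same request with the standard library alone. This is a sketch. The constant names and `build_create_collection_request` are hypothetical. The request is only constructed here, and sending it requires Qdrant's REST API on port 6333. The `size` of 384 matches the output dimension of `sentence-transformers/all-MiniLM-L6-v2`. The vector name `text_embedding` must match the name the query in `main.py` searches on.

```python
import json
import urllib.request

# Assumed local Qdrant REST endpoint, matching the curl command.
QDRANT_REST_URL = "http://localhost:6333"
COLLECTION = "cocoindex"
# all-MiniLM-L6-v2 produces 384-dimensional embeddings.
EMBEDDING_DIM = 384


def build_create_collection_request(base_url: str, collection: str,
                                    dim: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates a named-vector collection."""
    body = {
        "vectors": {
            # Named vector; must match the name used when querying.
            "text_embedding": {"size": dim, "distance": "Cosine"}
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_collection_request(QDRANT_REST_URL, COLLECTION, EMBEDDING_DIM)
print(req.get_method(), req.full_url)
# To actually send it (requires a running Qdrant):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```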
## Run
-Install dependencies:
+- Install dependencies:
-```bash
-pip install -e .
-```
+ ```bash
+ pip install -e .
+ ```
-Setup:
+- Setup:
-```bash
-python main.py cocoindex setup
-```
+ ```bash
+ python main.py cocoindex setup
+ ```
-Update index:
+- Update index:
-```bash
-python main.py cocoindex update
-```
+ ```bash
+ python main.py cocoindex update
+ ```
-Run:
+- Run:
-```bash
-python main.py
-```
+ ```bash
+ python main.py
+ ```
## CocoInsight
-
-CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).
-
-Run CocoInsight to understand your RAG data pipeline:
+I used CocoInsight (free beta) to troubleshoot the index generation and understand the data lineage of the pipeline.
+It connects to your local CocoIndex server with zero pipeline data retention. Run the following command to start CocoInsight:
```bash
python main.py cocoindex server -ci
```
-Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
+Open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
diff --git a/examples/text_embedding_qdrant/main.py b/examples/text_embedding_qdrant/main.py
index 57f27a45..b2892c43 100644
--- a/examples/text_embedding_qdrant/main.py
+++ b/examples/text_embedding_qdrant/main.py
@@ -1,21 +1,26 @@
from dotenv import load_dotenv
+from qdrant_client import QdrantClient
import cocoindex
+# Define Qdrant connection constants
+QDRANT_GRPC_URL = "http://localhost:6334"
+QDRANT_COLLECTION = "cocoindex"
-def text_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
+
+@cocoindex.transform_flow()
+def text_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]:
"""
Embed the text using a SentenceTransformer model.
This is a shared logic between indexing and querying, so extract it as a function.
"""
return text.transform(
cocoindex.functions.SentenceTransformerEmbed(
- model="sentence-transformers/all-MiniLM-L6-v2"
- )
- )
+ model="sentence-transformers/all-MiniLM-L6-v2"))
-@cocoindex.flow_def(name="TextEmbedding")
+@cocoindex.flow_def(name="TextEmbeddingWithQdrant")
def text_embedding_flow(
flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
@@ -50,35 +55,39 @@ def text_embedding_flow(
doc_embeddings.export(
"doc_embeddings",
cocoindex.storages.Qdrant(
- collection_name="cocoindex", grpc_url="http://localhost:6334/"
+ collection_name=QDRANT_COLLECTION, grpc_url=QDRANT_GRPC_URL
),
primary_key_fields=["id"],
setup_by_user=True,
)
-query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
- name="SemanticsSearch",
- flow=text_embedding_flow,
- target_name="doc_embeddings",
- query_transform_flow=text_to_embedding,
- default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
-)
-
-
@cocoindex.main_fn()
def _run():
+ # Initialize Qdrant client
+ client = QdrantClient(url=QDRANT_GRPC_URL, prefer_grpc=True)
+
# Run queries in a loop to demonstrate the query capabilities.
while True:
try:
query = input("Enter search query (or Enter to quit): ")
if query == "":
break
- results, _ = query_handler.search(query, 10, "text_embedding")
+
+ # Get the embedding for the query
+ query_embedding = text_to_embedding.eval(query)
+
+ search_results = client.search(
+ collection_name=QDRANT_COLLECTION,
+ query_vector=("text_embedding", query_embedding),
+ limit=10
+ )
print("\nSearch results:")
- for result in results:
- print(f"[{result.score:.3f}] {result.data['filename']}")
- print(f" {result.data['text']}")
+ for result in search_results:
+ score = result.score
+ payload = result.payload
+ print(f"[{score:.3f}] {payload['filename']}")
+ print(f" {payload['text']}")
print("---")
print()
except KeyboardInterrupt:
diff --git a/examples/text_embedding_qdrant/pyproject.toml b/examples/text_embedding_qdrant/pyproject.toml
index 25b2663c..70454200 100644
--- a/examples/text_embedding_qdrant/pyproject.toml
+++ b/examples/text_embedding_qdrant/pyproject.toml
@@ -3,7 +3,7 @@ name = "text-embedding-qdrant"
version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on local text files."
requires-python = ">=3.10"
-dependencies = ["cocoindex>=0.1.39", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex>=0.1.39", "python-dotenv>=1.0.1", "qdrant-client>=1.6.0"]
[tool.setuptools]
packages = []