# Vector Database Tutorial with Pydapter's Qdrant Adapters

This tutorial shows how to use pydapter's Qdrant adapters to store and query
vector embeddings for semantic search and similarity-based retrieval.
We'll cover both synchronous and asynchronous implementations.


**What is a Vector Database?**

Vector databases are specialized storage systems designed for high-dimensional
vector data (embeddings) that enable efficient similarity search. They're
crucial for:

- Semantic search
- Recommendation systems
- Image similarity
- Document retrieval
- Natural language understanding

Qdrant is a vector database that can filter on stored metadata alongside
vector search, which makes it a good fit for applications that need both
semantic similarity and metadata filtering.
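
To make "similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is not how Qdrant works internally (production engines rely on approximate indexes rather than scanning every vector), and the toy 3-dimensional vectors and the helper names `cosine_similarity` and `top_k_similar` are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float], vectors: dict[str, list[float]], k: int = 2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (illustrative values, not produced by any model)
toy_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}

ranked = top_k_similar([1.0, 0.0, 0.0], toy_vectors, k=2)
print([name for name, _ in ranked])  # ['cats', 'dogs']
```

A vector database answers the same kind of question, but over many high-dimensional vectors and without comparing the query against each one in turn.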

### 1. Prerequisites

Run the following in your terminal:

```sh
uv venv
uv pip install "pydapter[qdrant]" sentence-transformers
```

Then start the Qdrant Docker container with Docker Compose:

```sh
cd notebooks/using_qdrant
docker compose up
```
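
Before moving on, it can help to confirm that the container is accepting connections. The following is a small sketch using only the standard library. It assumes Qdrant is listening on the default `http://localhost:6333` with no API key configured, and the helper name `qdrant_is_reachable` is hypothetical, not part of pydapter or the Qdrant client.

```python
import urllib.error
import urllib.request


def qdrant_is_reachable(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the collections endpoint answers with 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL
        return False


print("Qdrant reachable:", qdrant_is_reachable())
```

If this prints `False`, check that `docker compose up` is still running and that port 6333 is mapped to the host.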

### 2. A simple example

In [6]:
import uuid
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field
from sentence_transformers import SentenceTransformer

from pydapter.extras.qdrant_ import QdrantAdapter

# Load a sentence transformer model to generate embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

qdrant_url = "http://localhost:6333"


class Document(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = Field(default_factory=lambda: datetime.now().isoformat())
    title: str
    content: str
    tags: List[str] = Field(default_factory=list)
    embedding: List[float] = Field(default_factory=list)

    def generate_embedding(self):
        """Generate embedding from the document content"""
        self.embedding = model.encode(self.content).tolist()
        return self

In [7]:
# Store documents in Qdrant
def store_documents(documents: list[Document]):
    print(f"Storing {len(documents)} documents in Qdrant...")

    result = QdrantAdapter.to_obj(
        documents,
        collection="documents",
        url=qdrant_url,
        many=True,
    )
    print(f"Storage result: {result}")

In [8]:
# Search for similar documents
def search_documents(query_text: str, top_k: int = 2):
    print(f"Searching for documents similar to: '{query_text}'")

    # Generate embedding for the query
    query_embedding = model.encode(query_text).tolist()

    # Search in Qdrant using the QdrantAdapter
    results = QdrantAdapter.from_obj(
        Document,
        {
            "collection": "documents",
            "query_vector": query_embedding,
            "top_k": top_k,
            "url": qdrant_url,
        },
        many=True,
    )

    print(f"Found {len(results)} similar documents:")
    for i, doc in enumerate(results):
        print(f"{i+1}. {doc.title}")
        print(f"   Content: {doc.content}")
        print(f"   Tags: {', '.join(doc.tags)}")
        print()

    return results
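
The list returned by `search_documents` can be narrowed further by metadata on the client side. The sketch below is illustrative only: `Doc` is a hypothetical stand-in exposing the same `title` and `tags` attributes as `Document`, and `filter_by_tag` is a made-up helper. It filters after the search has run, so it can return fewer than `top_k` items, and it is not the same thing as Qdrant's server-side filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in with the same `title` and `tags` attributes as Document."""

    title: str
    tags: list[str] = field(default_factory=list)


def filter_by_tag(results, tag: str):
    """Keep only results whose tags contain `tag`, ignoring case."""
    wanted = tag.lower()
    return [doc for doc in results if wanted in (t.lower() for t in doc.tags)]


# Pretend these came back from a similarity search
hits = [
    Doc("Deep Learning Fundamentals", ["Deep Learning", "Neural Networks", "AI"]),
    Doc("Natural Language Processing", ["NLP", "AI", "Linguistics"]),
]

print([doc.title for doc in filter_by_tag(hits, "nlp")])  # ['Natural Language Processing']
```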

In [9]:
# Create some sample documents with `title`, `content`, and `tags`
docs = [
    {
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data.",
        "tags": ["ML", "AI", "Data Science"],
    },
    {
        "title": "Deep Learning Fundamentals",
        "content": "Deep learning is a subset of machine learning that uses neural networks with many layers to analyze various factors of data.",
        "tags": ["Deep Learning", "Neural Networks", "AI"],
    },
    {
        "title": "Natural Language Processing",
        "content": "NLP combines computational linguistics and AI to enable computers to understand, interpret, and generate human language.",
        "tags": ["NLP", "AI", "Linguistics"],
    },
    {
        "title": "Computer Vision",
        "content": "Computer vision is a field of AI that trains computers to interpret and understand visual data from the world around us.",
        "tags": ["Computer Vision", "AI", "Image Processing"],
    },
]

In [10]:
def main():
    # Build Document objects from the sample dictionaries
    sample_docs = [Document(**doc) for doc in docs]

    # Generate embeddings for each document
    for doc in sample_docs:
        doc.generate_embedding()

    store_documents(sample_docs)
    search_documents("What is machine learning?", top_k=2)
    search_documents("Explain deep learning", top_k=2)
    search_documents("What is NLP?", top_k=2)


main()

Storing 4 documents in Qdrant...
Storage result: {'upserted_count': 4}
Searching for documents similar to: 'What is machine learning?'
Found 2 similar documents:
1. Introduction to Machine Learning
   Content: Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data.
   Tags: ML, AI, Data Science

2. Deep Learning Fundamentals
   Content: Deep learning is a subset of machine learning that uses neural networks with many layers to analyze various factors of data.
   Tags: Deep Learning, Neural Networks, AI

Searching for documents similar to: 'Explain deep learning'
Found 2 similar documents:
1. Deep Learning Fundamentals
   Content: Deep learning is a subset of machine learning that uses neural networks with many layers to analyze various factors of data.
   Tags: Deep Learning, Neural Networks, AI

2. Introduction to Machine Learning
   Content: Machine learning is a field of artificial intelligence 