GitHub reference: https://github.com/DataTalksClub/llm-zoomcamp/blob/main/02-vector-search/sematic_search.ipynb

### Vector Search

Vector search transforms data into high-dimensional vectors (numerical representations) that capture the meaning or features of the content. Instead of looking for exact words, it finds items that are similar in context or semantics.
For example:
- Traditional search: looks for the exact word “dog.”
- Vector search: finds “dog,” “canine,” “puppy,” or even an image of a dog, because they’re semantically related.
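To make "semantically related" a bit more concrete, here is a minimal sketch of cosine similarity, one common way to measure how close two vectors are. The three-dimensional vectors are hand-picked for illustration and do not come from any model; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (hand-picked, not produced by any model)
toy_vectors = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.90, 0.15],
    "car":   [0.10, 0.20, 0.95],
}

def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(most_similar("dog", toy_vectors))
```

With these made-up numbers, "puppy" ends up closer to "dog" than "car" does, which is the behavior a good embedding model is expected to show on real text.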


#### Qdrant
Qdrant is an open-source vector search engine, a dedicated solution built in Rust for scalable vector search.

**Setup**

Qdrant is fully open-source, which means you can run it in multiple ways depending on your needs.
You can self-host it on your own infrastructure, deploy it on Kubernetes, or use the managed Qdrant Cloud.

We're going to run a Qdrant instance in a Docker container.

**Docker**
All you need to do is pull the image and start the container using the following commands:

docker pull qdrant/qdrant


docker run -p 6333:6333 -p 6334:6334 \
   -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
   qdrant/qdrant

The second line in the docker run command mounts local storage to keep your data persistent. So even if you restart or delete the container, your data will still be stored locally.

6333 – REST API port

6334 – gRPC API port 


To help you explore your data visually, Qdrant provides a built-in Web UI, available in both Qdrant Cloud and local instances. You can use it to inspect collections, check system health, and even run simple queries.

When you're running Qdrant in Docker, the Web UI is available at http://localhost:6333/dashboard
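Before moving on, it can be handy to confirm from Python that the container is reachable. The sketch below uses only the standard library and calls the REST endpoint that lists collections (`GET /collections`) on port 6333. The helper name `is_qdrant_up` is ours, and the check assumes an instance without an API key; with authentication enabled it would simply report `False`.

```python
import urllib.error
import urllib.request

def is_qdrant_up(base_url="http://localhost:6333", timeout=2.0):
    """Return True if the REST API answers GET /collections with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False

print(is_qdrant_up())
```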

#### Installing Required Libraries
In the environment you created specifically for this course, we’ll install:

- The **qdrant-client** package. We'll be using the Python client, but Qdrant also offers official clients for JavaScript/TypeScript, Go, and Rust, so you can choose the best fit for your own projects.

- The **fastembed** package, an optimized embedding (data vectorization) solution designed specifically for Qdrant. Make sure you install qdrant-client version >= 1.14.2 (with the fastembed extra, as in the command below) to use local inference with Qdrant.

In [3]:
!python -m pip install -q "qdrant-client[fastembed]>=1.14.2"


#### Step 1: Import Required Libraries & Connect to Qdrant
Now let’s import the necessary modules from the qdrant-client package.

The QdrantClient class allows us to establish a connection to the Qdrant service,
while the models module provides definitions for various configurations and parameters we’ll use.

In [14]:
from qdrant_client import QdrantClient, models

In [16]:
client = QdrantClient("http://localhost:6333") #connecting to local Qdrant instance

#### Step 2: Study the Dataset
To build a working vector search solution (and, more generally, to understand if, when, and how it’s needed), it's good to study the dataset and figure out the nature and structure of the data we’re working with, for example:

- **modality**: is it text, images, videos, or a combination?
- **specifics**: if it’s text, what language is used, how big are the text pieces, are there any special characters, etc.

It will help us define:

- the right data "schema" (what to vectorize, what to store as metadata, etc.);
- the right embedding model (the best fit based on the domain, precision & resource requirements).

In [6]:
import json
with open('documents-llm.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)
    

In [7]:
documents[0]

{'text': 'Yes, but if you want to receive a certificate, you need to submit your project while we’re still accepting submissions.',
 'section': 'General course-related questions',
 'question': 'I just discovered the course. Can I still join?',
 'course': 'llm-zoomcamp'}
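A quick way to study the "specifics" of a text dataset is to look at how long the texts are and how the records are distributed across groups. The sketch below is one possible approach using only the standard library. The `summarize` helper and the three sample records are hypothetical stand-ins with the same fields as the course FAQ data; in the notebook you would pass the real `documents` list instead.

```python
from collections import Counter
from statistics import mean, median

def summarize(documents, text_field="text", group_field="course"):
    """Basic size statistics (in characters) plus a per-group document count."""
    lengths = [len(doc[text_field]) for doc in documents]
    return {
        "count": len(documents),
        "min_chars": min(lengths),
        "median_chars": median(lengths),
        "mean_chars": mean(lengths),
        "max_chars": max(lengths),
        "per_group": dict(Counter(doc[group_field] for doc in documents)),
    }

# Hypothetical stand-in records, not taken from the real dataset
sample_docs = [
    {"text": "Yes, you can still join.", "section": "General", "course": "llm-zoomcamp"},
    {"text": "Use Docker.", "section": "Setup", "course": "llm-zoomcamp"},
    {"text": "Submit before the deadline.", "section": "Homework", "course": "ml-zoomcamp"},
]
print(summarize(sample_docs))
```

Character counts are only a rough proxy for token counts, but they are usually enough to tell whether texts risk exceeding a model's input limit.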

Since we’re building a Q&A retrieval-augmented generation (RAG) system, it makes sense to store the text field (answers) as embeddings and use vector search to find the most relevant answer to a given question query.

We could store the course and section fields as metadata.
This way, we can filter search results when asking questions related to a specific course or a specific section.

#### Step 3: Choosing the Embedding Model with FastEmbed

FastEmbed is an optimized embedding solution designed specifically for Qdrant. It delivers low-latency, CPU-friendly embedding generation, eliminating the need for heavy frameworks like PyTorch or TensorFlow.

FastEmbed supports:

- Dense embeddings for text and images (the most common type in vector search, and the ones we're going to use today)
- Sparse embeddings (e.g., BM25 and sparse neural embeddings)
- Multivector embeddings (e.g., ColPali and ColBERT, late interaction models)
- Rerankers

All of these can be directly used in Qdrant (as Qdrant supports dense, sparse & multivectors along with hybrid search).
FastEmbed’s integration with Qdrant allows you to directly pass text or images to the Qdrant client for embedding.

Let’s select an embedding model to use for our course question answers, stored in text fields, from the options supported by FastEmbed.


In [8]:
from fastembed import TextEmbedding
TextEmbedding.list_supported_models()

[{'model': 'BAAI/bge-base-en',
  'sources': {'hf': 'Qdrant/fast-bge-base-en',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.42,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model': 'BAAI/bge-base-en-v1.5',
  'sources': {'hf': 'qdrant/bge-base-en-v1.5-onnx-q',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en-v1.5.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.21,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model':

It makes sense to choose a model that produces small-to-moderate-sized embeddings (e.g., 512 dimensions), so we don’t overuse resources in our simple setup.

In [9]:
EMBEDDING_DIMENSIONALITY = 512

for model in TextEmbedding.list_supported_models():
    if model["dim"] == EMBEDDING_DIMENSIONALITY:
        print(json.dumps(model, indent=2))

{
  "model": "BAAI/bge-small-zh-v1.5",
  "sources": {
    "hf": "Qdrant/bge-small-zh-v1.5",
    "url": "https://storage.googleapis.com/qdrant-fastembed/fast-bge-small-zh-v1.5.tar.gz",
    "_deprecated_tar_struct": true
  },
  "model_file": "model_optimized.onnx",
  "description": "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.",
  "license": "mit",
  "size_in_GB": 0.09,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "Qdrant/clip-ViT-B-32-text",
  "sources": {
    "hf": "Qdrant/clip-ViT-B-32-text",
    "url": null,
    "_deprecated_tar_struct": false
  },
  "model_file": "model.onnx",
  "description": "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, Prefixes for queries/documents: not necessary, 2021 year",
  "license": "mit",
  "size_in_GB": 0.25,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "jinaai/jina-embeddings-v2-small-e

We need an embedding model suitable for English text.

It also makes sense to select a unimodal model, since we’re not including images in our search, and specifically tailored solutions are usually better than universal ones.

It seems like jinaai/jina-embeddings-v2-small-en is a good choice!
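The selection criteria above (512 dimensions, English, unimodal) can also be applied programmatically. The sketch below filters a small hand-written catalog that mimics the structure returned by `TextEmbedding.list_supported_models()`. The `pick_models` helper is hypothetical, the descriptions are trimmed, and the description of the Jina entry is an assumption because the output above is cut off. Matching on free-text descriptions is fragile, so treat this as a convenience rather than a robust selector.

```python
def pick_models(models, dim, language="English", unimodal=True):
    """Filter FastEmbed-style model metadata by dimension, language and modality."""
    selected = []
    for m in models:
        desc = m["description"]
        if m["dim"] != dim:
            continue
        if language not in desc:
            continue
        if unimodal and "Unimodal" not in desc:
            continue
        selected.append(m["model"])
    return selected

# Trimmed stand-in for TextEmbedding.list_supported_models()
catalog = [
    {"model": "BAAI/bge-small-zh-v1.5", "dim": 512,
     "description": "Text embeddings, Unimodal (text), Chinese"},
    {"model": "Qdrant/clip-ViT-B-32-text", "dim": 512,
     "description": "Text embeddings, Multimodal (text&image), English"},
    # Assumed description: the real one is truncated in the output above
    {"model": "jinaai/jina-embeddings-v2-small-en", "dim": 512,
     "description": "Text embeddings, Unimodal (text), English"},
    {"model": "BAAI/bge-base-en-v1.5", "dim": 768,
     "description": "Text embeddings, Unimodal (text), English"},
]
print(pick_models(catalog, dim=512))
```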



In [10]:
model_handle = "jinaai/jina-embeddings-v2-small-en"

Now we’re ready to configure and use Qdrant for semantic search. To fully understand what’s happening, here’s a quick overview of Qdrant’s core terminology:

Points are the central entity Qdrant works with.

* A point is a record consisting of an ID, a vector, and an optional payload.

* A collection is a named set of points (i.e., vectors with optional payloads) that you can search within.

Think of a collection as the container for your vector search solution: typically one collection per business problem.
Qdrant supports different types of vectors to enable different modes of data exploration and search (dense, sparse, multivectors, and named vectors).

In this example, we’ll use the most common type, dense vectors.

Embeddings capture the semantic essence of the data, while the payload holds structured metadata.

This metadata becomes especially useful when applying filters or sorting during search. Qdrant's payloads can hold structured data like booleans, keywords, geo-locations, arrays, and nested objects.

#### Step 4: Create a Collection
When creating a collection, we need to specify:

- **Name**: A unique identifier for the collection.
- **Vector Configuration**:
  - **Size**: The dimensionality of the vectors.
  - **Distance Metric**: The method used to measure similarity between vectors.

There are additional parameters you can explore in the documentation: https://qdrant.tech/documentation/concepts/collections/#create-a-collection
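A note on the distance metric: cosine similarity depends only on the direction of the vectors, not on their length. As far as we understand from the Qdrant documentation, cosine is implemented as a dot product over vectors that are normalized on upload. The small sketch below, using made-up two-dimensional vectors, illustrates why the two computations agree.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed directly on the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors of different lengths (norms 5 and 10)
a, b = [3.0, 4.0], [8.0, 6.0]

print(cosine(a, b))                     # cosine on the raw vectors
print(dot(normalize(a), normalize(b)))  # dot product on unit-length vectors
```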

In [17]:
# Define the collection name
collection_name = "zoomcamp-rag"

# Create the collection with specified vector parameters
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=EMBEDDING_DIMENSIONALITY,  # Dimensionality of the vectors
        distance=models.Distance.COSINE  # Distance metric for similarity search
    )
)

True

#### Step 5: Create, Embed & Insert Points into the Collection
[**Points**](https://qdrant.tech/documentation/concepts/points/#points) are the core data entities in Qdrant. Each point consists of:

* ID. A unique identifier. Qdrant supports both 64-bit unsigned integers and UUIDs.
* Vector. The embedding that represents the data point in vector space.
* Payload (optional). Additional metadata as key-value pairs.
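Since Qdrant accepts UUIDs as point IDs, one option is to derive the ID from the content of each document instead of from its position in a list. That way the same document always maps to the same ID, and re-running an upsert overwrites the point rather than creating a duplicate. This is a sketch of that idea, not what the notebook does: the namespace URL and the `point_id` helper are hypothetical, and the choice of identifying fields is an assumption.

```python
import uuid

# Hypothetical namespace for this project; any fixed UUID would work
FAQ_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/zoomcamp-faq")

def point_id(doc):
    """Derive a stable UUID string from the fields that identify a document."""
    key = f"{doc['course']}|{doc['section']}|{doc['question']}"
    return str(uuid.uuid5(FAQ_NAMESPACE, key))

doc = {
    "course": "llm-zoomcamp",
    "section": "General course-related questions",
    "question": "I just discovered the course. Can I still join?",
}
print(point_id(doc))
```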

In [18]:
points = []

for idx, doc in enumerate(documents):
    point = models.PointStruct(
        id=idx,  # sequential integer ID; enumerate avoids shadowing the built-in id()
        vector=models.Document(text=doc['text'], model=model_handle),  # embed text locally with "jinaai/jina-embeddings-v2-small-en" from FastEmbed
        payload={
            "text": doc['text'],
            "section": doc['section'],
            "course": doc['course']
        }  # save all needed metadata fields
    )
    points.append(point)

In [19]:
client.upsert(
    collection_name=collection_name,
    points=points
)

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
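The upsert call above sends all points in a single request, which is fine for a dataset of this size. For larger datasets it may be gentler on memory and the network to send points in batches. Below is a minimal batching sketch; the `batched` helper is our own (not part of qdrant-client), and the list of integers stands in for the real `points` list.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch

# Stand-in for the `points` list built in the notebook
fake_points = list(range(10))

for batch in batched(fake_points, 4):
    # In the notebook, this is where each batch would be sent:
    # client.upsert(collection_name=collection_name, points=batch)
    print(len(batch))
```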

Let’s explore the uploaded data in the Qdrant Web UI at http://localhost:6333/dashboard to study semantic similarity visually.

For example, using the Visualize tab in the zoomcamp-rag collection, we can view all answers to the course questions and see how they group together by meaning.

#### Step 6: Running a Similarity Search

In [20]:
# Defining a function

def search(query, limit=1):

    results = client.query_points(
        collection_name=collection_name,
        query=models.Document( #embed the query text locally with "jinaai/jina-embeddings-v2-small-en"
            text=query,
            model=model_handle 
        ),
        limit=limit, # top closest matches
        with_payload=True #to get metadata in the results
    )

    return results

In [23]:
print(search("What if I submit homeworks late?").points[0].payload['text'])

You don't need it. You're accepted. You can also just start learning and submitting homework (while the form is Open) without registering. It is not checked against any registered list. Registration is just to gauge interest before the start date.
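For intuition about what a similarity search returns, here is a brute-force sketch in plain Python: it scores every point against the query vector and keeps the best `limit` results. This is only an illustration with made-up two-dimensional vectors. Qdrant itself relies on an approximate index (HNSW) rather than scanning every point, and the embedding step is skipped here entirely.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query_vector, points, limit=1):
    """Score every point against the query; return the `limit` best (score, point) pairs."""
    scored = [(cosine(query_vector, p["vector"]), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]

# Hypothetical points: hand-picked vectors with illustrative payloads
toy_points = [
    {"id": 0, "vector": [1.0, 0.0], "payload": {"text": "about joining the course"}},
    {"id": 1, "vector": [0.0, 1.0], "payload": {"text": "about late homework"}},
    {"id": 2, "vector": [0.7, 0.7], "payload": {"text": "about certificates"}},
]

best_score, best_point = brute_force_search([0.1, 0.9], toy_points, limit=1)[0]
print(best_point["payload"]["text"])
```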


#### Step 7: Running a Similarity Search with Filters

To enable efficient filtering, we need to turn on indexing of payload fields (see https://qdrant.tech/documentation/concepts/indexing/#payload-index).

Creating an index requires additional computational resources and memory, so it is worth choosing carefully which fields to index. Qdrant does not make this choice automatically; it leaves it to the user.

Besides must, which we use below, Qdrant supports other filter clauses and conditions such as should, must_not, and range. For a full overview, check the Filtering Guide: https://qdrant.tech/articles/vector-search-filtering/

In [30]:

client.create_payload_index(
    collection_name=collection_name,
    field_name="course",
    field_schema="keyword" # exact matching on string metadata fields
)

UpdateResult(operation_id=4, status=<UpdateStatus.COMPLETED: 'completed'>)

In [28]:
# Updated search function with a course filter

def search_in_course(query, course, limit=1):

    results = client.query_points(
        collection_name=collection_name,
        query=models.Document( #embed the query text locally with "jinaai/jina-embeddings-v2-small-en"
            text=query,
            model=model_handle
        ),
        query_filter=models.Filter( # filter by course name
            must=[
                models.FieldCondition(
                    key="course",
                    match=models.MatchValue(value=course)
                )
            ]
        ),
        limit=limit, # top closest matches
        with_payload=True #to get metadata in the results
    )

    return results

In [29]:
print(search_in_course("What if I submit homeworks late?", "llm-zoomcamp").points[0].payload['text'])


You don't need it. You're accepted. You can also just start learning and submitting homework (while the form is Open) without registering. It is not checked against any registered list. Registration is just to gauge interest before the start date.
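To clarify how the filter clauses combine, here is a toy re-implementation of their semantics in plain Python: every `must` condition has to hold, at least one `should` condition has to hold (when any are given), and no `must_not` condition may hold. This is a simplified sketch, not Qdrant's code: the `matches` helper is hypothetical and only supports exact-value matching, with each condition written as a `(key, value)` pair.

```python
def matches(payload, flt):
    """Evaluate a simplified filter with `must`, `should` and `must_not` clauses.

    Each clause is a list of (key, value) pairs meaning payload[key] == value.
    """
    def ok(cond):
        key, value = cond
        return payload.get(key) == value

    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(ok(c) for c in must)
        and (not should or any(ok(c) for c in should))
        and not any(ok(c) for c in must_not)
    )

# Hypothetical payloads shaped like the ones stored in the collection
payloads = [
    {"course": "llm-zoomcamp", "section": "General"},
    {"course": "ml-zoomcamp", "section": "General"},
]
course_filter = {"must": [("course", "llm-zoomcamp")]}
print([p for p in payloads if matches(p, course_filter)])
```

In a filtered search, only points whose payload passes the filter are candidates for the nearest-neighbor ranking.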
