# Vector Search with Qdrant

## Vector Search

Vector search powers much of the modern internet, whether you notice it or not. It sits behind recommendation engines, chatbots, AI agents, and even major search engines.

In simple terms, traditional keyword search works by matching exact words. This works well when you know the precise keywords present in the data. But what happens when there are no keywords? What if you're searching through images, audio, video or code, or even cross-modally?

Moreover, even in text-heavy documents, keyword search struggles to capture context and meaning. The same idea can be phrased in countless ways, so keyword-based methods cannot feasibly match every variant.

Instead of relying on exact matches, vector search retrieves information based on semantic similarity measured numerically between vectorized data representations (embeddings). It recognizes patterns and relationships between concepts, enabling search systems to retrieve the most relevant content, even when the phrasing differs, terminology varies, or no explicit keywords exist.
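
To make "semantic similarity measured numerically" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are hand-picked for illustration and were not produced by any model; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (hand-picked, not model output).
toy_embeddings = {
    "How do I enroll in the course?": [0.9, 0.1, 0.0],
    "Signing up for the class": [0.8, 0.2, 0.1],
    "Docker port mapping error": [0.0, 0.2, 0.9],
}

query = toy_embeddings["How do I enroll in the course?"]
for text, vector in toy_embeddings.items():
    print(f"{cosine_similarity(query, vector):.3f}  {text}")
```

The two enrollment-related sentences share no keywords, yet their toy vectors point in nearly the same direction, so they score much higher than the unrelated one.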

## Qdrant

[Qdrant](https://qdrant.tech/) is an **open-source** vector search engine, a dedicated solution built in Rust for scalable vector search. 
If you're wondering why you might need a dedicated solution for vector search, we’ve addressed that in the article ["Built for Vector Search"](https://qdrant.tech/articles/dedicated-vector-search/).

In short, a dedicated solution lets you:
- Run production-level vector search at scale;
- Stay in sync with the latest trends and best practices;
- Make full use of vector search capabilities (including those beyond simple similarity search).

In this notebook, we’ll give you a small sneak peek into semantic (vector) search with Qdrant and encourage you to play around & see if it fits your needs!

If you have any questions about vector search in Qdrant, feel free to reach out in our [Discord community](https://discord.gg/G7PQU6Cy).


## Step 0: Setup

Qdrant is fully open-source, which means you can run it in multiple ways depending on your needs.  
You can self-host it on your own infrastructure, deploy it on Kubernetes, or run it in managed Cloud.  

We're going to run a Qdrant instance in a Docker container.

### Docker

All you need to do is pull the image and start the container using the following commands:

```bash
docker pull qdrant/qdrant

docker run -p 6333:6333 -p 6334:6334 \
   -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
   qdrant/qdrant
```

The `-v` option on the second line of the `docker run` command mounts local storage to keep your data persistent, so even if you restart or delete the container, your data will still be stored locally.

The `-p` options expose two ports:

- 6333 – REST API port
- 6334 – gRPC API port

To help you explore your data visually, Qdrant provides a built-in **Web UI**, available in both Qdrant Cloud and local instances.
You can use it to inspect collections, check system health, and even run simple queries.

When you're running Qdrant in Docker, the Web UI is available at http://localhost:6333/dashboard

### Installing Required Libraries

In the environment you created specifically for this course, we’ll install:

- The `qdrant-client` package. We'll be using the Python client, but Qdrant also offers official clients for JavaScript/TypeScript, Go, and Rust, so you can choose the best fit for your own projects.

- The `fastembed` package, an optimized embedding (data vectorization) solution designed specifically for Qdrant. We install it through the `qdrant-client[fastembed]` extra; make sure the `qdrant-client` version is `>= 1.14.2` to use **local inference** with Qdrant.

In [1]:
!python -m pip install -q "qdrant-client[fastembed]>=1.14.2"


## Step 1: Import Required Libraries & Connect to Qdrant

Now let’s import the necessary modules from the `qdrant-client` package.

The `QdrantClient` class allows us to establish a connection to the Qdrant service,  
while the `models` module provides definitions for various configurations and parameters we’ll use.

In [2]:
from qdrant_client import QdrantClient, models



Initialize the client:

In [3]:
client = QdrantClient("http://localhost:6333")  # connect to the local Qdrant instance

## Step 2: Study the Dataset

To build a working vector search solution (and, more generally, to understand if/when/how it’s needed), it's good to study the dataset and figure out the nature and structure of the data we’re working with, for example:

- modality — is it text, images, videos, a combination?  
- specifics — if it’s text: language used, how big are the text pieces, are there any special characters, etc.  

It will help us define:
- the right data "schema" (what to vectorize, what to store as metadata, etc);  
- the right embedding model (the best fit based on the domain, precision & resource requirements). 

We have a toy dataset provided for experimentation. Let's check it out:


In [4]:
import requests

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early if the download did not succeed
documents_raw = docs_response.json()

In [None]:
documents_raw

The data already seems to be cleaned and chunked (i.e., divided into small pieces that embedding models can easily digest), so what's left is to define:

- which fields could be used for **semantic search**;
- which fields should be stored as **metadata**, e.g. usable in filtering conditions.

We have a dataset with three course types:  
`data-engineering-zoomcamp`, `machine-learning-zoomcamp`, and `mlops-zoomcamp`.  
Each course includes a collection of `question` and `text` (answer) pairs, along with the `section` the question refers to.
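
Before deciding what to embed, it can help to summarize this structure programmatically. The sketch below assumes `documents_raw` is a list of course objects, each with a `course` name and a `documents` list of `text`/`section`/`question` records (the shape the rest of this notebook relies on). The small sample is made up so that the snippet runs on its own; in the notebook you would pass the real `documents_raw` instead.

```python
from collections import Counter

# Made-up sample in the same shape as `documents_raw` (illustration only).
sample_documents = [
    {
        "course": "data-engineering-zoomcamp",
        "documents": [
            {"text": "Answer A", "section": "General", "question": "Question A?"},
            {"text": "Answer B", "section": "Module 1", "question": "Question B?"},
        ],
    },
    {
        "course": "mlops-zoomcamp",
        "documents": [
            {"text": "Answer C", "section": "General", "question": "Question C?"},
        ],
    },
]


def summarize(courses):
    """Count documents per course and collect the field names used."""
    per_course = Counter()
    fields = set()
    for course in courses:
        per_course[course["course"]] += len(course["documents"])
        for doc in course["documents"]:
            fields.update(doc)  # adds the dict's keys
    return dict(per_course), sorted(fields)


counts, fields = summarize(sample_documents)
print(counts)
print(fields)
```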

---

#### Which Fields Could Be Used for Semantic Search

Here we can observe semantic similarity in practice: some of the questions and answers don’t share many overlapping words,  
yet they clearly address the same topic. One asks about a topic, the other provides an answer.

For example:

**Question**:  
- *“I have **registered** for the Data Engineering Bootcamp. When can I expect to receive the confirmation email?”*

**Answer**:  
- *“You don't need it. You're accepted. You can also just start learning and submitting homework without registering. It is not checked against any **registered** list. Registration is just to gauge interest before the start date.”*

These two could be matched via the keyword **registered**,  
but a sentence like *“Not **registered** participants are not getting certification”* would also match that keyword, while having a different semantic meaning.
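
This ambiguity can be sketched in a few lines. The `keyword_match` helper is a hypothetical, deliberately naive matcher (case-insensitive whole-word lookup), not a model of how production keyword engines work.

```python
import re


def keyword_match(keyword, text):
    """Return True if `keyword` appears as a whole word in `text` (case-insensitive)."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


answer = (
    "You don't need it. You're accepted. You can also just start learning and "
    "submitting homework without registering. It is not checked against any "
    "registered list."
)
unrelated = "Not registered participants are not getting certification"

print(keyword_match("registered", answer))
print(keyword_match("registered", unrelated))
```

Both sentences match, so the keyword alone cannot tell the relevant answer apart from the unrelated statement.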

So, if we’re building a Q&A retrieval-augmented generation (RAG) system,  
it makes sense to store the `text` field (answers) as **embeddings**, and use vector search to find the most relevant answer to a given `question` query.

#### Which Fields Should Be Stored as Metadata

For example, we could store the `course` and `section` fields as metadata.  
This way, we can filter search results when asking questions related to a specific course or a specific section.


## Step 3: Choosing the Embedding Model with FastEmbed

Now that we know we're embedding small chunks of English text (course-related question and answer pairs), we can choose a suitable embedding model to convert this data into vectors.

The choice of an embedding model depends on many factors:
- The task, data modality, and data specifics;
- The trade-off between search precision and resource usage (larger embeddings require more storage and memory);
- The cost of inference (especially if you're using a third-party provider);
- etc
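
As a rough illustration of the storage side of this trade-off, the sketch below estimates the raw size of dense vectors. It assumes each dimension is stored as a 4-byte float32 and ignores index and payload overhead. The 948-point count is the size of the toy dataset used in this notebook; the 1024-dimensional alternative is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_points, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of `num_points` dense vectors, excluding index and payload overhead."""
    return num_points * dim * bytes_per_value


for dim in (512, 1024):
    size_mib = raw_vector_bytes(948, dim) / (1024 ** 2)
    print(f"{dim} dimensions: {size_mib:.2f} MiB")
```

Doubling the dimensionality doubles the raw vector storage, which matters far more at millions of points than it does for this toy dataset.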

> The best way to select an embedding model is to **test and benchmark different options on your own data**.

In this notebook, we’re going to use [FastEmbed](https://github.com/qdrant/fastembed) as our embedding provider.

---

**FastEmbed** is an optimized embedding solution designed specifically for Qdrant. It delivers low-latency, CPU-friendly embedding generation, eliminating the need for heavy frameworks like PyTorch or TensorFlow. It uses quantized model weights and ONNX Runtime, making it significantly faster than traditional Sentence Transformers on CPU while maintaining competitive accuracy.

FastEmbed supports:
- **Dense embeddings** for text and images (*the most common type in vector search, ones we're going to use today*)
- **Sparse embeddings** (e.g., BM25 and sparse neural embeddings)  
- **Multivector embeddings** (e.g., ColPali and ColBERT, late interaction models)  
- **Rerankers**

All of these can be directly used in Qdrant (as **Qdrant supports dense, sparse & multivectors along with hybrid search**).  
FastEmbed’s integration with Qdrant allows you to directly pass text or images to the Qdrant client for embedding.

In this notebook, we’ll use FastEmbed for local inference with Qdrant.  
> Keep in mind your machine's resources when choosing an embedding model for local inference.


### FastEmbed for Textual Data

Let’s select an embedding model to use for our course question answers, stored in `text` fields, from the options supported by FastEmbed.

In [None]:
from fastembed import TextEmbedding
TextEmbedding.list_supported_models()

It makes sense to choose a model that produces small-to-moderate-sized embeddings (e.g., 512 dimensions), so we don’t overuse resources in our simple setup.


In [7]:
import json

EMBEDDING_DIMENSIONALITY = 512

for model in TextEmbedding.list_supported_models():
    if model["dim"] == EMBEDDING_DIMENSIONALITY:
        print(json.dumps(model, indent=2))

{
  "model": "BAAI/bge-small-zh-v1.5",
  "sources": {
    "hf": "Qdrant/bge-small-zh-v1.5",
    "url": "https://storage.googleapis.com/qdrant-fastembed/fast-bge-small-zh-v1.5.tar.gz",
    "_deprecated_tar_struct": true
  },
  "model_file": "model_optimized.onnx",
  "description": "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.",
  "license": "mit",
  "size_in_GB": 0.09,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "Qdrant/clip-ViT-B-32-text",
  "sources": {
    "hf": "Qdrant/clip-ViT-B-32-text",
    "url": null,
    "_deprecated_tar_struct": false
  },
  "model_file": "model.onnx",
  "description": "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, Prefixes for queries/documents: not necessary, 2021 year",
  "license": "mit",
  "size_in_GB": 0.25,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "jinaai/jina-embeddings-v2-small-e

We need an embedding model suitable for **English text**. 

It also makes sense to select a **unimodal** model, since we’re not including images in our search, and specifically tailored solutions are usually better than universal ones.
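
These two criteria can be checked against the model descriptions. The sketch below copies the `model` and `description` fields of the first two entries printed above and filters on substrings of the description. Matching on description text is a convenience assumed here for illustration, not an official FastEmbed selection API.

```python
# `model` and `description` copied from the first two entries of the listing above.
candidates = [
    {
        "model": "BAAI/bge-small-zh-v1.5",
        "description": (
            "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, "
            "Prefixes for queries/documents: not so necessary, 2023 year."
        ),
    },
    {
        "model": "Qdrant/clip-ViT-B-32-text",
        "description": (
            "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, "
            "Prefixes for queries/documents: not necessary, 2021 year"
        ),
    },
]


def matching_models(models_info, required_terms):
    """Names of models whose description contains every required term."""
    return [
        info["model"]
        for info in models_info
        if all(term in info["description"] for term in required_terms)
    ]


print(matching_models(candidates, ["English"]))
print(matching_models(candidates, ["Unimodal"]))
print(matching_models(candidates, ["English", "Unimodal"]))
```

Neither of these two entries satisfies both criteria: one targets Chinese text and the other is multimodal.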

It seems like `jinaai/jina-embeddings-v2-small-en` is a good choice!

In [8]:
model_handle = "jinaai/jina-embeddings-v2-small-en"

Like most dense embedding models, `jina-embeddings-v2-small-en` was trained to measure semantic closeness using **cosine similarity**.  
You can find this information, for example, on the model’s [Hugging Face card](https://huggingface.co/jinaai/jina-embeddings-v2-small-en).

> The parameters of the chosen embedding model, including the output embedding dimensions and the semantic similarity (distance) metric, are required to configure semantic search in Qdrant.


Now we’re ready to configure and use Qdrant for semantic search.
To fully understand what’s happening, here’s a quick overview of Qdrant’s core terminology:

- **Points** are the central entity Qdrant works with.  
  A point is a record consisting of an **ID**, a **vector**, and an optional **payload**.
- A **collection** is a named set of points (i.e., vectors with optional payloads) that you can search within.  
  *Think of it as the container for your vector search solution, a single business problem solved.*

> Qdrant supports different types of vectors to enable different modes of data exploration and search (dense, sparse, multivectors, and named vectors).

In this example, we’ll use the most common type, **dense vectors**.

Embeddings capture the semantic essence of the data, while the **payload** holds structured metadata.  
This metadata becomes especially useful when applying filters or sorting during search. **Qdrant's payloads** can hold structured data like booleans, keywords, geo-locations, arrays, and nested objects.




## Step 4: Create a Collection

When creating a [collection](https://qdrant.tech/documentation/concepts/collections/), we need to specify:

*   Name: A unique identifier for the collection.
*   Vector Configuration:
    *   Size: The dimensionality of the vectors.
    *   Distance Metric: The method used to measure similarity between vectors.


There are additional parameters you can explore in our [documentation](https://qdrant.tech/documentation/concepts/collections/#create-a-collection). Moreover, you can configure other vector types in Qdrant beyond typical dense embeddings (e.g., for hybrid search). However, for this example, the simplest default configuration is sufficient.
    

In [9]:
# Define the collection name
collection_name = "zoomcamp-rag"

# Create the collection with specified vector parameters
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=EMBEDDING_DIMENSIONALITY,  # Dimensionality of the vectors
        distance=models.Distance.COSINE  # Distance metric for similarity search
    )
)

True

## Step 5: Create, Embed & Insert Points into the Collection

[Points](https://qdrant.tech/documentation/concepts/points/#points) are the core data entities in Qdrant. Each point consists of:

1. **ID**. A unique identifier. Qdrant supports both 64-bit unsigned integers and UUIDs.  
2. **Vector**. The embedding that represents the data point in vector space.  
3. **Payload** *(optional)*. Additional metadata as key-value pairs.
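
The notebook cell below uses sequential integer IDs. As an alternative sketch, UUID-style IDs can be derived deterministically from the content with Python's standard `uuid.uuid5`, so that re-running an upload produces the same ID for the same text and an upsert overwrites the point instead of duplicating it. The `make_point_id` helper and the choice of `uuid.NAMESPACE_URL` are assumptions made for illustration.

```python
import uuid


def make_point_id(course, text):
    """Derive a stable UUID string from the course name and the answer text."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{course}/{text}"))


first = make_point_id("mlops-zoomcamp", "Please choose the closest one to your answer.")
second = make_point_id("mlops-zoomcamp", "Please choose the closest one to your answer.")
other = make_point_id("data-engineering-zoomcamp", "Please choose the closest one to your answer.")

print(first == second)  # same input, same ID
print(first == other)   # different course, different ID
```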


In [10]:
points = []
point_id = 0  # sequential integer ID (avoids shadowing the built-in `id`)

for course in documents_raw:
    for doc in course['documents']:

        point = models.PointStruct(
            id=point_id,
            vector=models.Document(text=doc['text'], model=model_handle),  # embed text locally with "jinaai/jina-embeddings-v2-small-en" from FastEmbed
            payload={
                "text": doc['text'],
                "section": doc['section'],
                "course": course['course']
            }  # save all needed metadata fields
        )
        points.append(point)

        point_id += 1

Now we’re going to embed and upload points to our collection.

First, FastEmbed will download the selected model (the cache path defaults to `os.path.join(tempfile.gettempdir(), "fastembed_cache")`) and perform inference directly on your machine.  
Then, the generated points will be upserted into the collection, and the vector index will be built.


In [11]:
client.upsert(
    collection_name=collection_name,
    points=points
)

Fetching 5 files: 100%|██████████| 5/5 [00:01<00:00,  2.93it/s]


UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

The speed of upsert mainly depends on the time spent on local inference.  
To speed this up, you could run FastEmbed on GPUs or use a machine with more resources.

In addition to basic `upsert`, Qdrant supports **batch upsert** in both column- and record-oriented formats.

The Python client offers:
- Parallelization  
- Retries  
- Lazy batching  

These can be configured via parameters in the `upload_collection` and `upload_points` functions.  
For details, check the [documentation](https://qdrant.tech/documentation/concepts/points/#upload-points).
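
To illustrate what lazy batching means, here is a generic sketch using only the standard library: items are pulled from an iterator in fixed-size chunks, so the full dataset never has to be held in memory at once. This shows the idea only and is not the client's actual implementation; `iter_batches` is a hypothetical helper.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of up to `batch_size` items, consuming `items` lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator stands in for a large stream of points.
stream = (f"point-{i}" for i in range(7))
for batch in iter_batches(stream, batch_size=3):
    print(len(batch), batch)
```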


### Study Data Visually

Let’s explore the uploaded data in the Qdrant Web UI at [http://localhost:6333/dashboard](http://localhost:6333/dashboard) to study semantic similarity visually.

For example, using the `Visualize` tab in the `zoomcamp-rag` collection, we can view all answers to the course questions (948 points) and see how they group together by meaning, additionally coloured by the course type.  

To do that, run the following command:

```json
{
  "limit": 948,
  "color_by": {
    "payload": "course"
  }
}
```

This 2D representation is the result of dimensionality reduction applied to the `jina-embeddings-v2-small-en` embeddings.

## Step 6: Running a Similarity Search

Now, let’s find the `text` vector in Qdrant that is most similar to a given query embedding, i.e. the most relevant answer to a given question.

### How Similarity Search Works

1. Qdrant compares the query vector to stored vectors (based on a vector index) using the distance metric defined when creating the collection.

2. The closest matches are returned, ranked by similarity.

> The vector index is built for **approximate** nearest neighbor (ANN) search, making large-scale vector search feasible.
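
For contrast with ANN, the sketch below performs exact (brute-force) nearest neighbor search: it scores every stored vector against the query and keeps the top `limit`. It is a conceptual illustration with made-up two-dimensional vectors, not Qdrant's implementation. Vectors are normalized first so that the dot product equals cosine similarity.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def exact_search(query, stored, limit=1):
    """Score every stored vector against the query; return the best `limit` (score, id) pairs."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vector))), point_id)
        for point_id, vector in stored.items()
    )
    return heapq.nlargest(limit, scored)


# Hypothetical stored points: id -> vector.
stored_vectors = {0: [1.0, 0.0], 1: [0.7, 0.7], 2: [0.0, 1.0]}
print(exact_search([0.9, 0.1], stored_vectors, limit=2))
```

Exact search touches every vector, so its cost grows linearly with the size of the collection; an ANN index trades a small amount of accuracy for far less work per query.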

If you'd like to dive into our choice of vector index for vector search, check our article ["What is a vector database"](https://qdrant.tech/articles/what-is-a-vector-database/), or, for a more technical deep dive, our article on [Filterable Hierarchical Navigable Small World](https://qdrant.tech/articles/filtrable-hnsw/).


Let's define a search function:

In [12]:
def search(query, limit=1):

    results = client.query_points(
        collection_name=collection_name,
        query=models.Document( #embed the query text locally with "jinaai/jina-embeddings-v2-small-en"
            text=query,
            model=model_handle 
        ),
        limit=limit, # top closest matches
        with_payload=True #to get metadata in the results
    )

    return results

Now let’s pick a random question from the course data.  
As you remember, we didn’t upload the questions to Qdrant.


In [13]:
import random

course = random.choice(documents_raw)
course_piece = random.choice(course['documents'])
print(json.dumps(course_piece, indent=2))

{
  "text": "If you uploaded manually the fvh 2019 parquet files manually after downloading from https://d37ci6vzurychx.cloudfront.net/trip-data/fhv_tripdata_2019-*.parquet you may face errors regarding date types while loading the data in a landing table (say fhv_tripdata). Try to create an the external table with the schema defines as following and load each month in a loop.\n-----Correct load with schema defination----will not throw error----------------------\nCREATE OR REPLACE EXTERNAL TABLE `dw-bigquery-week-3.trips_data_all.external_tlc_fhv_trips_2019` (\ndispatching_base_num STRING,\npickup_datetime TIMESTAMP,\ndropoff_datetime TIMESTAMP,\nPUlocationID FLOAT64,\nDOlocationID FLOAT64,\nSR_Flag FLOAT64,\nAffiliated_base_number STRING\n)\nOPTIONS (\nformat = 'PARQUET',\nuris = ['gs://project id/fhv_2019_8.parquet']\n);\nCan Also USE  uris = ['gs://project id/fhv_2019_*.parquet'] (THIS WILL remove the need for the loop and can be done for all month in single RUN )\n\u2013 THANKYOU 

Let's see which answer we get:

In [14]:
result = search(course_piece['question'])

In [15]:
result

QueryResponse(points=[ScoredPoint(id=310, version=0, score=0.8827988, payload={'text': "If you uploaded manually the fvh 2019 parquet files manually after downloading from https://d37ci6vzurychx.cloudfront.net/trip-data/fhv_tripdata_2019-*.parquet you may face errors regarding date types while loading the data in a landing table (say fhv_tripdata). Try to create an the external table with the schema defines as following and load each month in a loop.\n-----Correct load with schema defination----will not throw error----------------------\nCREATE OR REPLACE EXTERNAL TABLE `dw-bigquery-week-3.trips_data_all.external_tlc_fhv_trips_2019` (\ndispatching_base_num STRING,\npickup_datetime TIMESTAMP,\ndropoff_datetime TIMESTAMP,\nPUlocationID FLOAT64,\nDOlocationID FLOAT64,\nSR_Flag FLOAT64,\nAffiliated_base_number STRING\n)\nOPTIONS (\nformat = 'PARQUET',\nuris = ['gs://project id/fhv_2019_8.parquet']\n);\nCan Also USE  uris = ['gs://project id/fhv_2019_*.parquet'] (THIS WILL remove the need f

`score` – the cosine similarity between the `question` and `text` embeddings.




Let’s compare the original and retrieved answers for our randomly selected question.



In [16]:
print(f"Question:\n{course_piece['question']}\n")
print("Top Retrieved Answer:\n{}\n".format(result.points[0].payload['text']))
print("Original Answer:\n{}".format(course_piece['text']))


Question:
Invalid data types after Ingesting FHV data through parquet files: Could not parse SR_Flag as Float64,Couldn’t parse datetime column as timestamp,couldn’t handle NULL values in PULocationID,DOLocationID

Top Retrieved Answer:
If you uploaded manually the fvh 2019 parquet files manually after downloading from https://d37ci6vzurychx.cloudfront.net/trip-data/fhv_tripdata_2019-*.parquet you may face errors regarding date types while loading the data in a landing table (say fhv_tripdata). Try to create an the external table with the schema defines as following and load each month in a loop.
-----Correct load with schema defination----will not throw error----------------------
CREATE OR REPLACE EXTERNAL TABLE `dw-bigquery-week-3.trips_data_all.external_tlc_fhv_trips_2019` (
dispatching_base_num STRING,
pickup_datetime TIMESTAMP,
dropoff_datetime TIMESTAMP,
PUlocationID FLOAT64,
DOlocationID FLOAT64,
SR_Flag FLOAT64,
Affiliated_base_number STRING
)
OPTIONS (
format = 'PARQUET',
uris

Now let’s search for the answer to a question that wasn’t in the initial dataset.


In [17]:
print(search("What if I submit homeworks late?").points[0].payload['text'])

No, late submissions are not allowed. But if the form is still not closed and it’s after the due date, you can still submit the homework. confirm your submission by the date-timestamp on the Course page.y
Older news:[source1] [source2]


## Step 7: Running a Similarity Search with Filters

We can refine our search using metadata filters. 

> Qdrant’s custom vector index implementation, Filterable HNSW, allows for precise and scalable vector search with filtering conditions.

For example, we can search for an answer to a question related to a specific course from the three available in the dataset.  
Using a `must` filter ensures that all specified conditions are met for a data point to be included in the search results.

> Qdrant also supports other filter types such as `should`, `must_not`, `range`, and more. For a full overview, check our [Filtering Guide](https://qdrant.tech/articles/vector-search-filtering/)
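
The clause semantics can be sketched in plain Python over payload dictionaries: `must` requires every condition to hold, `should` at least one, and `must_not` none. The `matches` helper, the `(key, value)` exact-match condition format, and the sample payload are simplifications assumed for illustration; this is not Qdrant's filter engine.

```python
def matches(payload, must=(), should=(), must_not=()):
    """Check a payload dict against exact-match conditions given as (key, value) pairs."""

    def holds(condition):
        key, value = condition
        return payload.get(key) == value

    return (
        all(holds(c) for c in must)
        and (not should or any(holds(c) for c in should))
        and not any(holds(c) for c in must_not)
    )


payload = {"course": "mlops-zoomcamp", "section": "General"}  # hypothetical payload

print(matches(payload, must=[("course", "mlops-zoomcamp")]))
print(matches(payload, must_not=[("course", "mlops-zoomcamp")]))
print(matches(payload, should=[("course", "data-engineering-zoomcamp"), ("section", "General")]))
```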


To enable efficient filtering, we need to turn on [indexing of payload fields](https://qdrant.tech/documentation/concepts/indexing/#payload-index).


In [18]:
client.create_payload_index(
    collection_name=collection_name,
    field_name="course",
    field_schema="keyword" # exact matching on string metadata fields
)

UpdateResult(operation_id=2, status=<UpdateStatus.COMPLETED: 'completed'>)

Now let's update our search function:

In [19]:
def search_in_course(query, course="mlops-zoomcamp", limit=1):

    results = client.query_points(
        collection_name=collection_name,
        query=models.Document( #embed the query text locally with "jinaai/jina-embeddings-v2-small-en"
            text=query,
            model=model_handle
        ),
        query_filter=models.Filter( # filter by course name
            must=[
                models.FieldCondition(
                    key="course",
                    match=models.MatchValue(value=course)
                )
            ]
        ),
        limit=limit, # top closest matches
        with_payload=True #to get metadata in the results
    )

    return results

Let’s see how the same question is answered across different courses:  
`data-engineering-zoomcamp`, `machine-learning-zoomcamp`, and `mlops-zoomcamp`.


In [22]:
print(search_in_course("What if I submit homeworks late?", "mlops-zoomcamp").points[0].payload['text'])

Please choose the closest one to your answer. Also do not post your answer in the course slack channel.


## Conclusion

🎉 Congratulations! You now have everything you need to run a simple semantic search with Qdrant! 👏

In general, data preparation, organization, and storage in a production-ready vector search solution is a topic worth a course of its own.  
If you’re curious to dive deeper into efficient vector search setup, check out our [Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/).

In the next videos, we will show you how to use **[hybrid search](https://qdrant.tech/articles/hybrid-search/)**, combining the strengths of both keyword-based search and vector search. In many real-world applications, they work hand-in-hand, balancing the precision of keywords with the flexibility of embeddings to deliver the best results.

P.S. We encourage you to explore Qdrant’s capabilities, which go beyond similarity search for RAG and agentic pipelines (but still, here's our [MCP server](https://github.com/qdrant/mcp-server-qdrant) ;) ).
