#### 1. Build Qdrant Client 

In [1]:
!pip install -q "qdrant-client[fastembed]>=1.14.2"

In [2]:
from qdrant_client import QdrantClient, models


In [4]:
# 1. Initialize the client
client = QdrantClient("http://localhost:6333")  # connect to a local Qdrant instance
print(client)

<qdrant_client.qdrant_client.QdrantClient object at 0x76e3bdbfa120>


#### 2. Data Collection
Download the course FAQ data that will be indexed.

In [5]:
import requests

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

#documents_raw

Decide which fields to use for semantic search and which to keep as metadata for filtering.

The FAQ text (the Q&A pairs) is the search content; the course name and section name are stored as metadata.
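As a rough illustration of this split, the sketch below takes one FAQ entry and separates the text to embed from the metadata to keep in the payload. The sample record and the helper name `split_fields` are made up for illustration; only the field names (`text`, `section`, `question`, `course`) come from the data used in this notebook.

```python
# Hypothetical sample mirroring the structure of documents_raw (one course, one FAQ entry)
sample_course = {
    "course": "data-engineering-zoomcamp",
    "documents": [
        {
            "text": "You can still join the course after the start date.",
            "section": "General course-related questions",
            "question": "Can I still join the course?",
        }
    ],
}


def split_fields(course_name, doc):
    """Return (search_content, metadata) for one FAQ entry."""
    search_content = doc["text"]  # embedded for semantic search
    metadata = {"course": course_name, "section": doc["section"]}  # used for filtering
    return search_content, metadata


content, metadata = split_fields(sample_course["course"], sample_course["documents"][0])
print(content)
print(metadata)
```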

#### 3. Collection Creation and Embedding Model Selection

In [6]:
from fastembed import TextEmbedding
import json

# Use a 512-dimensional embedding to keep things simple and memory-friendly
EMBEDDING_DIMENSIONALITY = 512

for model in TextEmbedding.list_supported_models():
    if model["dim"] == EMBEDDING_DIMENSIONALITY:
        print(json.dumps(model, indent=2))

{
  "model": "BAAI/bge-small-zh-v1.5",
  "sources": {
    "hf": "Qdrant/bge-small-zh-v1.5",
    "url": "https://storage.googleapis.com/qdrant-fastembed/fast-bge-small-zh-v1.5.tar.gz",
    "_deprecated_tar_struct": true
  },
  "model_file": "model_optimized.onnx",
  "description": "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.",
  "license": "mit",
  "size_in_GB": 0.09,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "Qdrant/clip-ViT-B-32-text",
  "sources": {
    "hf": "Qdrant/clip-ViT-B-32-text",
    "url": null,
    "_deprecated_tar_struct": false
  },
  "model_file": "model.onnx",
  "description": "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, Prefixes for queries/documents: not necessary, 2021 year",
  "license": "mit",
  "size_in_GB": 0.25,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "jinaai/jina-embeddings-v2-small-e

Points are the central entity Qdrant works with.
A point is a record consisting of an ID, a vector, and an optional payload.

A collection is a named set of points (i.e., vectors with optional payloads) that you can search within.
Think of it as the container for your vector search solution, a single business problem solved.

When creating a collection, we need to specify:

- Name: a unique identifier for the collection.
- Vector configuration:
  - Size: the dimensionality of the vectors.
  - Distance metric: the method used to measure similarity between vectors.

In [9]:
# Define the collection name
collection_name = "zoomcamp-rag"

# Create the collection with the specified vector parameters (skipped if it already exists)
if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=EMBEDDING_DIMENSIONALITY,  # Dimensionality of the vectors
            distance=models.Distance.COSINE  # Distance metric for similarity search
        )
    )

#### 4. Create, Embed & Insert Points into the Collection

Points are the core data entities in Qdrant. Each point consists of:

- ID. A unique identifier. Qdrant supports both 64-bit unsigned integers and UUIDs.
- Vector. The embedding that represents the data point in vector space.
- Payload (optional). Additional metadata as key-value pairs.
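This notebook uses sequential integer IDs. If stable IDs are preferred, one possible approach is to derive a UUID from the content, so that re-running the upload overwrites the same points instead of creating duplicates. The helper `make_point_id` and the choice of fields are assumptions for illustration, not part of the original workflow.

```python
import uuid


def make_point_id(course, question, text):
    """Derive a deterministic UUID string from the fields that identify an FAQ entry."""
    key = f"{course}|{question}|{text}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


# Made-up entries: identical content gives the same ID, different content a different one
id_a = make_point_id("mlops-zoomcamp", "Where is the homework?", "See the course page.")
id_b = make_point_id("mlops-zoomcamp", "Where is the homework?", "See the course page.")
id_c = make_point_id("mlops-zoomcamp", "Where is the homework?", "Ask in Slack.")
print(id_a == id_b, id_a == id_c)
```

The resulting string could be passed as the `id` of a point in place of the integer counter.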

In [None]:
# Create Points to be upserted

# Define the embedding model
model_handle = "jinaai/jina-embeddings-v2-small-en"

points = []
point_id = 0  # sequential integer IDs; avoids shadowing the built-in id()

for course in documents_raw:
    for doc in course['documents']:

        point = models.PointStruct(
            id=point_id,
            # Embed the text locally with the FastEmbed model named in model_handle
            vector=models.Document(text=doc['text'], model=model_handle),
            # Save the metadata fields needed for filtering and display
            payload={
                "text": doc['text'],
                "section": doc['section'],
                "course": course['course']
            }
        )
        points.append(point)

        point_id += 1

In [13]:
points[3]

PointStruct(id=3, vector=Document(text="You don't need it. You're accepted. You can also just start learning and submitting homework without registering. It is not checked against any registered list. Registration is just to gauge interest before the start date.", model='jinaai/jina-embeddings-v2-small-en', options=None), payload={'text': "You don't need it. You're accepted. You can also just start learning and submitting homework without registering. It is not checked against any registered list. Registration is just to gauge interest before the start date.", 'section': 'General course-related questions', 'course': 'data-engineering-zoomcamp'})

Now we embed the points and upload them to our collection.

First, FastEmbed downloads the selected model (the cache path defaults to os.path.join(tempfile.gettempdir(), "fastembed_cache")) and runs inference directly on your machine.

Then the generated points are upserted into the collection and the vector index is built.
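To see where the model files would land, the default cache location mentioned above can be computed with the standard library alone. This is only a sketch: it does not call FastEmbed, and the directory exists only after a model has been downloaded.

```python
import os
import tempfile

# Default FastEmbed cache location, as described above
default_cache_dir = os.path.join(tempfile.gettempdir(), "fastembed_cache")

print(default_cache_dir)
print("already downloaded:", os.path.isdir(default_cache_dir))
```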

In [14]:
# Embed the text of each point and upsert the points into the collection for retrieval
client.upsert(
    collection_name=collection_name,
    points=points
)

Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00,  5.53it/s]


UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

In [29]:
# Index the "course" payload field so it can be used as a filter condition
client.create_payload_index(
    collection_name=collection_name,
    field_name="course",
    field_schema="keyword" # exact matching on string metadata fields
)

UpdateResult(operation_id=2, status=<UpdateStatus.COMPLETED: 'completed'>)

Once the embeddings are upserted, the collection is ready for retrieval queries.

#### 5. Running a Similarity Search

Retrieval process:
1. Qdrant compares the query vector to the stored vectors (through a vector index) using the distance metric defined when the collection was created. The closest matches are returned, ranked by similarity.
2. The vector index is built for approximate nearest neighbor (ANN) search, which makes large-scale vector search feasible.
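For intuition, the ranking step can be written as an exact (brute-force) search in plain Python. This is a toy sketch with made-up 3-dimensional vectors; Qdrant itself relies on an ANN index rather than comparing against every stored vector, and the function names below are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vector, stored, top_n=1):
    """Score every stored vector against the query and keep the top_n closest."""
    scored = [(pid, cosine_similarity(query_vector, vec)) for pid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up stored vectors keyed by point ID
stored_vectors = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [1.0, 1.0, 0.0]}
top_matches = rank([1.0, 0.9, 0.0], stored_vectors, top_n=2)
print(top_matches)
```

The exact search costs one comparison per stored vector, which is the cost an ANN index is meant to avoid at scale.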

In [31]:
def search(query, course_filter = None, top_n = 1):

    # Filter by course name only when a course is given
    query_filter = None
    if course_filter is not None:
        query_filter = models.Filter(
            must=[
                models.FieldCondition(
                    key="course",
                    match=models.MatchValue(value=course_filter)
                )
            ]
        )

    results = client.query_points(
        collection_name=collection_name,
        # Embed the query text locally with the same model used for the documents
        query = models.Document(text=query, model=model_handle),
        query_filter = query_filter,
        limit = top_n, # top closest matches
        with_payload=True # include the metadata in the results
    )

    return results

In [19]:
# Search example
import random

# Randomly pick a course question
course = random.choice(documents_raw)
course_piece = random.choice(course['documents'])
#print(json.dumps(course_piece, indent=2))

# Restrict the search to the course the question came from
search_result = search(course_piece['question'], course_filter=course['course'])
display(search_result) 

QueryResponse(points=[ScoredPoint(id=757, version=0, score=0.832627, payload={'text': 'Answer: All midterms and capstones are meant to be solo projects. [source @Alexey]', 'section': 'Projects (Midterm and Capstone)', 'course': 'machine-learning-zoomcamp'}, vector=None, shard_key=None, order_value=None)])

In [21]:
print(f"Question:\n{course_piece['question']}\n")
print("Top Retrieved Answer:\n{}\n".format(search_result.points[0].payload['text']))
print("Original Answer:\n{}".format(course_piece['text']))

Question:
Are projects solo or collaborative/group work?

Top Retrieved Answer:
Answer: All midterms and capstones are meant to be solo projects. [source @Alexey]

Original Answer:
Answer: All midterms and capstones are meant to be solo projects. [source @Alexey]


In [22]:
print(search("What if I submit homeworks late?").points[0].payload['text'])

No, late submissions are not allowed. But if the form is still not closed and it’s after the due date, you can still submit the homework. confirm your submission by the date-timestamp on the Course page.y
Older news:[source1] [source2]


score – the cosine similarity between the question and text embeddings.

#### 6. Integrate Semantic Search into LLM Assistant

Switch the search engine to semantic search with Qdrant.

In [25]:
!pip install openai docx

In [26]:
import pandas as pd 
import numpy as np
import openai
from openai import OpenAI
import os

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "<API KEY>"  # replace the placeholder with your own key

In [28]:
# Make the LLM Assistant Function
def response(query, context):

    # 1. Make the prompt
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}
CONTEXT: {context}
""".strip()
    
    prompt = prompt_template.format(question=query, context=context).strip()

    # 2. Get the answer using the LLM
    llm_client = OpenAI()  # separate name so it is not confused with the Qdrant client
    completion = llm_client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return completion.choices[0].message.content

In [34]:
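def rag(user_query, course_filter = 'data-engineering-zoomcamp', top_n = 5):

    # Retrieve the top matches and pass only their text (not the raw QueryResponse) to the LLM
    search_results = search(query = user_query, course_filter = course_filter, top_n = top_n)
    context = "\n\n".join(point.payload['text'] for point in search_results.points)
    answer = response(query = user_query, context = context)

    return answer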
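
The retrieved points carry their text in the payload, so one way to prepare the CONTEXT for the prompt is to join the payload fields into a plain string. The sketch below works on plain dictionaries shaped like the payloads stored earlier; `build_context` is a hypothetical helper, and the sample payloads are invented.

```python
def build_context(payloads):
    """Join retrieved payloads into one context string for the prompt."""
    entries = [f"section: {p['section']}\nanswer: {p['text']}" for p in payloads]
    return "\n\n".join(entries)


# Invented payloads with the same keys as the ones stored in the collection
sample_payloads = [
    {"text": "Submit through the course form.", "section": "Homework", "course": "mlops-zoomcamp"},
    {"text": "Late submissions are not allowed.", "section": "Homework", "course": "mlops-zoomcamp"},
]
context_text = build_context(sample_payloads)
print(context_text)
```

With real results, the payloads would come from the `payload` attribute of each point returned by the search.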

##### Try Sample Q&A

In [37]:
from IPython.display import display, Markdown, Latex

In [46]:
question = 'Where can I submit my homework?'

answer = rag(user_query=question, course_filter = "mlops-zoomcamp")
Markdown(answer)

The provided CONTEXT does not contain specific information about where to submit your homework. Please refer to your course guidelines or contact your instructor for the correct submission process.