# RAG with Vector Search using Qdrant

This notebook demonstrates two approaches to RAG:
1. **Text-based search** using Minsearch (keyword matching)
2. **Vector search** using Qdrant (semantic similarity)

## Why Vector Search?

Traditional keyword search matches the words in a query, while **vector search** compares meaning:
- "How do I start Kafka?" and "Steps to run Kafka" are semantically similar
- Vector embeddings capture this semantic similarity
- Better retrieval = better answers from the LLM
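
To see why word matching struggles with the two phrasings above, a rough sketch can measure how many words they share. The `tokens` and `jaccard` helpers are illustrative names, not part of Minsearch or Qdrant, and Jaccard overlap is only a simple stand-in for keyword matching.

```python
import string

def tokens(text):
    """Lowercase, strip punctuation, and split into a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def jaccard(a, b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

overlap = jaccard("How do I start Kafka?", "Steps to run Kafka")
# Only "kafka" is shared, out of 8 distinct words across both texts
print(f"word overlap: {overlap:.3f}")  # prints "word overlap: 0.125"
```

The two questions ask for the same thing but share a single word, which is the gap that embeddings are meant to close.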

We'll use:
- **Qdrant**: Vector database for storing and searching embeddings
- **Jina Embeddings**: Model to convert text into vectors
- **Llama 3.2 3B Instruct** (via Hugging Face): LLM for generating answers

## Step 1: Load Environment Variables

In [None]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Verify the token is loaded without printing any part of the secret
token = os.getenv('HUGGINGFACEHUB_API_TOKEN')
print("Token loaded" if token else "Token not found!")

## Step 2: Load Documents Dataset

In [1]:
import requests

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early if the download did not succeed
documents_raw = docs_response.json()

# Flatten the per-course structure into one list, tagging each FAQ entry with its course
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

In [2]:
documents[2]

{'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
 'section': 'General course-related questions',
 'question': 'Course - Can I still join the course after the start date?',
 'course': 'data-engineering-zoomcamp'}

## Step 3: Create Text Search Index (Baseline)

First, let's set up traditional text-based search as a baseline for comparison.

In [3]:
import minsearch

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.minsearch.Index at 0x24a2177c320>

## Step 4: Initialize Hugging Face LLM

Define the search and prompt-building helpers, then set up the Llama 3.2 3B model via Hugging Face's Inference API.
In [5]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results
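
The `boost` dictionary above weights matches in some fields more than others. The sketch below shows the idea with a simplified word-count score. It is not Minsearch's actual scoring, and `boosted_score`, `doc_a`, and `doc_b` are hypothetical names and data made up for illustration.

```python
def boosted_score(query, doc, boost, text_fields=("question", "text", "section")):
    """Sum per-field counts of matching query words, each weighted by its boost."""
    query_words = set(query.lower().split())
    score = 0.0
    for field in text_fields:
        field_words = doc.get(field, "").lower().split()
        matches = sum(1 for word in field_words if word in query_words)
        # Fields without an explicit boost keep a neutral weight of 1.0
        score += boost.get(field, 1.0) * matches
    return score

boost = {"question": 3.0, "section": 0.5}

# Hypothetical FAQ entries: the query words appear in different fields
doc_a = {"question": "How do I run Kafka", "text": "Start the broker first", "section": "Streaming"}
doc_b = {"question": "Where are the slides", "text": "You can run Kafka locally", "section": "Streaming"}

print(boosted_score("run kafka", doc_a, boost))  # 6.0: two matches in the boosted question field
print(boosted_score("run kafka", doc_b, boost))  # 2.0: two matches in the unboosted text field
```

With these weights, an entry whose question matches the query ranks above one that only mentions the same words in its answer text.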

In [9]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
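
To check what the LLM will actually receive, it can help to render the prompt for a small input. The sketch below repeats the template from the cell above and builds the context with `str.join` instead of repeated concatenation. `format_context` and the one-entry `sample_results` list are hypothetical and exist only for this illustration.

```python
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

def format_context(search_results):
    """Render each retrieved FAQ entry as a section/question/answer block."""
    blocks = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    # Blank line between entries, no trailing blank lines
    return "\n\n".join(blocks)

# Made-up search result standing in for real retrieval output
sample_results = [
    {"section": "General", "question": "Can I join late?", "text": "Yes."}
]

prompt = PROMPT_TEMPLATE.format(
    question="Can I still enroll?",
    context=format_context(sample_results),
)
print(prompt)
```

Printing the prompt this way makes it easy to spot missing fields or an empty context before spending an LLM call on it.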

In [None]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

# Hugging Face model setup
repo_id = "meta-llama/Llama-3.2-3B-Instruct"
# Named `endpoint` so that the `llm(prompt)` helper defined below does not shadow it
endpoint = HuggingFaceEndpoint(
    repo_id=repo_id,
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)
client = ChatHuggingFace(llm=endpoint, verbose=False)

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage

def llm(prompt):
    response = client.invoke(
        [
            SystemMessage(content="You are a helpful assistant."),
            HumanMessage(content=prompt),
        ]
    )
    return response.content

In [11]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

## Step 5: Test Baseline RAG with Text Search

In [12]:
rag('how do I run kafka?')

'To run Kafka, you can execute the following command in your project directory for the Java Kafka producer:\n\n```\njava -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java\n```\n\nFor Python, ensure that you create a virtual environment and run the necessary dependencies as indicated. First, create the virtual environment and activate it:\n\n```\npython -m venv env\nsource env/bin/activate  # On MacOS/Linux\n# or\nenv\\Scripts\\activate  # On Windows\n```\n\nThen, install the required packages from `requirements.txt`. After setting up the environment, you can run your Python scripts. Remember that Docker images should be up and running before executing any Python files.'

In [18]:
rag('the course has already started, can I still enroll?')

'Yes, you can still enroll in the course even if it has already started. You are also eligible to submit the homework assignments. However, make sure to pay attention to the deadlines for turning in the final projects to avoid leaving everything until the last minute.'

## Step 6: RAG with Vector Search (Qdrant)

Now let's upgrade to **semantic vector search** using Qdrant!

### What we'll do:
1. Connect to Qdrant (vector database)
2. Create embeddings for all documents using the Jina model
3. Store embeddings in Qdrant
4. Search using semantic similarity (not just keywords)
5. Use the same LLM to generate answers

### Prerequisites:
Make sure Qdrant is running in Docker:
```bash
docker run -d -p 6333:6333 -p 6334:6334 \
   -v "./qdrant_storage:/qdrant/storage:z" \
   qdrant/qdrant
```

In [13]:
from qdrant_client import QdrantClient, models

### Connect to Qdrant

In [14]:
qd_client = QdrantClient("http://localhost:6333")

In [17]:
EMBEDDING_DIMENSIONALITY = 512
model_handle = "jinaai/jina-embeddings-v2-small-en"

### Configure Embedding Model

We'll use **Jina Embeddings v2 Small** - a lightweight but powerful embedding model:
- Converts text into 512-dimensional vectors
- Optimized for semantic similarity
- Works well for English text

In [15]:
collection_name = "zoomcamp-faq"

In [None]:
# Start from a clean state when re-running the notebook
qd_client.delete_collection(collection_name=collection_name)

In [18]:
qd_client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=EMBEDDING_DIMENSIONALITY,
        distance=models.Distance.COSINE
    )
)

True

### Create Qdrant Collection

Create a collection to store our document vectors:
- **size=512**: Match the embedding dimensionality
- **distance=COSINE**: Use cosine similarity to measure how close vectors are
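
As a rough illustration of the cosine measure, the sketch below computes it by hand for three toy 3-dimensional vectors. The vectors are made up for the example (real Jina embeddings have 512 dimensions), and `cosine_similarity` is an illustrative helper, not a Qdrant function.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [3.0, 0.0, -1.0]  # orthogonal to a (dot product is 0)

print(cosine_similarity(a, b))  # close to 1.0: direction matters, length does not
print(cosine_similarity(a, c))  # 0.0: unrelated directions
```

Because only the angle between vectors counts, a long document and a short query can still score as similar when they point in the same direction.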

In [35]:
qd_client.create_payload_index(
    collection_name=collection_name,
    field_name="course",
    field_schema="keyword"
)

UpdateResult(operation_id=2, status=<UpdateStatus.COMPLETED: 'completed'>)

### Create Index for Filtering

This allows us to filter by course name efficiently (like SQL's WHERE clause).
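
Conceptually, a keyword index maps each field value to the points that carry it, so a filter becomes a lookup instead of a scan over every payload. The sketch below shows that idea with a plain dictionary. It is an assumption-level illustration, not a description of Qdrant's internals, and `build_keyword_index`, `payloads`, and `other-course` are hypothetical.

```python
from collections import defaultdict

def build_keyword_index(payloads, field):
    """Map each distinct value of `field` to the ids of the points that have it."""
    index = defaultdict(list)
    for point_id, payload in enumerate(payloads):
        index[payload[field]].append(point_id)
    return index

# Made-up payloads; ids are simply positions in the list
payloads = [
    {"course": "data-engineering-zoomcamp"},
    {"course": "other-course"},
    {"course": "data-engineering-zoomcamp"},
]

course_index = build_keyword_index(payloads, "course")
print(course_index["data-engineering-zoomcamp"])  # [0, 2]
```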

In [20]:
points = []

for i, doc in enumerate(documents):
    text = doc['question'] + ' ' + doc['text']
    vector = models.Document(text=text, model=model_handle)
    point = models.PointStruct(
        id=i,
        vector=vector,
        payload=doc
    )
    points.append(point)

### Generate and Store Embeddings

For each document:
1. Combine question + text
2. Generate an embedding vector using the Jina model
3. Store vector + metadata in Qdrant

This may take a minute as we process all documents...
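
If the upload is slow, one option is to send the points in smaller batches so each request stays small and progress is visible. The sketch below only shows the batching helper. `batched` is a hypothetical name, the batch size is arbitrary, and whether batching helps depends on your setup.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in for the `points` list built above. In the notebook, each batch
# could be passed to qd_client.upsert(collection_name=collection_name, points=batch).
fake_points = list(range(10))

batches = list(batched(fake_points, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```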

In [22]:
qd_client.upsert(
    collection_name=collection_name,
    points=points
)

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

In [24]:
question = 'I just discovered the course. Can I still join it?'

In [43]:
def vector_search(question):
    print('vector_search is used')
    
    course = 'data-engineering-zoomcamp'
    query_points = qd_client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=question,
            model=model_handle 
        ),
        query_filter=models.Filter( 
            must=[
                models.FieldCondition(
                    key="course",
                    match=models.MatchValue(value=course)
                )
            ]
        ),
        limit=5,
        with_payload=True
    )
    
    results = []
    
    for point in query_points.points:
        results.append(point.payload)
    
    return results

### Define Vector Search Function

This function performs semantic search:
1. Converts the query into a vector using the same Jina model
2. Restricts the search to documents from the chosen course
3. Finds the 5 most similar document vectors among them
4. Returns the payloads of the matching documents

**Key difference from text search**: This finds semantically similar content, not just keyword matches!
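
The steps above can be sketched as an exhaustive search over a handful of toy points. This only shows the logic: Qdrant uses an approximate index rather than scanning every vector, and `filtered_top_k`, the 2-dimensional vectors, and `other-course` are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def filtered_top_k(query_vector, points, course, limit=5):
    """Keep points from the given course, then rank them by similarity to the query."""
    candidates = [p for p in points if p["payload"]["course"] == course]
    ranked = sorted(candidates, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    return [p["payload"] for p in ranked[:limit]]

# Toy points: point C is closest to the query but belongs to another course
points = [
    {"vector": [1.0, 0.0], "payload": {"course": "data-engineering-zoomcamp", "question": "A"}},
    {"vector": [0.6, 0.8], "payload": {"course": "data-engineering-zoomcamp", "question": "B"}},
    {"vector": [1.0, 0.1], "payload": {"course": "other-course", "question": "C"}},
]

results = filtered_top_k([0.9, 0.1], points, "data-engineering-zoomcamp", limit=2)
print([r["question"] for r in results])  # ['A', 'B']
```

The filter removes C even though it is the nearest vector, which mirrors how `query_filter` limits results to a single course.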

In [44]:
def rag(query):
    search_results = vector_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

### Updated RAG Function with Vector Search

Same pipeline as before, but now using semantic vector search instead of keyword search!
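
Since only the retrieval step changes between the two versions, one way to avoid redefining `rag` is to pass the retriever in as an argument. The sketch below uses stub functions so it runs without Qdrant or an LLM. `make_rag` and the `fake_*` helpers are hypothetical names introduced for illustration.

```python
def make_rag(search_fn, prompt_fn, llm_fn):
    """Build a rag(query) function from interchangeable retrieval, prompt, and LLM steps."""
    def rag(query):
        search_results = search_fn(query)
        prompt = prompt_fn(query, search_results)
        return llm_fn(prompt)
    return rag

# Stubs standing in for the notebook's search, build_prompt, and llm functions
def fake_search(query):
    return [{"text": "Yes, you can still join."}]

def fake_prompt(query, search_results):
    return f"Q: {query}\nC: {search_results[0]['text']}"

def fake_llm(prompt):
    return prompt.upper()  # a real LLM call would go here

rag_stub = make_rag(fake_search, fake_prompt, fake_llm)
print(rag_stub("can I join?"))
```

In the notebook, the same helper could be called as `make_rag(search, build_prompt, llm)` for the baseline and `make_rag(vector_search, build_prompt, llm)` for the Qdrant version, keeping both pipelines available side by side.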

In [45]:
rag('how do I run kafka?')

vector_search is used


'To run Kafka, you need to follow these steps based on your scripts:\n\n1. Make sure your Kafka broker is running. You can confirm this by running `docker ps`. If the broker is not active, navigate to the folder with your docker-compose yaml file and run `docker compose up -d` to start all instances.\n\n2. In your project directory, to run the producer, use the following command:\n   ```\n   java -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java\n   ```\n\n3. Ensure that the `StreamsConfig.BOOTSTRAP_SERVERS_CONFIG` in your Java scripts (e.g., JsonProducer.java, JsonConsumer.java) is set to the correct server URL. Also, verify that the cluster key and secrets in `src/main/java/org/example/Secrets.java` are updated with the correct values.\n\nBy following these steps, you should be able to run Kafka successfully.'

### Test Vector Search RAG

Try asking questions in different ways - vector search understands the semantic meaning!

## Summary

### What We Built
Two complete RAG systems:
1. **Text-based RAG** - Keyword matching with Minsearch
2. **Vector-based RAG** - Semantic search with Qdrant

### Text vs Vector Search Comparison

| Aspect | Text Search (Minsearch) | Vector Search (Qdrant) |
|--------|------------------------|------------------------|
| **How it works** | Matches keywords | Measures semantic similarity |
| **Query flexibility** | Needs overlapping words | Handles paraphrasing and synonyms |
| **Example** | "run Kafka" matches "run Kafka" | "start Kafka" finds "run Kafka" |
| **Speed** | Very fast | Fast (requires embedding) |
| **Setup** | Simple | Requires embedding model + vector DB |
| **Best for** | Known keywords | Natural language questions |

### When to Use Each?

**Use Text Search when:**
- Users search with exact terms/codes
- Speed is critical
- Simple setup is needed

**Use Vector Search when:**
- Users ask natural language questions
- Semantic understanding matters
- Handling synonyms/paraphrases is important

**Use Both (Hybrid):**
- Combine for best results (see `04_hybrid_search.ipynb`)

### Key Advantages of Hugging Face Endpoints
- **No local GPU required** for the LLM (Llama 3.2)
- **Serverless architecture** - no servers to manage; cost and limits depend on usage and your plan
- **Easy model switching** - just change the `repo_id`
- **Managed scaling** - capacity is handled by the provider, subject to its rate limits

### Next Steps
- Explore `04_hybrid_search.ipynb` for combining both approaches
- Try different embedding models
- Experiment with different LLMs from Hugging Face
- Compare answer quality between text and vector RAG