# Step 4: Search, Retrieval, and RAG

In this final step, we query the vector database for relevant code and then pass those results to a Large Language Model (LLM) to generate a direct, human-readable answer. Together, these two stages form the complete end-to-end **Retrieval-Augmented Generation (RAG)** process.
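As a rough sketch of that flow, the three stages (retrieve, augment, generate) can be written as one small function. The names `rag_answer`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins used only for illustration; the real retrieval and generation steps in this notebook use Qdrant and OpenAI.

```python
def rag_answer(question, retrieve, generate, top_k=3):
    """Retrieve context for a question, then ask a generator to answer from it."""
    snippets = retrieve(question, top_k)       # 1. Retrieval: find relevant chunks
    context = "\n\n".join(snippets)            # 2. Augmentation: build the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                    # 3. Generation: produce the answer


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_retrieve(question, top_k):
    corpus = [
        "def is_valid_email(address): ...",
        "def send_email(address, body): ...",
        "def parse_csv(path): ...",
    ]
    return corpus[:top_k]


def fake_generate(prompt):
    return f"(model answer based on {prompt.count('def ')} snippets)"


demo_answer = rag_answer(
    "How do I validate an email address?", fake_retrieve, fake_generate, top_k=2
)
print(demo_answer)
```

Passing the retriever and generator in as arguments keeps the control flow visible and makes each stage easy to swap out.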

## Prerequisites

1.  **Qdrant is Running:** Ensure your Qdrant Docker container is running via `docker-compose up -d`.
2.  **Collection is Indexed:** You should have already run the previous notebook to index the data into the `semantic_code_search` collection.

In [1]:
# Install necessary libraries, including 'rich' for pretty printing
%pip install openai python-dotenv qdrant-client rich

Collecting rich
  Downloading rich-14.1.0-py3-none-any.whl.metadata (18 kB)
Collecting markdown-it-py>=2.2.0 (from rich)
  Downloading markdown_it_py-4.0.0-py3-none-any.whl.metadata (7.3 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich)
  Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Downloading rich-14.1.0-py3-none-any.whl (243 kB)
Downloading markdown_it_py-4.0.0-py3-none-any.whl (87 kB)
Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Installing collected packages: mdurl, markdown-it-py, rich
Successfully installed markdown-it-py-4.0.0 mdurl-0.1.2 rich-14.1.0
Note: you may need to restart the kernel to use updated packages.


## 1. Setup Clients and API Key

As before, we'll start by setting up our clients for OpenAI and Qdrant. Make sure your `.env` file is in the same directory.

In [2]:
import os
import getpass
from openai import OpenAI
from dotenv import load_dotenv
from qdrant_client import QdrantClient
from rich.console import Console
from rich.panel import Panel
from rich.markdown import Markdown
from rich.syntax import Syntax

# --- Initialize Rich Console for pretty printing ---
console = Console()

# --- OpenAI Client Setup ---
load_dotenv()
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    api_key = getpass.getpass("OpenAI API key not found. Please enter your key: ")

try:
    openai_client = OpenAI(api_key=api_key)
    console.print("✅ OpenAI client initialized successfully!")
except Exception as e:
    console.print(f"❌ Error initializing OpenAI client: {e}")

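# --- Qdrant Client Setup ---
try:
    qdrant_client = QdrantClient("localhost", port=6333)
    # Creating the client does not guarantee the server is reachable,
    # so make one cheap request to verify the connection.
    qdrant_client.get_collections()
    console.print("✅ Qdrant client connected successfully!")
except Exception as e:
    console.print(f"❌ Error connecting to Qdrant: {e}")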

## 2. Define the Search Function

We'll create a function that encapsulates the search process: taking a user's text query, embedding it, and searching Qdrant for the most similar code chunks.

In [5]:
COLLECTION_NAME = "semantic_code_search"

# NOTE: QdrantClient.search is deprecated. search_code() below prefers
# query_points when the installed client provides it.
def get_openai_embedding(text: str, model: str = "text-embedding-3-small"):
    text = text.replace("\n", " ")
    try:
        return openai_client.embeddings.create(input=[text], model=model).data[0].embedding
    except Exception as e:
        console.print(f"❌ Error generating embedding: {e}")
        return None


def search_code(query: str, top_k: int = 3):
    """Search for code chunks based on a natural language query.

    Uses the newer query_points API when present, else falls back to deprecated search for older client versions.
    Returns a list of ScoredPoint objects (uniform interface)."""
    query_embedding = get_openai_embedding(query)
    if not query_embedding:
        return []

    try:
        # Preferred modern API
        if hasattr(qdrant_client, "query_points"):
            result = qdrant_client.query_points(
                collection_name=COLLECTION_NAME,
                query=query_embedding,  # vector search by raw embedding
                limit=top_k,
                with_payload=True,
                with_vectors=False,
            )
            # query_points returns a models.QueryResponse; extract .points
            search_results = result.points
        else:
            # Backward compatibility path (deprecated)
            search_results = qdrant_client.search(
                collection_name=COLLECTION_NAME,
                query_vector=query_embedding,
                limit=top_k,
                with_payload=True,
            )
        return search_results
    except Exception as e:
        console.print(f"❌ Search failed: {e}")
        return []

console.print("✓ Search functions defined (using query_points API if available).")
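To build intuition for what the vector database does with the query embedding, here is a minimal pure-Python sketch of similarity ranking. It assumes cosine similarity as the metric (the actual metric depends on how the collection was created in the previous notebook), and the three-dimensional vectors and names such as `toy_index` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, items, top_k=3):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_index = {
    "validate_email": [0.9, 0.1, 0.0],
    "parse_csv": [0.0, 0.2, 0.9],
    "send_email": [0.6, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(toy_query, toy_index, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

A production vector database typically uses an approximate nearest-neighbor index rather than scoring every stored vector, but the ranking idea is the same.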

## 3. Perform the Search

Let's define our query and execute the search. We'll then look at the raw results.

In [6]:
user_query = "How do I validate an email address?"
console.print(f"[bold cyan]Query:[/bold cyan] {user_query}\n")

search_results = search_code(user_query)

console.print(f"Found {len(search_results)} relevant results.")
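Vector search returns the nearest chunks even when none of them is actually relevant, so it can help to drop weak matches before using them. The sketch below is one possible filter, assuming a metric where a higher score means a closer match. The helper `filter_by_score`, the `0.3` cutoff, and the fake hits are illustrative assumptions rather than values from this notebook. Qdrant can also apply a cutoff on the server through the `score_threshold` argument.

```python
from types import SimpleNamespace


def filter_by_score(results, min_score=0.3):
    """Keep only hits whose similarity score meets the threshold."""
    return [hit for hit in results if hit.score >= min_score]


# Fake hits that mimic the .score and .payload attributes of a search result.
fake_results = [
    SimpleNamespace(score=0.62, payload={"snippet": "def is_valid_email(address): pass"}),
    SimpleNamespace(score=0.41, payload={"snippet": "def send_email(address): pass"}),
    SimpleNamespace(score=0.12, payload={"snippet": "def parse_csv(path): pass"}),
]

kept = filter_by_score(fake_results, min_score=0.3)
print(f"Kept {len(kept)} of {len(fake_results)} results.")
```

A sensible threshold depends on the embedding model and the data, so it is worth tuning against a handful of known queries.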

### Method 1: Simple Retrieval Results

First, let's just display the results directly. This is a basic semantic search implementation.

In [7]:
console.print(Panel("[bold green]Method 1: Simple Retrieval Results[/bold green]"))

for i, result in enumerate(search_results):
    payload = result.payload
    score = result.score
    
    # Create a syntax-highlighted code block
    code_snippet = Syntax(payload['snippet'], "python", theme="monokai", line_numbers=True)
    
    # Create a panel for each result
    result_panel = Panel(
        code_snippet,
        title=f"[bold]Result {i+1} | Score: {score:.4f}[/bold]",
        subtitle=f"[cyan]{payload['context']['file_path']}[/cyan]",
        border_style="blue"
    )
    console.print(result_panel)
    console.print(f"[bold]LLM Description:[/bold] {payload['llm_description']}\n")
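The display loop above indexes directly into `payload`, which raises a `KeyError` if a point was stored without one of the expected fields. A defensive variant might normalize the payload first. This is a sketch assuming the payload layout used above (`snippet`, `context.file_path`, `llm_description`); the helper name `summarize_payload`, the fallback strings, and the example path are hypothetical.

```python
def summarize_payload(payload):
    """Extract the display fields from a payload, using fallbacks for missing keys."""
    payload = payload or {}
    context = payload.get("context") or {}
    return {
        "file_path": context.get("file_path", "<unknown file>"),
        "snippet": payload.get("snippet", ""),
        "description": payload.get("llm_description", "<no description>"),
    }


# Hypothetical payloads: one complete, one missing most fields.
complete = summarize_payload({
    "snippet": "def is_valid_email(address): pass",
    "context": {"file_path": "utils/validators.py"},
    "llm_description": "Checks whether a string looks like an email address.",
})
partial = summarize_payload({"snippet": "def parse_csv(path): pass"})

print(complete["file_path"])
print(partial["file_path"])
```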

### Method 2: Generating an Answer with RAG

Now, let's implement the more advanced RAG approach. We'll take the retrieved results, use them as context in a prompt for a powerful chat model (`gpt-4o`), and get a direct, synthesized answer.

In [8]:
console.print(Panel("[bold green]Method 2: Generating an Answer with RAG[/bold green]"))

# 1. Context Collection
context = ""
for result in search_results:
    payload = result.payload
    context += f"File: {payload['context']['file_path']}\nCode Snippet:\n```python\n{payload['snippet']}\n```\n\n"

# 2. Intelligent Prompting
system_prompt = "You are an expert programming assistant. Answer the user's question based ONLY on the provided context from their codebase."

human_prompt = f"""--- CONTEXT FROM THE CODEBASE ---
{context}
--- QUESTION ---
{user_query}
"""

console.print(f"[bold]System Prompt:[/bold] {system_prompt}")
console.print(f"[bold]Human Prompt Preview:[/bold] {human_prompt[:200]}...\n")

# 3. Synthesized Answer
try:
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": human_prompt}
        ],
        temperature=0.0,  # Minimize randomness; output is not guaranteed to be fully deterministic
    )
    answer = response.choices[0].message.content
    
    # Display the final answer as formatted markdown
    console.print(Panel(Markdown(answer), title="[bold magenta]Synthesized Answer[/bold magenta]"))

except Exception as e:
    console.print(f"❌ Error generating RAG answer: {e}")
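The context loop above concatenates every retrieved snippet, which is fine for three short results but can grow large with a higher `top_k` or long chunks. One way to keep the prompt bounded is a simple character budget, sketched below. The helper `build_context`, the `max_chars` default, and the fake hits are assumptions for illustration; a token-based budget would be more precise.

```python
from types import SimpleNamespace

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


def build_context(results, max_chars=4000):
    """Concatenate retrieved snippets, stopping before the budget is exceeded."""
    blocks = []
    used = 0
    for hit in results:
        payload = hit.payload
        block = (
            f"File: {payload['context']['file_path']}\n"
            f"Code Snippet:\n{FENCE}python\n{payload['snippet']}\n{FENCE}\n\n"
        )
        if used + len(block) > max_chars:
            break  # Results are ordered best-first, so drop the weakest ones
        blocks.append(block)
        used += len(block)
    return "".join(blocks)


# Fake hits with hypothetical file paths, mimicking the payload layout used above.
fake_hits = [
    SimpleNamespace(payload={"context": {"file_path": "a.py"}, "snippet": "def a(): pass"}),
    SimpleNamespace(payload={"context": {"file_path": "b.py"}, "snippet": "def b(): pass"}),
]

full_context = build_context(fake_hits)
small_context = build_context(fake_hits, max_chars=60)
print(small_context)
```

If the resulting context is empty, it is usually better to tell the user that nothing relevant was found than to call the chat model with no context.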