# PyTorch Tutorial: RAG and Agents

The modern AI stack isn't just about training models; it's about **using** them. Two massive trends in 2025 are **RAG (Retrieval Augmented Generation)** and **Agents**.

## Learning Objectives
- Understand **RAG**: Giving LLMs access to your private data.
- Understand **Embeddings**: Converting text to numbers for search.
- Build a simple **Agent**: An LLM that can use tools (like a calculator).

## 1. Vocabulary First

- **Embedding**: A vector (list of numbers) representing the *meaning* of text. Similar meanings = close vectors.
- **Vector Database**: A database optimized to store and search embeddings.
- **Context Window**: The limit on how much text (measured in tokens) an LLM can read at once.
- **RAG**: Retrieve relevant docs -> Paste into Context -> Ask LLM to answer.
- **Agent**: An LLM loop that decides *what to do next* (Think -> Act -> Observe).

## 2. RAG: The "Open Book" Exam

Imagine asking an LLM "What is my company's vacation policy?". It doesn't know. RAG fixes this.

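As a rough sketch of what "pasting into context" can look like, the snippet below assembles a prompt from retrieved text. The helper name `build_rag_prompt`, the prompt wording, the character budget, and the sample policy sentence are all illustrative assumptions, not part of any library.

```python
# Hypothetical helper: the name, prompt wording, and budget are assumptions.
def build_rag_prompt(question, retrieved_docs, max_chars=2000):
    """Paste retrieved text into the prompt, staying within a character budget."""
    context_parts = []
    used = 0
    for doc in retrieved_docs:
        if used + len(doc) > max_chars:
            break  # crude stand-in for the context window limit
        context_parts.append(doc)
        used += len(doc)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = ["Employees receive 20 vacation days per year."]  # made-up policy text
prompt = build_rag_prompt("How many vacation days do I get?", docs)
print(prompt)
```

The resulting string is what would be sent to the LLM. A real system would count tokens rather than characters, but the idea is the same: only as much retrieved text as fits goes into the prompt.
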
### Step A: Create Embeddings (Mock)
In production, you'd get embeddings from a trained model, for example through the OpenAI API or a Hugging Face model. Here we hand-write small vectors instead.

In [None]:
import torch
import torch.nn.functional as F

# Let's pretend these are 3-dimensional embeddings for simplicity.
# Real embedding models produce far more dimensions (768 and 1536 are common sizes).
documents = {
    "doc1": torch.tensor([0.1, 0.2, 0.9]), # Represents "Vacation policy"
    "doc2": torch.tensor([0.8, 0.1, 0.1]), # Represents "Coffee machine manual"
    "doc3": torch.tensor([0.2, 0.9, 0.1])  # Represents "Meeting notes"
}

query = torch.tensor([0.15, 0.25, 0.85]) # User asks: "How many vacation days?"

print("Database created.")

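The vectors above are written by hand. To give a feel for how text could become a vector at all, here is a toy bag-of-words embedding. The vocabulary `VOCAB` and the function `toy_embed` are assumptions made up for this sketch. It only captures word overlap, whereas real embedding models learn dense vectors that capture meaning.

```python
import torch
import torch.nn.functional as F

# Toy vocabulary: an assumption for illustration, not a real embedding model.
VOCAB = ["vacation", "days", "policy", "coffee", "machine", "meeting"]

def toy_embed(text):
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().replace("?", "").split()
    return torch.tensor([float(words.count(w)) for w in VOCAB])

a = toy_embed("Vacation policy")
b = toy_embed("How many vacation days?")
c = toy_embed("Coffee machine manual")

# Texts that share words get a higher cosine similarity.
sim_ab = F.cosine_similarity(a, b, dim=0).item()
sim_ac = F.cosine_similarity(a, c, dim=0).item()
print(f"vacation policy vs. vacation question: {sim_ab:.2f}")
print(f"vacation policy vs. coffee manual:     {sim_ac:.2f}")
```

The two vacation-related texts share a word and score higher than the unrelated pair, which is the "similar meanings = close vectors" idea in miniature.
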
### Step B: Retrieval (Cosine Similarity)
We find the document closest to the query.

In [None]:
def retrieve(query, docs):
    """Return the name of the document whose vector is closest to the query."""
    best_doc = None
    max_score = float("-inf")  # cosine similarity can be as low as -1

    for name, doc_vec in docs.items():
        # Cosine Similarity: (A . B) / (|A| * |B|)
        score = F.cosine_similarity(query, doc_vec, dim=0).item()
        print(f"Score for {name}: {score:.4f}")

        if score > max_score:
            max_score = score
            best_doc = name

    return best_doc

result = retrieve(query, documents)
print(f"\nMost relevant document: {result}")

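Looping over documents one at a time is fine for three vectors. With many documents, the same cosine scores can be computed in one matrix operation, which is roughly the kind of search a vector database optimizes. The sketch below is one way to do it in plain PyTorch. The function name `retrieve_top_k` is an assumption, and the mock vectors are repeated so the block runs on its own.

```python
import torch
import torch.nn.functional as F

# Same mock vectors as in the tutorial cells, repeated so this runs alone.
names = ["doc1", "doc2", "doc3"]
doc_matrix = torch.tensor([
    [0.1, 0.2, 0.9],
    [0.8, 0.1, 0.1],
    [0.2, 0.9, 0.1],
])
query = torch.tensor([0.15, 0.25, 0.85])

def retrieve_top_k(query, doc_matrix, names, k=2):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    # After L2-normalization, a dot product equals cosine similarity.
    doc_unit = F.normalize(doc_matrix, dim=1)
    query_unit = F.normalize(query, dim=0)
    scores = doc_unit @ query_unit  # shape: (num_docs,)
    top = torch.topk(scores, k=min(k, len(names)))
    return [(names[i], s) for i, s in zip(top.indices.tolist(), top.values.tolist())]

top_docs = retrieve_top_k(query, doc_matrix, names)
for name, score in top_docs:
    print(f"{name}: {score:.4f}")
```

Returning the top few documents instead of a single best match gives the LLM more context to work with, at the cost of using more of the context window.
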
## 3. Agents: LLMs with Tools

An agent is a loop:
1. **Thought**: LLM analyzes the request.
2. **Action**: LLM decides to call a function (e.g., `calculator(5, 5, 'mul')`).
3. **Observation**: The function returns a result (`25`).
4. **Response**: LLM uses the result to answer the user.

This is the same basic pattern behind tool use (often called function calling) in chat assistants such as ChatGPT.

In [None]:
# Mock Agent Loop
def calculator(a, b, op):
    if op == 'add':
        return a + b
    if op == 'mul':
        return a * b
    # Fail loudly instead of silently returning None
    raise ValueError(f"Unsupported operation: {op}")

user_request = "What is 5 * 5?"

# 1. LLM decides to use a tool (Mocked)
print(f"User: {user_request}")
print("Agent: I need to calculate 5 * 5. Calling tool 'calculator'...")

# 2. Tool Execution
result = calculator(5, 5, 'mul')

# 3. Final Answer
print(f"Agent: The result is {result}.")

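The cell above hard-codes each step. A slightly more general sketch of the Think -> Act -> Observe loop is shown below, with a tool registry and a step limit. Everything here is a mock: `fake_llm` is a hypothetical stand-in for a real model call, and the dictionary format it returns is an assumption for this example, not any vendor's API.

```python
# Hypothetical mock: `fake_llm` stands in for a real model call.
def add(a, b):
    return a + b

def mul(a, b):
    return a * b

TOOLS = {"add": add, "mul": mul}  # tool registry: name -> function

def fake_llm(request, observations):
    """Mock policy: request one tool call, then answer from the observation."""
    if not observations:
        # Think -> Act: pretend the model parsed "5 * 5" and picked a tool.
        return {"type": "tool_call", "tool": "mul", "args": (5, 5)}
    return {"type": "final", "text": f"The result is {observations[-1]}."}

def run_agent(request, max_steps=3):
    observations = []
    for _ in range(max_steps):
        decision = fake_llm(request, observations)
        if decision["type"] == "final":
            return decision["text"]
        tool = TOOLS[decision["tool"]]
        observations.append(tool(*decision["args"]))  # Observe
    return "Stopped: step limit reached."

answer = run_agent("What is 5 * 5?")
print(f"Agent: {answer}")
```

The step limit matters in practice: without it, a model that keeps requesting tools would loop forever.
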
## Key Takeaways

1. **RAG** connects LLMs to your data using embeddings.
2. **Agents** give LLMs "hands" to perform actions.
3. Together, these two patterns are building blocks for applications that put LLMs to work on real data and tasks.