# On-Premises RAG Agent Testing

This notebook tests the fully on-premises RAG agent that uses:
- **Ollama** for LLM and embeddings (no cloud APIs!)
- **Qdrant** for vector storage (running locally via Docker)
- **LangGraph** for agent orchestration

## Prerequisites

1. **Ollama** must be running with models pulled:
   ```bash
   ollama pull deepseek-r1:8b
   ollama pull mxbai-embed-large
   ```

2. **Qdrant** must be running via Docker:
   ```bash
   docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
   ```
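
Before running the cells below, it can help to confirm that both services are actually listening. The following is a minimal sketch, not part of `onprem_rag_agent`: `port_open` and `SERVICES` are hypothetical names, and the ports assume default local installs (Ollama on 11434, Qdrant on 6333 for REST and 6334 for gRPC).

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default ports for a local setup; adjust if yours differ.
SERVICES = {
    "Ollama": ("127.0.0.1", 11434),
    "Qdrant REST": ("127.0.0.1", 6333),
    "Qdrant gRPC": ("127.0.0.1", 6334),
}

for name, (host, port) in SERVICES.items():
    status = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{name} at {host}:{port}: {status}")
```

An open port only shows that something is listening; it does not confirm that the models have been pulled.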

## Step 1: Import and Setup

In [None]:
import os
from onprem_rag_agent import (
    create_onprem_agent,
    chat_with_agent,
    get_or_create_vectorstore,
    retrieve_dnd_information
)
from langchain_core.messages import HumanMessage

## Step 2: Configure Qdrant (Optional)

By default, the notebook connects to Qdrant at `127.0.0.1:6334` (Qdrant's gRPC port; `6333` is the REST port). Change this if your instance runs elsewhere.

In [None]:
# Optional: Set custom Qdrant URL
# os.environ["QDRANT_URL"] = "127.0.0.1:6334"
# os.environ["QDRANT_COLLECTION"] = "DnD_Documents"

print("Qdrant URL:", os.environ.get("QDRANT_URL", "127.0.0.1:6334"))
print("Collection:", os.environ.get("QDRANT_COLLECTION", "DnD_Documents"))

## Step 3: Setup Vector Store

This step either connects to an existing collection or creates a new one.

**Note:** Creating a new collection takes roughly 10-15 minutes because about 5,800 documents must be embedded.

In [None]:
print("Setting up vector store...")
print("This may take a while if creating a new collection...\n")

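# Honor the optional overrides from Step 2 instead of hard-coding the defaults
vectorstore = get_or_create_vectorstore(
    qdrant_url=os.environ.get("QDRANT_URL", "127.0.0.1:6334"),
    collection_name=os.environ.get("QDRANT_COLLECTION", "DnD_Documents"),
    recreate=False  # Set to True to force recreation
)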

print("\n✓ Vector store ready!")
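
The 10-15 minute figure for ~5800 documents implies a rough embedding throughput, which is handy for estimating other corpus sizes. The arithmetic below is only a back-of-the-envelope sketch: `estimate_minutes` is a hypothetical helper, and it assumes throughput stays constant, which real hardware may not honor.

```python
DOCS = 5800  # approximate document count quoted in the note

for minutes in (10, 15):
    rate = DOCS / (minutes * 60)  # documents embedded per second
    print(f"{minutes} min -> {rate:.1f} docs/s")
# 10 min -> 9.7 docs/s
# 15 min -> 6.4 docs/s


def estimate_minutes(n_docs: int, docs_per_second: float) -> float:
    """Estimated embedding time in minutes at a constant throughput."""
    return n_docs / docs_per_second / 60
```

At the slower rate of about 6.4 docs/s, for example, a 20,000-document corpus would take roughly 52 minutes.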

## Step 4: Test RAG Tool Directly

Let's test the RAG retrieval tool before using the full agent.

In [None]:
# Test the RAG tool directly
query = "What feats improve strength?"
print(f"Query: {query}\n")

result = retrieve_dnd_information.invoke({"query": query})
print(f"Result:\n{result}")
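
To make the retrieval step less of a black box, here is a conceptual sketch of similarity search: rank stored vectors by cosine similarity to the query vector and keep the top `k`. This is not how `retrieve_dnd_information` or Qdrant is implemented. The function names, the 3-dimensional vectors, and the placeholder feat texts are all invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """docs: list of (text, vector). Return the k texts most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings", invented for illustration only.
docs = [
    ("Feat A: +1 Strength", [0.9, 0.1, 0.0]),
    ("Feat B: stealth bonus", [0.0, 0.2, 0.9]),
    ("Feat C: heavy weapon damage", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], docs, k=2))
# ['Feat A: +1 Strength', 'Feat C: heavy weapon damage']
```

In the real pipeline the vectors would come from the embedding model, and a vector database would replace the linear scan with an index.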

## Step 5: Create the Agent

Now let's create the full agent with tool use capabilities.

In [None]:
print("Creating on-premises agent...\n")

agent = create_onprem_agent(
    model="deepseek-r1:8b",
    temperature=0.7
)

print("\n✓ Agent created and ready!")
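
"Tool use" here means the model can ask for a tool to be run and then continue with the tool's output. The loop below is a toy illustration of that control flow under simplifying assumptions. It is not LangGraph's implementation, and `run_agent`, `fake_model`, and the `lookup` tool are hypothetical stand-ins.

```python
def run_agent(question, model, tools, max_steps=5):
    """Ask the model; if it requests a tool, run it and feed the result back."""
    messages = [("human", question)]
    for _ in range(max_steps):
        reply = model(messages)
        if reply.get("tool") is None:
            # No tool requested: this is the final answer.
            messages.append(("ai", reply["content"]))
            return messages
        messages.append(("ai", f"call {reply['tool']}({reply['args']!r})"))
        result = tools[reply["tool"]](reply["args"])
        messages.append(("tool", result))
    raise RuntimeError("agent did not finish within max_steps")


def fake_model(messages):
    """Stand-in model: requests one lookup, then answers from the tool output."""
    if messages[-1][0] == "tool":
        return {"tool": None, "content": f"Based on the lookup: {messages[-1][1]}"}
    return {"tool": "lookup", "args": messages[0][1]}


trace = run_agent("stealth feats?", fake_model, {"lookup": lambda q: f"notes on {q}"})
for role, content in trace:
    print(f"{role}: {content}")
```

The resulting sequence (human, ai tool request, tool result, ai answer) has the same shape as the message list a tool-using agent returns.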

## Step 6: Test Single Query

In [None]:
query = "What are some feats that improve a character's strength?"
print(f"User: {query}\n")

response = chat_with_agent(agent, query)
print(f"Agent: {response}")
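
DeepSeek-R1 models typically wrap their reasoning in `<think>...</think>` tags before the final answer. Whether `chat_with_agent` already removes that text is not stated in this notebook, so the helper below is an optional sketch for cleaning a response string; `strip_reasoning` is a hypothetical name.

```python
import re

# Non-greedy match so several separate <think> blocks are each removed.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


sample = "<think>\nThe user asks about strength.\n</think>\n\nHere are some options."
print(strip_reasoning(sample))  # Here are some options.
```

If the response contains no such tags, the text is returned unchanged apart from trimmed whitespace.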

## Step 7: Test Multiple Queries

In [None]:
test_queries = [
    "Tell me about feats related to magic or spellcasting",
    "What feats are good for a rogue character?",
    "What feats help with armor class?",
    "Are there feats that improve initiative?"
]

for i, query in enumerate(test_queries, 1):
    print("="*80)
    print(f"Query {i}: {query}")
    print("="*80)

    response = chat_with_agent(agent, query)
    print(f"\nAgent: {response}\n")
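
When running several queries against a local model, it is useful to record timings and to keep going if one query fails. The helper below is a sketch under those assumptions. `run_batch` is a hypothetical name and takes any callable, so it does not depend on the agent's internals.

```python
import time


def run_batch(ask, queries):
    """Call ask(query) for each query; record the answer or error plus elapsed seconds."""
    results = []
    for query in queries:
        start = time.perf_counter()
        try:
            answer, error = ask(query), None
        except Exception as exc:  # keep going so one failure does not end the run
            answer, error = None, repr(exc)
        results.append({
            "query": query,
            "answer": answer,
            "error": error,
            "seconds": time.perf_counter() - start,
        })
    return results


# With this notebook's objects, usage would look like:
# results = run_batch(lambda q: chat_with_agent(agent, q), test_queries)
```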

## Step 8: Interactive Chat

Try your own questions!

In [None]:
# Your custom query
custom_query = ""  # Enter your question here

if custom_query:
    print(f"User: {custom_query}\n")
    response = chat_with_agent(agent, custom_query)
    print(f"Agent: {response}")
else:
    print("Enter a query in the custom_query variable above")
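
For a back-and-forth session rather than a single question, a small read-ask-print loop is one option. The sketch below assumes nothing about the agent beyond a callable that maps a question to an answer. `chat_loop` is a hypothetical helper, and `read` and `write` are injectable so the loop can be exercised without a keyboard.

```python
def chat_loop(ask, read=input, write=print):
    """Read questions until an empty line, 'quit', or 'exit'; return the number answered."""
    answered = 0
    while True:
        try:
            question = read("You: ").strip()
        except EOFError:
            break
        if not question or question.lower() in {"quit", "exit"}:
            break
        write(f"Agent: {ask(question)}")
        answered += 1
    return answered


# With this notebook's objects, usage would look like:
# chat_loop(lambda q: chat_with_agent(agent, q))
```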

## Step 9: Inspect Agent Execution

Let's see the full execution trace to understand what the agent is doing.

In [None]:
# Run with full output to see tool calls
query = "What feats help with stealth?"
print(f"Query: {query}\n")

result = agent.invoke({"messages": [HumanMessage(content=query)]})

print("\n" + "="*80)
print("FULL EXECUTION TRACE")
print("="*80)

for i, message in enumerate(result["messages"]):
    print(f"\nMessage {i+1}:")
    print(f"Type: {type(message).__name__}")
    # Truncate long content for readability; str() also covers non-string content
    content = str(message.content)
    preview = content[:200] + "..." if len(content) > 200 else content
    print(f"Content: {preview}")

    if hasattr(message, "tool_calls") and message.tool_calls:
        print(f"Tool Calls: {message.tool_calls}")
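
A compact summary can complement the full trace: how many messages of each type were produced, and which tools were requested. The sketch below relies only on each message's class name and an optional `tool_calls` attribute holding dicts with a `"name"` key, which is the shape LangChain uses for `AIMessage.tool_calls`. `summarize_trace` itself is a hypothetical helper.

```python
from collections import Counter


def summarize_trace(messages):
    """Count messages by class name and list the names of requested tools."""
    kinds = Counter(type(m).__name__ for m in messages)
    tools = [
        call["name"]
        for m in messages
        for call in (getattr(m, "tool_calls", None) or [])
    ]
    return dict(kinds), tools


# With this notebook's objects, usage would look like:
# kinds, tools = summarize_trace(result["messages"])
# print(kinds, tools)
```

An empty tool list suggests the model answered without consulting the retrieval tool at all.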

## Summary

You've successfully tested a fully on-premises RAG agent that:
- ✅ Runs completely locally (no cloud APIs)
- ✅ Uses Ollama for LLM and embeddings
- ✅ Uses Qdrant for vector storage
- ✅ Uses LangGraph for agent orchestration
- ✅ Can retrieve and reason over D&D knowledge

This demonstrates how to build AI applications that run entirely on local infrastructure, without depending on external APIs!