# RAG Testing Notebook

An isolated environment to experiment with different retrieval-augmented generation (RAG) models and retrieval strategies.

This notebook demonstrates how to:
- Load a vector database and retrieve relevant documents
- Integrate retrieved context with an LLM for informed responses
- Build a conversational interface with context-aware answers

**Current Setup:**
- Uses a pre-built Chroma vector database with semantic embeddings
- Queries an OpenAI LLM (gpt-4.1-nano) with retrieved context
- Provides a Gradio chat interface for testing 

<br>

> **NOTE:** This notebook is designed for a simple one-shot, stateless query workflow. A separate file is used to test RAG with conversational history.

## Setup & Configuration

Import the required libraries. This notebook uses OpenAI's embedding and chat models, so make sure the packages imported below are installed and that your `.env` file contains a valid OpenAI API key.

In [None]:
import gradio as gr
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.messages import SystemMessage, HumanMessage

## Environment Variables & Model Configuration

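Define the chat model name and the vector database path, then load environment variables from `.env`. Passing `override=True` lets values in `.env` replace variables that are already set in the environment.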

In [None]:
MODEL = "gpt-4.1-nano"
db_name = "insure-llm-vdb"
db_path = f"../../vector_dbs/{db_name}"
load_dotenv(override=True)
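
Because a missing key only surfaces later as an authentication error, it can help to check the environment right after loading `.env`. The following is a minimal sketch, assuming the OpenAI clients read their key from `OPENAI_API_KEY`; the helper name `missing_env_vars` is hypothetical and not part of this notebook.

```python
import os


def missing_env_vars(names):
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


# Assumption: the OpenAI embedding and chat clients read OPENAI_API_KEY.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

This only confirms that the variable is present, not that the key is valid.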

## Vector Database & Embeddings

Load the persisted Chroma vector database and attach the OpenAI embedding function that will embed incoming queries.

> **NOTE:** The embedding model must be the same model that generated the stored embeddings. A different model produces vectors in a different space, often with a different dimensionality, so similarity scores become meaningless and retrieval quality degrades or the query may fail.

In [None]:
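# Alternative: a local Hugging Face model; only valid if the database was built with it
# embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Must match the model that generated the stored embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")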
vectorstore = Chroma(persist_directory=db_path, embedding_function=embeddings)
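
One way to make the embedding-model requirement harder to violate is to record the model name next to the database when it is built and compare it at load time. The sketch below is an illustration only: the sidecar file `embedding_model.json` and both helper functions are hypothetical, and Chroma does not create or read such a file itself.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sidecar file name; Chroma itself does not create or read it.
SIDECAR_NAME = "embedding_model.json"


def record_embedding_model(db_dir, model_name):
    """Write the embedding model name next to the database files."""
    path = Path(db_dir) / SIDECAR_NAME
    path.write_text(json.dumps({"embedding_model": model_name}))


def check_embedding_model(db_dir, model_name):
    """Return True on a match, False if nothing was recorded, raise on a mismatch."""
    path = Path(db_dir) / SIDECAR_NAME
    if not path.exists():
        return False  # nothing recorded, so nothing can be verified
    recorded = json.loads(path.read_text())["embedding_model"]
    if recorded != model_name:
        raise ValueError(
            f"Database was built with {recorded!r}, but {model_name!r} was requested."
        )
    return True


# Demo in a temporary directory so no real database is touched.
with tempfile.TemporaryDirectory() as demo_dir:
    record_embedding_model(demo_dir, "text-embedding-3-large")
    print(check_embedding_model(demo_dir, "text-embedding-3-large"))
```

In this notebook the check would run against `db_path` before constructing the vector store, provided the build step had written the sidecar file.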

## Retriever & Language Model

Set up the semantic search retriever from the vector database and initialize the chat model for generating responses.

In [None]:
retriever = vectorstore.as_retriever()
llm = ChatOpenAI(temperature=0, model_name=MODEL)
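
Conceptually, a semantic retriever ranks stored vectors by their similarity to the query vector and returns the best matches. The toy sketch below shows that idea with hand-written three-dimensional vectors instead of real embeddings. The names `cosine_similarity`, `top_k`, and `toy_store` are illustrative, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy data: (text label, made-up vector). Real embeddings have far more dimensions.
toy_store = [
    ("employee profile", [0.9, 0.1, 0.0]),
    ("product overview", [0.1, 0.9, 0.0]),
    ("contract terms", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
# prints ['employee profile', 'product overview']
```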

## Testing Individual Components

Test the retriever and LLM independently to ensure they work before integration.

In [None]:
retriever.invoke("Who is Avery?")

In [None]:
llm.invoke("Who is Avery?")
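
The raw retriever output can be long, so a compact preview makes it easier to judge whether the right chunks came back. The sketch below is a hypothetical helper, not part of the notebook. `FakeDoc` is a stand-in that mimics the `page_content` and `metadata` attributes of the returned documents, and the `"source"` metadata key is an assumption about how the database was built.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in exposing the two attributes the preview helper reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview_docs(docs, width=60):
    """Return one summary line per document: index, source, and a short snippet."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        # Assumption: the source path, if stored, lives under the "source" key.
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so multi-line chunks fit on one line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"{i}. [{source}] {snippet}")
    return lines


for line in preview_docs([FakeDoc("first chunk\nof text", {"source": "a.md"}), FakeDoc("second chunk")]):
    print(line)
```

With the real retriever, the call would be `preview_docs(retriever.invoke("Who is Avery?"))`.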

## RAG Pipeline Integration

Combine the retriever and LLM into a single question-answering pipeline that retrieves relevant context and uses it to generate informed responses.

In [None]:
SYSTEM_PROMPT_TEMPLATE = """
You are a knowledgeable, friendly assistant representing the company Insurellm.
You are chatting with a user about Insurellm.
If relevant, use the given context to answer any question.
If you don't know the answer, say so.
Context:
{context}
"""

In [None]:
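def answer_question(question: str, history=None) -> str:
    """Answer a question from retrieved context; `history` is accepted but ignored."""
    # Retrieve the chunks most relevant to the question
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    system_prompt = SYSTEM_PROMPT_TEMPLATE.format(context=context)
    # Each call is a stateless, one-shot query: only the current question is sent
    response = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=question)])
    return response.content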
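
The prompt assembly can be checked offline, without an API call, by swapping the retriever and the model for stubs. The following is a minimal sketch under that assumption. `TEMPLATE`, `StubDoc`, `fake_retrieve`, and `fake_generate` are all hypothetical stand-ins, and the chunk texts are placeholders rather than real database content.

```python
from dataclasses import dataclass

# Shortened stand-in for the notebook's system prompt template.
TEMPLATE = "Answer using the context.\nContext:\n{context}\n"


@dataclass
class StubDoc:
    page_content: str


def build_system_prompt(template, docs):
    """Join the chunk texts with blank lines and fill the {context} slot."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return template.format(context=context)


def answer_with(retrieve, generate, question):
    """Run retrieve -> build prompt -> generate, with both steps injectable."""
    docs = retrieve(question)
    system_prompt = build_system_prompt(TEMPLATE, docs)
    return generate(system_prompt, question)


def fake_retrieve(question):
    # Placeholder chunks; a real retriever would search the vector database.
    return [StubDoc("first chunk"), StubDoc("second chunk")]


def fake_generate(system_prompt, question):
    # Echo the prompt so its structure can be inspected.
    return system_prompt


print(answer_with(fake_retrieve, fake_generate, "any question"))
```

Passing the retrieval and generation steps in as functions keeps the pipeline logic testable in isolation from network access.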

In [None]:
answer_question("Who is Averi Lancaster?", [])

## Interactive Chat Interface

Launch a Gradio-powered chat interface to test the RAG pipeline interactively. Users can ask questions and receive context-aware responses.

In [None]:
gr.ChatInterface(answer_question).launch()
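
Gradio's `ChatInterface` calls its function with two arguments: the user's message and the chat history. For a stateless workflow, one option is a thin adapter that accepts both and discards the history. The sketch below uses hypothetical names (`as_chat_fn`, `shout`) and a trivial stand-in for the answering function so that it runs without Gradio or an API key.

```python
def as_chat_fn(answer_fn):
    """Adapt a one-argument answer function to the (message, history) call shape."""
    def chat_fn(message, history):
        # history is intentionally discarded to keep each query stateless
        return answer_fn(message)
    return chat_fn


def shout(question):
    """Trivial stand-in for a real question-answering function."""
    return question.upper()


chat_fn = as_chat_fn(shout)
print(chat_fn("hello", []))
# prints HELLO
```

The adapted function can then be passed to `gr.ChatInterface(...)` in place of a function that takes only the question.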