# Ready Tensor RAG Assistant - Demo Notebook

This notebook demonstrates the full workflow of the RAG-based assistant system.

## 1. Setup and Imports

First, let's install and import all necessary dependencies.

In [None]:
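# Install required packages (langchain-community provides the FAISS wrapper)
!pip install -q langchain langchain-community langchain-openai faiss-cpu python-dotenv

import os
from dotenv import load_dotenv
# The import paths below assume a LangChain release that ships the split
# langchain-openai / langchain-community / langchain-text-splitters packages.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
# RetrievalQA is a legacy chain; this path assumes a version that still ships langchain.chains.
from langchain.chains import RetrievalQA

# Load environment variables from a local .env file, if present
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or a .env file.")

print("✓ All imports successful!")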

## 2. Initialize Components

Set up the embeddings model and LLM.

In [None]:
# Initialize embeddings and LLM
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
llm = ChatOpenAI(openai_api_key=api_key, temperature=0.7, model="gpt-3.5-turbo")

print("✓ Embeddings and LLM initialized!")
print("Embedding Model: OpenAI Embeddings")
print("LLM Model: GPT-3.5-turbo")

## 3. Load and Process Documents

Load sample documents and split them into chunks. The embeddings themselves are computed in the next section.

In [None]:
# Sample documents (replace with your actual documents)
sample_documents = [
    "Ready Tensor is a platform for AI learning and certification.",
    "RAG (Retrieval-Augmented Generation) combines document retrieval with language models.",
    "Vector databases enable semantic search and similarity matching.",
    "LangChain provides tools for building LLM applications.",
]

# Create text splitter
text_splitter = CharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    separator="."
)

# Split documents into chunks
from langchain_core.documents import Document
docs = [Document(page_content=doc) for doc in sample_documents]
split_docs = text_splitter.split_documents(docs)

print(f"✓ Loaded {len(sample_documents)} documents")
print(f"✓ Split into {len(split_docs)} chunks")
print("\nSample chunks:")
for i, doc in enumerate(split_docs[:3]):
    print(f"  {i+1}. {doc.page_content[:60]}...")
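
To make the roles of `chunk_size`, `chunk_overlap`, and `separator` concrete, here is a simplified, dependency-free sketch of separator-based chunking. It is not LangChain's actual algorithm; `split_with_overlap` is a hypothetical helper written only for illustration, and edge cases (such as a single piece longer than `chunk_size`) are ignored.

```python
def split_with_overlap(text, separator=".", chunk_size=100, chunk_overlap=20):
    """Greedy separator-based chunking (simplified illustration, not LangChain's code)."""
    joiner = f"{separator} "
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        candidate = joiner.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(joiner.join(current))
            # Carry trailing pieces forward while they fit in the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(joiner.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(piece)
    if current:
        chunks.append(joiner.join(current))
    return chunks


demo_text = "Alpha beta gamma. Delta epsilon. Zeta eta theta iota. Kappa"
demo_chunks = split_with_overlap(demo_text, chunk_size=40, chunk_overlap=15)
for chunk in demo_chunks:
    print(repr(chunk))
```

In the demo, "Delta epsilon" appears at the end of the first chunk and again at the start of the second, which is the effect the overlap setting is meant to produce. Each sample document above is a single short sentence, so in this notebook most chunks are simply the original sentences.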

## 4. Create Vector Store

Build and store embeddings in FAISS.

In [None]:
# Create vector store
vectorstore = FAISS.from_documents(split_docs, embeddings)

print("✓ Vector store created")
print(f"✓ Total vectors indexed: {vectorstore.index.ntotal}")
print("✓ Vector store ready for retrieval!")
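
Conceptually, retrieval from the vector store is a nearest-neighbour search over the chunk embeddings. The sketch below shows that idea with made-up three-dimensional vectors and a brute-force Euclidean (L2) search. It assumes the store ranks by L2 distance, which to my knowledge is the default for LangChain's FAISS wrapper; the vectors, `l2_distance`, and `nearest` are all hypothetical and stand in for real embeddings and the FAISS index.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, indexed, k=2):
    """Return the texts of the k vectors closest to query_vec (brute force)."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "index": (chunk text, made-up embedding). Real embeddings have far more dimensions.
toy_index = [
    ("chunk about RAG", [0.9, 0.1, 0.0]),
    ("chunk about vector databases", [0.1, 0.9, 0.0]),
    ("chunk about LangChain", [0.0, 0.2, 0.9]),
]

top = nearest([0.8, 0.2, 0.1], toy_index, k=2)
print(top)
```

FAISS performs the same ranking with optimized index structures, so the search stays fast as the number of vectors grows.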

## 5. Create QA Chain

Set up the retrieval-based QA system.

In [None]:
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True
)

print("✓ QA chain created!")
print("Ready for queries!")
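
The `"stuff"` chain type places all retrieved chunks into a single prompt and sends it to the LLM in one call. The sketch below illustrates that idea only; the prompt wording and `build_stuff_prompt` are assumptions made for this example, not the template LangChain actually uses.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


example_prompt = build_stuff_prompt(
    "How does RAG work?",
    [
        "RAG (Retrieval-Augmented Generation) combines document retrieval with language models.",
        "Vector databases enable semantic search and similarity matching.",
    ],
)
print(example_prompt)
```

Because every chunk ends up in one prompt, this approach suits a small `k` and short chunks; larger contexts can exceed the model's context window.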

## 6. Test Queries

Run sample queries against the RAG system.

In [None]:
# Test queries
test_queries = [
    "What is Ready Tensor?",
    "How does RAG work?",
    "What are vector databases used for?",
]

print("Running test queries...\n")
print("=" * 80)

for query in test_queries:
    print(f"\nQ: {query}")
    print("-" * 40)
    # .invoke() replaces the deprecated direct call qa_chain({...})
    result = qa_chain.invoke({"query": query})
    print(f"A: {result['result']}")
    print("\nSources used:")
    for doc in result.get('source_documents', []):
        print(f"  - {doc.page_content[:60]}...")
    print("=" * 80)

## 7. Performance Metrics

Analyze the performance of the RAG system.

In [None]:
import time

print("Performance Analysis")
print("=" * 40)

# Measure query latency
latencies = []
for query in test_queries:
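    # perf_counter() is a monotonic, high-resolution clock suited to timing
    start = time.perf_counter()
    _ = qa_chain.invoke({"query": query})  # note: each call is a billed API request
    latency = time.perf_counter() - start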
    latencies.append(latency)

print(f"Average latency: {sum(latencies)/len(latencies):.2f}s")
print(f"Min latency: {min(latencies):.2f}s")
print(f"Max latency: {max(latencies):.2f}s")
print("\nVector store info:")
print(f"  - Chunks indexed: {vectorstore.index.ntotal}")
print(f"  - Embedding dimension: {vectorstore.index.d}")
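
A single pass over three queries gives a noisy picture, so it can help to repeat each query and report the median as well as the mean. The following is a minimal sketch under stated assumptions: `measure_latencies` is a hypothetical helper, and `fake_chain` is a stand-in that sleeps briefly so the example runs without an API key.

```python
import statistics
import time


def measure_latencies(fn, inputs, repeats=3):
    """Time fn(item) for each input, `repeats` times, and summarize in seconds."""
    samples = []
    for item in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(item)
            samples.append(time.perf_counter() - start)
    return {
        "count": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_chain(query):
    """Stand-in for the real chain: sleeps briefly instead of calling an API."""
    time.sleep(0.01)
    return {"result": query.upper()}


stats = measure_latencies(fake_chain, ["a", "b"], repeats=2)
print(stats)
```

To time the real system, pass something like `lambda q: qa_chain.invoke({"query": q})` together with `test_queries`, keeping in mind that every repeat is another paid API call.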

## 8. Next Steps

To extend this demo:

1. **Load Real Documents**: Replace `sample_documents` with actual PDFs or text files
2. **Add Conversation Memory**: Implement multi-turn conversations
3. **Optimize Retrieval**: Tune chunk size, overlap, and search parameters such as `k`
4. **Deploy**: Expose the chain through a REST API or a Streamlit web app
5. **Monitor**: Add logging and observability
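
As a starting point for the first item, plain-text files can be read with the standard library alone. This is a sketch, not part of the original demo: `load_text_files` and the `{"source", "text"}` record shape are hypothetical, and PDFs would need a dedicated loader that is not shown here.

```python
import tempfile
from pathlib import Path


def load_text_files(directory, pattern="*.txt", encoding="utf-8"):
    """Read matching files in a directory into {"source", "text"} records, skipping empty ones."""
    documents = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding=encoding).strip()
        if text:
            documents.append({"source": path.name, "text": text})
    return documents


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First note.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("   ", encoding="utf-8")    # whitespace only: skipped
    Path(tmp, "c.md").write_text("ignored", encoding="utf-8")  # does not match *.txt
    loaded = load_text_files(tmp)

print(loaded)
```

Each record can then be wrapped as `Document(page_content=d["text"], metadata={"source": d["source"]})` and passed to the text splitter in place of the in-memory samples.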

Happy building! 🚀