# 🦜 RAG System Tutorial with LangChain

This notebook demonstrates how to build a **Retrieval-Augmented Generation (RAG)** system step by step.

## What is RAG?
RAG combines:
- **Retrieval**: Finding relevant documents from a knowledge base
- **Generation**: Using an LLM to generate answers based on retrieved context

## Architecture Overview
```
Documents → Text Splitting → Embeddings → Vector DB → Retrieval → LLM → Answer
```

## 📦 Step 1: Install and Import Dependencies

First, let's install all required packages and import them.

In [None]:
# Install required packages (run this if not already installed)
# !pip install langchain langchain-openai langchain-community chromadb beautifulsoup4 requests python-dotenv

In [None]:
import os
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

# LangChain imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

# Load environment variables
load_dotenv()

print("✅ All dependencies imported successfully!")

## 🔑 Step 2: Configure API Keys and Models

We need API keys for:
- **OpenAI**: For embeddings (converting text to vectors)
- **E2E Networks**: For the language model (generating responses)
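
Missing keys tend to surface later as confusing authentication errors, so it can help to fail fast when a required variable is absent. The helper below is a minimal sketch: `require_env` and the `DEMO_*` variable names are hypothetical and only illustrate the pattern.

```python
import os


def require_env(name, default=None):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name, default)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo with made-up variables so the sketch runs without real credentials
os.environ["DEMO_API_KEY"] = "sk-demo"
demo_key = require_env("DEMO_API_KEY")
demo_model = require_env("DEMO_MODEL_NAME", default="meta-llama/Meta-Llama-3.1-8B-Instruct")
print(f"Key configured: {bool(demo_key)}, model: {demo_model}")
```

In the notebook, the same helper could wrap `OPENAI_API_KEY`, `E2E_API_KEY`, and `E2E_LLM_ENDPOINT` so that a missing value stops execution in this step rather than in Step 5 or Step 8.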

In [None]:
# Configuration
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
E2E_LLM_ENDPOINT = os.getenv("E2E_LLM_ENDPOINT")
E2E_API_KEY = os.getenv("E2E_API_KEY")
E2E_MODEL_NAME = os.getenv("E2E_MODEL_NAME", "meta-llama/Meta-Llama-3.1-8B-Instruct")

print(f"📊 Using E2E Model: {E2E_MODEL_NAME}")
print(f"🔗 E2E Endpoint: {E2E_LLM_ENDPOINT[:50]}..." if E2E_LLM_ENDPOINT else "❌ No E2E endpoint configured")
print(f"🔑 OpenAI Key: {'✅ Configured' if OPENAI_API_KEY else '❌ Missing'}")

## 🌐 Step 3: Document Loading from Websites

Let's load a few pages from LangChain's official documentation site.

### What happens here:
1. **Web Scraping**: Download HTML content from URLs
2. **Text Extraction**: Extract clean text using BeautifulSoup
3. **Document Creation**: Convert to LangChain Document objects
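
To see the text-extraction step in isolation, here is a small offline sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs without network access or extra packages; `TextExtractor` and `html_to_text` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes while skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse all runs of whitespace to single spaces
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample_html = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>RAG   tutorial</p></body></html>
"""
clean = html_to_text(sample_html)
print(clean)
```

The loader in the next cell follows the same idea: drop non-content tags, take the remaining text, and normalize whitespace.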

In [None]:
def load_documents_from_urls(urls):
    """Load documents from a list of URLs"""
    documents = []
    
    print("📚 Loading documents from websites...")
    
    for url in urls:
        try:
            # Step 1: Download the webpage
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            
            # Step 2: Parse HTML and extract text
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()
            
            # Get clean text
            content = soup.get_text()
            content = ' '.join(content.split())  # Clean whitespace
            
            # Step 3: Create LangChain Document
            if len(content) > 200:  # Only save if content is substantial
                doc = Document(
                    page_content=content,
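                    metadata={
                        "source": url,
                        # soup.title.string can be None (e.g. nested tags), so fall back safely
                        "title": (soup.title.string.strip()
                                  if soup.title and soup.title.string
                                  else "LangChain Documentation")
                    }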
                )
                documents.append(doc)
                print(f"✅ Loaded: {url}")
            
        except Exception as e:
            print(f"❌ Error loading {url}: {str(e)}")
            continue
    
    print(f"📖 Successfully loaded {len(documents)} documents")
    return documents

In [None]:
# URLs to load (LangChain documentation)
docs_urls = [
    "https://python.langchain.com/docs/introduction/",
    "https://python.langchain.com/docs/concepts/vectorstores/",
    "https://python.langchain.com/docs/concepts/text_splitters/",
    "https://python.langchain.com/docs/concepts/embedding_models/"
]

# Load the documents
documents = load_documents_from_urls(docs_urls)

# Show what we loaded
print(f"\n📄 Document Summary:")
for i, doc in enumerate(documents[:2]):  # Show first 2
    print(f"Document {i+1}:")
    print(f"  Source: {doc.metadata['source']}")
    print(f"  Title: {doc.metadata['title']}")
    print(f"  Content length: {len(doc.page_content)} characters")
    print(f"  Preview: {doc.page_content[:200]}...\n")

## ✂️ Step 4: Text Splitting

Large documents need to be split into smaller chunks because:
- **LLM Context Limits**: Models have maximum input lengths
- **Better Retrieval**: Smaller chunks = more precise matches
- **Efficient Processing**: Faster embedding and search

### Parameters:
- **chunk_size**: Maximum characters per chunk
- **chunk_overlap**: Characters to overlap between chunks (maintains context)
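
To make the two parameters concrete, the sketch below uses a plain fixed-size sliding window. This is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators); `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(25))  # 25 characters of digits
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(repr(chunk))
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk reappears at the head of the next. That repeated tail is what keeps a sentence cut at a boundary from losing its context.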

In [None]:
# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,        # Maximum 1000 characters per chunk
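    chunk_overlap=200,      # 200 characters overlap between chunks
    # Try paragraph breaks, then line breaks, then spaces, then single characters.
    # The loader above collapses whitespace to single spaces, so splits will mostly fall on spaces.
    separators=["\n\n", "\n", " ", ""]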
)

print("✂️ Splitting documents into chunks...")

# Split the documents
chunks = text_splitter.split_documents(documents)

print(f"📄 Split {len(documents)} documents into {len(chunks)} chunks")

# Show some examples
print("\n🔍 Example chunks:")
for i, chunk in enumerate(chunks[:3]):
    print(f"\nChunk {i+1}:")
    print(f"  Length: {len(chunk.page_content)} characters")
    print(f"  Source: {chunk.metadata['source']}")
    print(f"  Content: {chunk.page_content[:150]}...")

## 🧮 Step 5: Embeddings - Converting Text to Vectors

**Embeddings** convert text into numerical vectors that capture semantic meaning.

### Why Embeddings?
- **Semantic Search**: Find documents by meaning, not just keywords
- **Mathematical Operations**: Compare similarity using vector math
- **Efficient Storage**: Vectors can be indexed for fast search

### How it works:
```
"What is LangChain?" → [0.1, 0.8, -0.3, 0.5, ...] (1536 dimensions)
"LangChain tutorial"  → [0.2, 0.7, -0.2, 0.6, ...] (similar vector)
"Pizza recipe"        → [0.9, 0.1, 0.8, -0.4, ...] (very different vector)
```
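
The illustrative vectors above can be compared directly, without calling any API. The sketch below treats the four numbers shown for each text as complete toy vectors (real embeddings have far more dimensions, and these values are made up for the diagram).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors taken from the diagram above
question = [0.1, 0.8, -0.3, 0.5]   # "What is LangChain?"
tutorial = [0.2, 0.7, -0.2, 0.6]   # "LangChain tutorial"
pizza = [0.9, 0.1, 0.8, -0.4]      # "Pizza recipe"

sim_related = cosine_similarity(question, tutorial)
sim_unrelated = cosine_similarity(question, pizza)
print(f"question vs tutorial: {sim_related:.3f}")
print(f"question vs pizza:    {sim_unrelated:.3f}")
```

The related pair scores close to 1, while the unrelated pair scores far lower. Semantic search relies on exactly this ordering.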

In [None]:
# Initialize the embedding model
embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    model="text-embedding-ada-002"  # OpenAI's embedding model
)

print("🧮 Testing embeddings...")

# Test embedding a simple sentence
test_text = "What is LangChain?"
test_embedding = embeddings.embed_query(test_text)

print(f"📝 Text: '{test_text}'")
print(f"🔢 Embedding dimensions: {len(test_embedding)}")
print(f"🎯 First 5 values: {test_embedding[:5]}")
print(f"📊 Vector magnitude: {sum(x*x for x in test_embedding)**0.5:.4f}")

# Test similarity between different texts
texts = [
    "What is LangChain?",
    "LangChain tutorial and guide",
    "How to cook pasta"
]

print("\n🔍 Testing semantic similarity:")
embeddings_list = [embeddings.embed_query(text) for text in texts]

# Calculate cosine similarity
def cosine_similarity(a, b):
    dot_product = sum(x*y for x, y in zip(a, b))
    magnitude_a = sum(x*x for x in a)**0.5
    magnitude_b = sum(x*x for x in b)**0.5
    return dot_product / (magnitude_a * magnitude_b)

for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        similarity = cosine_similarity(embeddings_list[i], embeddings_list[j])
        print(f"  '{texts[i]}' vs '{texts[j]}': {similarity:.3f}")

## 🗄️ Step 6: Vector Database with ChromaDB

A **Vector Database** stores and indexes embeddings for fast similarity search.

### Why Vector Databases?
- **Fast Search**: Find similar vectors in milliseconds
- **Scalability**: Handle millions of documents
- **Persistence**: Save embeddings to disk
- **Filtering**: Search with metadata filters

### ChromaDB Features:
- Open-source and lightweight
- Built-in embedding support
- Metadata filtering with `where`-style conditions
- Perfect for prototyping
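
Conceptually, a vector store does three things: it keeps vectors alongside their text and metadata, ranks them against a query vector, and optionally filters by metadata. The class below is a brute-force sketch of that contract. `TinyVectorStore` is hypothetical and scans every row, whereas real vector databases typically use approximate indexes to stay fast at scale.

```python
import math


class TinyVectorStore:
    """Brute-force stand-in for a vector database (for illustration only)."""

    def __init__(self):
        self._rows = []  # each row is (vector, text, metadata)

    def add(self, vector, text, metadata=None):
        self._rows.append((vector, text, metadata or {}))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    def search(self, query_vector, k=3, where=None):
        """Return up to k (score, text, metadata) rows, most similar first."""
        where = where or {}
        candidates = [
            (self._cosine(query_vector, vec), text, meta)
            for vec, text, meta in self._rows
            if all(meta.get(key) == value for key, value in where.items())
        ]
        candidates.sort(key=lambda row: row[0], reverse=True)
        return candidates[:k]


store = TinyVectorStore()
store.add([1.0, 0.0], "Chunk about embeddings", {"source": "embeddings"})
store.add([0.9, 0.1], "Chunk about vector stores", {"source": "vectorstores"})
store.add([0.0, 1.0], "Chunk about pasta", {"source": "cooking"})

top = store.search([1.0, 0.0], k=2)
filtered = store.search([1.0, 0.0], k=2, where={"source": "cooking"})
print([text for _, text, _ in top])
print([text for _, text, _ in filtered])
```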

In [None]:
print("🗄️ Creating vector database...")

# Create ChromaDB vector store
vector_store = Chroma.from_documents(
    documents=chunks,                    # Our text chunks
    embedding=embeddings,                # Embedding function
    persist_directory="./chroma_tutorial_db",  # Save to disk
    collection_name="langchain_tutorial" # Collection name
)

print(f"✅ Vector store created with {len(chunks)} chunks")

# Test similarity search
print("\n🔍 Testing similarity search:")
query = "What is LangChain?"
similar_docs = vector_store.similarity_search(query, k=3)  # Get top 3 similar documents

print(f"Query: '{query}'")
print(f"Found {len(similar_docs)} similar documents:\n")

for i, doc in enumerate(similar_docs):
    print(f"Result {i+1}:")
    print(f"  Source: {doc.metadata['source']}")
    print(f"  Content: {doc.page_content[:200]}...\n")

## 🔍 Step 7: Understanding Retrieval

**Retrieval** is the process of finding relevant documents for a given query.

### Types of Retrieval:
1. **Similarity Search**: Find most similar vectors
2. **MMR (Maximal Marginal Relevance)**: Balance relevance and diversity
3. **Threshold-based**: Only return results above similarity threshold

### Key Parameters:
- **k**: Number of documents to retrieve
- **score_threshold**: Minimum similarity score
- **fetch_k**: Number of docs to fetch before filtering
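
The three strategies can be compared on a handful of toy 2-D vectors. This is a sketch of the standard formulations, not LangChain's or Chroma's internal code; the helper names and the `lambda_mult` weighting parameter (0 favors diversity, 1 favors relevance) are chosen here for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, vectors, k):
    """Similarity search: indices of the k most similar vectors, best first."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def above_threshold(query, vectors, score_threshold):
    """Threshold search: every index whose similarity reaches the threshold."""
    return [i for i in top_k(query, vectors, len(vectors))
            if cosine(query, vectors[i]) >= score_threshold]


def mmr(query, vectors, k, fetch_k, lambda_mult=0.5):
    """MMR: fetch fetch_k candidates, then greedily pick k that are relevant
    to the query but not redundant with what was already selected."""
    candidates = top_k(query, vectors, fetch_k)
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query, vectors[i])
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.2]
vectors = [
    [1.0, 0.0],  # 0: relevant
    [1.0, 0.1],  # 1: relevant, near-duplicate of 0
    [0.6, 0.8],  # 2: somewhat relevant, different direction
    [0.0, 1.0],  # 3: mostly irrelevant
]

print("similarity:", top_k(query, vectors, k=2))
print("threshold: ", above_threshold(query, vectors, score_threshold=0.7))
print("mmr:       ", mmr(query, vectors, k=2, fetch_k=3))
```

Plain similarity search returns the two near-duplicates, while MMR keeps the best match and then prefers the less redundant third vector.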

In [None]:
# Create a retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most similar documents
)

print("🔍 Testing different retrieval methods:")

queries = [
    "How do text splitters work?",
    "What are embeddings?",
    "Vector database benefits"
]

for query in queries:
    print(f"\n❓ Query: '{query}'")
    
    # Method 1: Simple similarity search
    # (invoke() replaces the deprecated get_relevant_documents())
    docs = retriever.invoke(query)
    print(f"📄 Found {len(docs)} relevant documents")
    
    # Show the most relevant result
    if docs:
        best_doc = docs[0]
        print(f"🏆 Most relevant:")
        print(f"   Source: {best_doc.metadata['source']}")
        print(f"   Preview: {best_doc.page_content[:150]}...")
    
    # Method 2: Search with scores
    # Note: for Chroma this score is a distance, so LOWER means MORE similar
    docs_with_scores = vector_store.similarity_search_with_score(query, k=3)
    print(f"📊 Distance scores (lower = closer):")
    for i, (doc, score) in enumerate(docs_with_scores):
        print(f"   #{i+1}: {score:.3f} - {doc.page_content[:80]}...")

## 🤖 Step 8: Language Model Setup

The **Language Model (LLM)** generates human-like responses based on the retrieved context.

### Our Setup:
- **Model**: Meta Llama 3.1 8B Instruct
- **Hosted on**: E2E Networks
- **API**: OpenAI-compatible interface

### Key Parameters:
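- **temperature**: Controls randomness (0 = near-deterministic; higher values give more varied output)
- **max_tokens**: Maximum number of tokens in the response
- **top_p**: Nucleus sampling; restricts sampling to the most likely tokens whose cumulative probability reaches `top_p`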
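
To build intuition for `temperature` and `top_p`, the sketch below applies both to a made-up set of three token scores. It shows the general sampling idea only and is not the code any particular model server runs; the function names and logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; low temperature sharpens the distribution."""
    if temperature <= 0:
        # APIs usually treat temperature 0 as "always pick the top token"
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p,
    then renormalize. Returns {token_index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)
warm = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)
nucleus = top_p_filter(warm, top_p=0.9)

print("T=0.2:", [round(p, 3) for p in cold])
print("T=1.0:", [round(p, 3) for p in warm])
print("T=2.0:", [round(p, 3) for p in hot])
print("top_p=0.9 keeps tokens:", sorted(nucleus))
```

Lower temperatures concentrate probability on the top token, and `top_p` removes the unlikely tail before sampling.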

In [None]:
# Initialize the language model
llm = ChatOpenAI(
    openai_api_key=E2E_API_KEY,
    openai_api_base=E2E_LLM_ENDPOINT,
    model_name=E2E_MODEL_NAME,
    temperature=0.7,  # Balanced creativity
    max_tokens=512    # Reasonable response length
)

print(f"🤖 Language model initialized: {E2E_MODEL_NAME}")

# Test the LLM
print("\n🧪 Testing language model:")
test_prompt = "Explain what a vector database is in simple terms."
response = llm.invoke(test_prompt)

print(f"💬 Prompt: {test_prompt}")
print(f"🤖 Response: {response.content}")

## 🔗 Step 9: Creating the RAG Chain

Now we combine all components into a **RAG Chain**:

### RAG Process:
1. **User asks a question**
2. **Retrieve relevant documents** from vector database
3. **Create context** by combining retrieved documents
4. **Generate prompt** with context + question
5. **LLM generates answer** based on context
6. **Return answer** to user

### Benefits:
- **Factual Accuracy**: Answers are grounded in the retrieved documents rather than the model's memory alone
- **Source Attribution**: Can cite where information came from
- **Up-to-date**: Add new documents without retraining model
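
The process above can be sketched offline with stand-ins for the retriever and the model. Everything here is a placeholder: `answer_with_rag`, the keyword-overlap retriever, the stub generator, and the three toy knowledge snippets are hypothetical and exist only to show how the steps connect.

```python
def answer_with_rag(question, retrieve, generate, k=3):
    """Run retrieve -> build context -> build prompt -> generate.

    retrieve(question, k) must return a list of (text, source) pairs and
    generate(prompt) must return a string.
    """
    hits = retrieve(question, k)
    context = "\n\n".join(text for text, _ in hits)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )
    return {
        "answer": generate(prompt),
        "sources": [source for _, source in hits],
        "prompt": prompt,
    }


# Toy knowledge base: (text, source) pairs
KNOWLEDGE = [
    ("LangChain is a framework for building LLM applications.", "intro"),
    ("Text splitters break documents into chunks.", "splitters"),
    ("Pasta should be cooked in salted water.", "cooking"),
]


def keyword_retrieve(question, k):
    """Stand-in retriever: rank snippets by words shared with the question."""
    words = set(question.lower().replace("?", "").split())

    def overlap(item):
        return len(words & set(item[0].lower().rstrip(".").split()))

    return sorted(KNOWLEDGE, key=overlap, reverse=True)[:k]


def stub_generate(prompt):
    """Stand-in for an LLM call."""
    return "stub answer"


result = answer_with_rag("What is LangChain?", keyword_retrieve, stub_generate, k=1)
print(result["sources"])
print(result["prompt"])
```

Passing the retriever and generator in as functions keeps the orchestration testable, and either one can later be swapped for the real vector store retriever or LLM without changing the pipeline.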

In [None]:
# Create a prompt template for RAG
rag_prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant that answers questions based on the provided context.

Context from LangChain documentation:
{context}

Question: {question}

Please provide a clear and accurate answer based on the context above. If the context doesn't contain enough information to answer the question, say so.

Answer:
""")

def rag_chain(question):
    """Complete RAG pipeline"""
    
    # Step 1: Retrieve relevant documents
    print(f"🔍 Searching for: '{question}'")
    docs = retriever.invoke(question)  # invoke() replaces the deprecated get_relevant_documents()
    print(f"📄 Found {len(docs)} relevant documents")
    
    # Step 2: Create context from documents
    context = "\n\n".join([doc.page_content for doc in docs])
    print(f"📝 Context length: {len(context)} characters")
    
    # Step 3: Format the prompt as chat messages (format() would flatten it into one string)
    formatted_prompt = rag_prompt.format_messages(context=context, question=question)
    
    # Step 4: Generate response
    print("🤖 Generating response...")
    response = llm.invoke(formatted_prompt)
    
    # Step 5: Return answer with sources
    sources = [doc.metadata['source'] for doc in docs]
    
    return {
        'answer': response.content,
        'sources': sources,
        'num_docs': len(docs)
    }

print("✅ RAG chain created successfully!")

## 🎯 Step 10: Testing the Complete RAG System

Let's test our RAG system with various questions to see how it performs.

In [None]:
# Test questions about LangChain
test_questions = [
    "What is LangChain and what is it used for?",
    "How do text splitters work in LangChain?",
    "What are the benefits of using vector databases?",
    "How do embeddings help in document search?"
]

print("🎯 Testing RAG System:\n")
print("=" * 80)

for i, question in enumerate(test_questions, 1):
    print(f"\n📝 Test {i}: {question}")
    print("-" * 80)
    
    # Get RAG response
    result = rag_chain(question)
    
    # Display results
    print(f"\n🤖 Answer:")
    print(result['answer'])
    
    print(f"\n📚 Sources ({result['num_docs']} documents):")
    for j, source in enumerate(result['sources'], 1):
        print(f"  {j}. {source}")
    
    print("\n" + "=" * 80)

## 🔧 Step 11: Advanced RAG Features

Let's explore some advanced features to improve our RAG system.

In [None]:
# Advanced Retrieval: MMR (Maximal Marginal Relevance)
# This balances relevance with diversity to avoid redundant results

mmr_retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 3,           # Final number of documents
        "fetch_k": 10,    # Fetch more docs, then filter for diversity
        "lambda_mult": 0.7  # Balance: 0=max diversity, 1=max relevance
    }
)

print("🔧 Testing MMR retrieval:")
query = "What are the benefits of vector databases?"

# Compare standard vs MMR retrieval
standard_docs = retriever.invoke(query)
mmr_docs = mmr_retriever.invoke(query)

print(f"\nQuery: '{query}'")
print(f"\n📊 Standard Retrieval (top 3):")
for i, doc in enumerate(standard_docs):
    print(f"  {i+1}. {doc.page_content[:100]}...")

print(f"\n🎯 MMR Retrieval (diverse top 3):")
for i, doc in enumerate(mmr_docs):
    print(f"  {i+1}. {doc.page_content[:100]}...")

In [None]:
# Metadata Filtering
# Search within specific sources only

print("🔍 Testing metadata filtering:")

# Filter by specific source
filtered_docs = vector_store.similarity_search(
    "embeddings",
    k=5,
    filter={"source": "https://python.langchain.com/docs/concepts/embedding_models/"}
)

print(f"\n📄 Documents about embeddings from specific source:")
for i, doc in enumerate(filtered_docs):
    print(f"  {i+1}. Source: {doc.metadata['source']}")
    print(f"     Preview: {doc.page_content[:150]}...\n")

In [None]:
# Enhanced RAG with confidence scoring

def enhanced_rag_chain(question, use_mmr=True, min_score=0.7):
    """Enhanced RAG with confidence scoring"""
    
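    # Choose retrieval method
    if use_mmr:
        docs = mmr_retriever.invoke(question)
        retrieval_method = "MMR"
    else:
        docs = retriever.invoke(question)
        retrieval_method = "Standard"
    
    # Get normalized relevance scores (higher = more relevant).
    # similarity_search_with_score would return Chroma distances (lower = better),
    # which cannot be compared against min_score with >=.
    # k=10 matches the MMR fetch_k so every retrieved doc can be scored.
    scored = vector_store.similarity_search_with_relevance_scores(question, k=10)
    score_by_content = {doc.page_content: score for doc, score in scored}
    docs_with_scores = [(doc, score_by_content.get(doc.page_content, 0.0)) for doc in docs]
    
    # Filter by minimum score (thresholds are heuristic; calibrate them on your own data)
    high_confidence_docs = [(doc, score) for doc, score in docs_with_scores if score >= min_score]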
    
    if not high_confidence_docs:
        return {
            'answer': "I don't have enough confident information to answer this question.",
            'sources': [],
            'confidence': 'Low',
            'avg_score': 0.0,
            'num_docs': 0,  # callers read this key, so always include it
            'retrieval_method': retrieval_method
        }
    
    # Use only high-confidence documents
    confident_docs = [doc for doc, score in high_confidence_docs]
    avg_score = sum(score for doc, score in high_confidence_docs) / len(high_confidence_docs)
    
    # Create context
    context = "\n\n".join([doc.page_content for doc in confident_docs])
    
    # Enhanced prompt with confidence indication
    enhanced_prompt = ChatPromptTemplate.from_template("""
    You are a helpful assistant answering questions based on LangChain documentation.
    
    Context (confidence score: {confidence:.2f}):
    {context}
    
    Question: {question}
    
    Provide a clear answer based on the context. If you're uncertain about any part of your answer, mention it.
    
    Answer:
    """)
    
    formatted_prompt = enhanced_prompt.format(
        context=context, 
        question=question, 
        confidence=avg_score
    )
    
    response = llm.invoke(formatted_prompt)
    
    # Determine confidence level
    if avg_score >= 0.9:
        confidence = "Very High"
    elif avg_score >= 0.8:
        confidence = "High"
    elif avg_score >= 0.7:
        confidence = "Medium"
    else:
        confidence = "Low"
    
    return {
        'answer': response.content,
        'sources': [doc.metadata['source'] for doc in confident_docs],
        'confidence': confidence,
        'avg_score': avg_score,
        'num_docs': len(confident_docs),
        'retrieval_method': retrieval_method
    }

print("✅ Enhanced RAG chain created!")

In [None]:
# Test enhanced RAG
print("🚀 Testing Enhanced RAG System:\n")

enhanced_questions = [
    "What is LangChain?",  # Should have high confidence
    "How do I cook pasta?",  # Should have low confidence (not in our docs)
]

for question in enhanced_questions:
    print(f"❓ Question: {question}")
    result = enhanced_rag_chain(question)
    
    print(f"🎯 Confidence: {result['confidence']} (Score: {result.get('avg_score', 0):.3f})")
    print(f"📊 Retrieval: {result['retrieval_method']} | Documents: {result['num_docs']}")
    print(f"🤖 Answer: {result['answer'][:200]}...")
    print(f"📚 Sources: {len(result['sources'])} documents")
    print("-" * 80 + "\n")

## 📊 Step 12: RAG System Evaluation

Let's analyze how well our RAG system performs.
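
One caveat before reading any numbers: some vector stores report distances (lower is better) rather than similarities (higher is better). The sketch below assumes unit-length embeddings and a squared-L2 distance, which is believed to be Chroma's default metric but should be verified for your collection. Under those assumptions the two are related by `cosine = 1 - distance / 2`.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy vectors, normalized so that the dot product equals cosine similarity
a = normalize([0.1, 0.8, -0.3, 0.5])
b = normalize([0.2, 0.7, -0.2, 0.6])

distance = squared_l2(a, b)                    # lower = closer
similarity_from_distance = 1.0 - distance / 2.0
cosine = dot(a, b)                             # higher = closer

print(f"squared L2 distance: {distance:.4f}")
print(f"1 - distance / 2:    {similarity_from_distance:.4f}")
print(f"cosine similarity:   {cosine:.4f}")
```

If the metric in use is a distance, the "top" result is the one with the smallest score, and any threshold comparison has to be flipped accordingly.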

In [None]:
# Evaluation metrics and analysis

def evaluate_retrieval(questions):
    """Evaluate retrieval quality"""
    results = []
    
    for question in questions:
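        # Use normalized relevance scores so that higher = more relevant
        # (similarity_search_with_score returns distances for Chroma, where lower is better)
        docs_with_scores = vector_store.similarity_search_with_relevance_scores(question, k=5)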
        scores = [score for doc, score in docs_with_scores]
        
        results.append({
            'question': question,
            'top_score': max(scores) if scores else 0,
            'avg_score': sum(scores) / len(scores) if scores else 0,
            'score_range': max(scores) - min(scores) if scores else 0
        })
    
    return results

# Test questions for evaluation
eval_questions = [
    "What is LangChain?",
    "How do embeddings work?",
    "What are vector databases?",
    "How to split text documents?",
    "What is retrieval augmented generation?"
]

eval_results = evaluate_retrieval(eval_questions)

print("📊 Retrieval Quality Analysis:")
print("-" * 80)

for result in eval_results:
    print(f"❓ {result['question']}")
    print(f"   Top Score: {result['top_score']:.3f}")
    print(f"   Avg Score: {result['avg_score']:.3f}")
    print(f"   Range: {result['score_range']:.3f}")
    print()

# Overall statistics
all_top_scores = [r['top_score'] for r in eval_results]
all_avg_scores = [r['avg_score'] for r in eval_results]

print(f"📈 Overall Performance:")
print(f"   Average Top Score: {sum(all_top_scores)/len(all_top_scores):.3f}")
print(f"   Average Avg Score: {sum(all_avg_scores)/len(all_avg_scores):.3f}")
print(f"   Questions with High Confidence (>0.8): {sum(1 for s in all_top_scores if s > 0.8)}/{len(all_top_scores)}")

## 🎓 Step 13: Key Takeaways and Best Practices

### What We've Learned:

#### 🏗️ **RAG Architecture**
- **Document Loading**: Web scraping → Clean text extraction
- **Text Processing**: Chunking with overlap for context preservation
- **Embeddings**: Convert text to semantic vectors
- **Vector Storage**: Fast similarity search with ChromaDB
- **Retrieval**: Find relevant context for queries
- **Generation**: LLM creates answers from context

#### ⚡ **Performance Optimizations**
- **Chunk Size**: Balance between context and specificity
- **Overlap**: Prevents important information from being split
- **Retrieval Methods**: MMR for diversity, similarity for relevance
- **Confidence Scoring**: Filter low-quality matches

#### 🛠️ **Best Practices**
1. **Start Simple**: Basic similarity search, then add complexity
2. **Measure Performance**: Track retrieval scores and answer quality
3. **Iterate on Chunks**: Experiment with sizes and overlap
4. **Source Attribution**: Always provide document sources
5. **Handle Edge Cases**: What if no good matches are found?

#### 🚀 **Next Steps**
- **Add More Documents**: Expand your knowledge base
- **Fine-tune Retrieval**: Adjust parameters based on your data
- **Add Memory**: For conversational RAG
- **Implement Caching**: For faster repeated queries
- **Add UI**: Build a chat interface with a tool such as Streamlit
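
As one possible take on the caching idea, repeated queries can skip the embedding call entirely by memoizing it. This is a minimal in-process sketch using the standard library's `functools.lru_cache`. `slow_embed` is a hypothetical placeholder that returns a fake vector, standing in for a real embedding request.

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" function really runs


def slow_embed(text):
    """Placeholder for a real embedding call; returns a fake 2-D vector."""
    global call_count
    call_count += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=1024)
def cached_embed(text):
    # Return a tuple so cached results cannot be mutated by callers
    return slow_embed(text)


first = cached_embed("What is LangChain?")
second = cached_embed("What is LangChain?")   # served from the cache
third = cached_embed("What are embeddings?")  # new text, computed once

print(f"Embedding calls made: {call_count}")
print(cached_embed.cache_info())
```

An in-memory cache like this disappears when the kernel restarts, so a persistent cache would be the natural next step for a long-running application.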

In [None]:
# Final interactive demo
print("🎯 Interactive RAG Demo - Ask your own questions!")
print("(Enter 'quit' to exit)\n")

while True:
    try:
        user_question = input("❓ Your question: ").strip()
        
        if user_question.lower() in ['quit', 'exit', 'q']:
            print("👋 Thanks for using the RAG tutorial!")
            break
        
        if not user_question:
            continue
        
        # Get enhanced RAG response
        result = enhanced_rag_chain(user_question)
        
        print(f"\n🤖 Answer (Confidence: {result['confidence']}):")
        print(result['answer'])
        
        if result['sources']:
            print(f"\n📚 Sources:")
            for i, source in enumerate(result['sources'], 1):
                print(f"  {i}. {source}")
        
        print("\n" + "-" * 60 + "\n")
        
    except KeyboardInterrupt:
        print("\n👋 Goodbye!")
        break
    except Exception as e:
        print(f"❌ Error: {e}")
        print("Please try again.\n")

## 🎉 Congratulations!

You've successfully built a complete RAG system from scratch! 

### What You've Accomplished:
- ✅ **Document Loading** from websites
- ✅ **Text Splitting** with smart chunking
- ✅ **Embedding Creation** for semantic search
- ✅ **Vector Database** setup with ChromaDB
- ✅ **Retrieval System** with multiple strategies
- ✅ **LLM Integration** with E2E Networks
- ✅ **Complete RAG Pipeline** with confidence scoring
- ✅ **Performance Evaluation** and optimization

### Ready for Production?
This prototype is a solid starting point for applications like:
- **Chatbots** for customer support
- **Documentation assistants** for internal tools
- **Research tools** for academic work
- **Knowledge management** systems

Happy building! 🚀