# Retrieval-Augmented Generation (RAG) Pipeline
## AI Engineering Assignment

This notebook demonstrates a complete RAG pipeline using LangChain:
1. Load sample documents
2. Split into chunks
3. Create embeddings
4. Store in FAISS vector database
5. Retrieve relevant chunks
6. Generate answer using LLM

**Important Setup:**
- Make sure you're using the **"RAG Assignment (Python 3.9)"** kernel
- All packages are pre-installed in the virtual environment
- You'll need an Anthropic or OpenAI API key for the chat model (the default embeddings run locally and need no key)

## Step 1: Verify Libraries

All packages are pre-installed in the dedicated virtual environment. Let's verify they're available.

In [9]:
# Verify installations (packages are pre-installed in virtual environment)
import langchain
import faiss
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

print("✓ All libraries loaded successfully!")
print(f"✓ LangChain version: {langchain.__version__}")
print("✓ FAISS available")
print("\nReady to build RAG pipeline!")

✓ All libraries loaded successfully!
✓ LangChain version: 0.3.27
✓ FAISS available

Ready to build RAG pipeline!
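
The cell above stops at the first import that fails. As a gentler alternative, the sketch below reports every missing module at once using only the standard library. The helper name `check_packages` and the module list are illustrative assumptions, not part of the original notebook.

```python
from importlib import util


def check_packages(import_names):
    """Map each top-level module name to True/False without importing it."""
    status = {}
    for name in import_names:
        # find_spec returns None when a top-level module cannot be located
        status[name] = util.find_spec(name) is not None
    return status


# Hypothetical list; adjust it to the modules your environment needs.
required = ["langchain", "faiss", "langchain_openai", "langchain_community"]
for module, available in check_packages(required).items():
    print(f"{'OK     ' if available else 'MISSING'} {module}")
```

This only confirms that a module can be found; it does not verify versions or that the import succeeds.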


## Step 2: Configure API Keys

You'll need an API key for the chat model (Claude or OpenAI).

**Recommendation**: Use **Anthropic Claude** (Claude 3.5 Haiku is fast and affordable). Whether free credits are available depends on your account.

In [10]:
import os
import getpass

# ============================================
# Choose which LLM API to use
# ============================================
USE_OPENAI = False  # Set to True to use OpenAI instead of Claude

# For embeddings, we use FREE HuggingFace (no API key needed)
# For chat LLM, you need either Anthropic or OpenAI API key

if USE_OPENAI:
    if "OPENAI_API_KEY" not in os.environ:
        os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
    print("✓ OpenAI API key configured")
    print("  Using: GPT-3.5-turbo for chat")
else:
    if "ANTHROPIC_API_KEY" not in os.environ:
        os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")
    print("✓ Anthropic API key configured")
    print("  Using: Claude-3.5-Haiku for chat (fast & affordable)")

print("\nNote: Embeddings use FREE HuggingFace (no API credits needed!)")

✓ Anthropic API key configured
  Using: Claude-3.5-Haiku for chat (fast & affordable)

Note: Embeddings use FREE HuggingFace (no API credits needed!)
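
The cell above reports the key as configured even if an empty string is entered at the prompt. A minimal sketch of a stricter helper follows. The name `ensure_api_key` and its injectable `prompt` and `environ` parameters are assumptions made so the example can run without a terminal or a real key.

```python
import getpass
import os


def ensure_api_key(env_var, prompt=getpass.getpass, environ=os.environ):
    """Return the key stored under env_var, prompting once if it is unset.

    `prompt` and `environ` are injectable for testing; by default they are
    the real getpass.getpass and os.environ.
    """
    key = environ.get(env_var, "").strip()
    if not key:
        key = prompt(f"Enter a value for {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} was left empty")
        environ[env_var] = key
    return key


# Demonstration with a fake environment and a canned prompt (no real key involved).
fake_env = {}
ensure_api_key("EXAMPLE_API_KEY", prompt=lambda _msg: "sk-demo", environ=fake_env)
print(sorted(fake_env))
```

In the notebook this would be called as `ensure_api_key("ANTHROPIC_API_KEY")` or `ensure_api_key("OPENAI_API_KEY")`, depending on `USE_OPENAI`.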


## Step 3: Load Sample Documents
Creating sample text documents about AI topics for our knowledge base.

In [11]:
# Document is defined in langchain_core; langchain.schema is a legacy re-export
from langchain_core.documents import Document

# Create sample documents about AI and Machine Learning
documents = [
    Document(
        page_content="""Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, 
        especially computer systems. These processes include learning, reasoning, and self-correction. 
        AI applications include expert systems, natural language processing, speech recognition, and machine vision. 
        AI has become increasingly important in modern technology and is used in various industries including 
        healthcare, finance, transportation, and entertainment.""",
        metadata={"source": "ai_basics.txt", "topic": "AI Introduction"}
    ),
    Document(
        page_content="""Machine Learning is a subset of artificial intelligence that focuses on the development of 
        algorithms and statistical models that enable computers to learn and improve from experience without being 
        explicitly programmed. There are three main types of machine learning: supervised learning, unsupervised learning, 
        and reinforcement learning. Supervised learning uses labeled data, unsupervised learning finds patterns in 
        unlabeled data, and reinforcement learning learns through trial and error with rewards.""",
        metadata={"source": "machine_learning.txt", "topic": "Machine Learning"}
    ),
    Document(
        page_content="""Deep Learning is a specialized subset of machine learning that uses neural networks with 
        multiple layers (deep neural networks). These networks are inspired by the structure and function of the human brain. 
        Deep learning has achieved remarkable success in areas such as image recognition, natural language processing, 
        and game playing. Popular deep learning frameworks include TensorFlow, PyTorch, and Keras. Deep learning models 
        require large amounts of data and computational power to train effectively.""",
        metadata={"source": "deep_learning.txt", "topic": "Deep Learning"}
    ),
    Document(
        page_content="""Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers 
        understand, interpret, and manipulate human language. NLP combines computational linguistics with statistical, 
        machine learning, and deep learning models. Applications of NLP include machine translation, sentiment analysis, 
        chatbots, text summarization, and question answering systems. Modern NLP has been revolutionized by transformer 
        models like BERT and GPT.""",
        metadata={"source": "nlp.txt", "topic": "NLP"}
    ),
    Document(
        page_content="""Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with 
        text generation. RAG systems first retrieve relevant documents from a knowledge base, then use those documents 
        as context for a language model to generate accurate and informed responses. This approach helps reduce hallucinations 
        and provides more factual, grounded answers. RAG is particularly useful for building AI systems that need to answer 
        questions based on specific, up-to-date, or proprietary information.""",
        metadata={"source": "rag.txt", "topic": "RAG"}
    ),
    Document(
        page_content="""Vector databases are specialized databases designed to store and efficiently search through 
        high-dimensional vector embeddings. These embeddings are numerical representations of data such as text, images, 
        or audio. Vector databases use similarity search algorithms like cosine similarity or euclidean distance to find 
        the most relevant vectors. Popular vector databases include FAISS, Pinecone, Weaviate, and ChromaDB. They are 
        essential components of modern RAG systems and semantic search applications.""",
        metadata={"source": "vector_db.txt", "topic": "Vector Databases"}
    )
]

print(f"Loaded {len(documents)} documents")
print("\nDocument sources:")
for i, doc in enumerate(documents, 1):
    print(f"{i}. {doc.metadata['source']} - {doc.metadata['topic']}")
    print(f"   Content preview: {doc.page_content[:100]}...\n")

Loaded 6 documents

Document sources:
1. ai_basics.txt - AI Introduction
   Content preview: Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, 
       ...

2. machine_learning.txt - Machine Learning
   Content preview: Machine Learning is a subset of artificial intelligence that focuses on the development of 
        ...

3. deep_learning.txt - Deep Learning
   Content preview: Deep Learning is a specialized subset of machine learning that uses neural networks with 
        mu...

4. nlp.txt - NLP
   Content preview: Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers 
     ...

5. rag.txt - RAG
   Content preview: Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with 
      ...

6. vector_db.txt - Vector Databases
   Content preview: Vector databases are specialized databases designed to store and efficiently search through 
       ...
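
Each preview above ends with a line break and a run of spaces because the triple-quoted strings carry their source-code indentation into `page_content`. One way to avoid this, sketched below, is to collapse whitespace before building the `Document` objects. The helper name `normalize_whitespace` is hypothetical.

```python
def normalize_whitespace(text):
    """Collapse every run of spaces, tabs, and newlines into one space."""
    return " ".join(text.split())


raw = """Artificial Intelligence (AI) is the simulation of human intelligence processes by machines,
        especially computer systems."""

clean = normalize_whitespace(raw)
print(clean)
print(len(raw), "->", len(clean), "characters")
```

Applying this to the sample documents would change the chunk boundaries and lengths, so the outputs recorded in the later steps would differ.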



## Step 4: Split Documents into Chunks
Breaking down documents into smaller chunks for better retrieval and processing.

In [12]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # Maximum size of each chunk
    chunk_overlap=50,      # Overlap between chunks to maintain context
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]
)

# Split documents
splits = text_splitter.split_documents(documents)

print(f"Split {len(documents)} documents into {len(splits)} chunks")
print("\nFirst 3 chunks:")
for i, chunk in enumerate(splits[:3], 1):
    print(f"\n--- Chunk {i} ---")
    print(f"Source: {chunk.metadata['source']}")
    print(f"Length: {len(chunk.page_content)} characters")
    print(f"Content: {chunk.page_content[:200]}...")

Split 6 documents into 10 chunks

First 3 chunks:

--- Chunk 1 ---
Source: ai_basics.txt
Length: 489 characters
Content: Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, 
        especially computer systems. These processes include learning, reasoning, and self-correction. 
   ...

--- Chunk 2 ---
Source: machine_learning.txt
Length: 446 characters
Content: Machine Learning is a subset of artificial intelligence that focuses on the development of 
        algorithms and statistical models that enable computers to learn and improve from experience without...

--- Chunk 3 ---
Source: machine_learning.txt
Length: 87 characters
Content: unlabeled data, and reinforcement learning learns through trial and error with rewards....
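
To make `chunk_size` and `chunk_overlap` concrete, the sketch below cuts a string into fixed-size windows that share a few characters with their neighbors. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter breaks text on the listed separators and merges the pieces back up to `chunk_size`, which is likely why Chunk 3 above is a short trailing line instead of a full window. The function name `chunk_with_overlap` is an assumption.

```python
def chunk_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
```

With a window of 10 and an overlap of 3, each printed piece starts with the last three letters of the one before it.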


## Step 5: Create Embeddings and FAISS Vector Store
Converting text chunks into vector embeddings and storing them in a local FAISS database.

In [13]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# ============================================
# Choose Embedding Model
# ============================================
USE_OPENAI_EMBEDDINGS = False  # Set to True if you have OpenAI credits

if USE_OPENAI_EMBEDDINGS:
    # Option 1: OpenAI Embeddings (requires credits)
    embeddings = OpenAIEmbeddings(
        model="text-embedding-ada-002"
    )
    print("Using OpenAI embeddings (text-embedding-ada-002)")
else:
    # Option 2: Free HuggingFace Embeddings (no API key needed!)
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    print("Using FREE HuggingFace embeddings (all-MiniLM-L6-v2)")

print("Creating embeddings and building FAISS vector store...")
print("This may take a few moments...\n")

# Create FAISS vector store from documents
vectorstore = FAISS.from_documents(
    documents=splits,
    embedding=embeddings
)

print("✓ FAISS vector store created successfully!")
print(f"✓ Stored {len(splits)} document chunks as vector embeddings")
print("\nVector store is ready for similarity search!")

Using FREE HuggingFace embeddings (all-MiniLM-L6-v2)
Creating embeddings and building FAISS vector store...
This may take a few moments...

✓ FAISS vector store created successfully!
✓ Stored 10 document chunks as vector embeddings

Vector store is ready for similarity search!
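
Conceptually, similarity search compares the query vector against the stored vectors and returns the closest ones. The sketch below shows that idea as a brute-force scan with Euclidean distance in plain Python. The vectors and labels are made up for illustration, and this is not how FAISS is implemented internally.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, index, k=2):
    """Return the k (label, distance) pairs closest to query_vec."""
    scored = [(label, l2_distance(query_vec, vec)) for label, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
toy_index = {
    "rag.txt": [0.9, 0.1, 0.0],
    "nlp.txt": [0.6, 0.4, 0.1],
    "vector_db.txt": [0.1, 0.2, 0.9],
}
for label, distance in nearest([1.0, 0.0, 0.0], toy_index, k=2):
    print(f"{label}: {distance:.3f}")
```

A smaller distance means a closer match, so results are sorted in ascending order.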


## Step 6: Retrieve Relevant Chunks
Performing similarity search to find the most relevant document chunks for a query.

In [14]:
# Define the query
query = "What is RAG and how does it work?"

print(f"Query: {query}\n")
print("=" * 80)

# Retrieve top-k most similar documents
k = 3  # Number of documents to retrieve
retrieved_docs = vectorstore.similarity_search(query, k=k)

print(f"\nRetrieved {len(retrieved_docs)} most relevant chunks:\n")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"\n{'='*80}")
    print(f"RETRIEVED DOCUMENT {i}")
    print(f"{'='*80}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Topic: {doc.metadata['topic']}")
    print(f"\nContent:\n{doc.page_content}")

print(f"\n{'='*80}")

Query: What is RAG and how does it work?


Retrieved 3 most relevant chunks:


RETRIEVED DOCUMENT 1
Source: vector_db.txt
Topic: Vector Databases

Content:
essential components of modern RAG systems and semantic search applications.

RETRIEVED DOCUMENT 2
Source: rag.txt
Topic: RAG

Content:
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with 
        text generation. RAG systems first retrieve relevant documents from a knowledge base, then use those documents 
        as context for a language model to generate accurate and informed responses. This approach helps reduce hallucinations 
        and provides more factual, grounded answers. RAG is particularly useful for building AI systems that need to answer

RETRIEVED DOCUMENT 3
Source: nlp.txt
Topic: NLP

Content:
Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers 
        understand, interpret, and manipulate human language. NLP combines computational
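
The top hit above is a one-line fragment from `vector_db.txt` that mentions "RAG systems" but says little about RAG itself. One possible mitigation, sketched below, is to drop very short chunks before indexing. The helper name `drop_short_chunks` and the 100-character threshold are assumptions, not values from the notebook.

```python
def drop_short_chunks(chunks, min_chars=100):
    """Keep only chunks whose stripped text has at least min_chars characters."""
    return [c for c in chunks if len(c.strip()) >= min_chars]


chunks = [
    "essential components of modern RAG systems and semantic search applications.",
    "Retrieval-Augmented Generation (RAG) is a technique that combines information "
    "retrieval with text generation. RAG systems first retrieve relevant documents "
    "from a knowledge base.",
]
kept = drop_short_chunks(chunks, min_chars=100)
print(len(kept), "of", len(chunks), "chunks kept")
```

In the notebook the same filter would be applied to `chunk.page_content` for each item in `splits`. Dropping a fragment discards its text, so merging it into the preceding chunk is an alternative worth considering.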

## Step 7: Generate Answer Using LLM
Using a chat model to generate a final answer based on the retrieved context.

In [15]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Initialize the chat model
if USE_OPENAI:
    llm = ChatOpenAI(
        model="gpt-3.5-turbo",
        temperature=0
    )
    print("Using OpenAI GPT-3.5-turbo")
else:
    # Claude 3.5 Haiku: fast and affordable
    llm = ChatAnthropic(
        model="claude-3-5-haiku-20241022",
        temperature=0
    )
    print("Using Anthropic Claude-3.5-Haiku (fast & affordable!)")

# Create a custom prompt template
prompt_template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
{context}

Question: {question}

Answer: Let me provide a detailed answer based on the context above."""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

# Create retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

print("\n✓ RAG chain created successfully!")
print("✓ Ready to answer questions!")

Using Anthropic Claude-3.5-Haiku (fast & affordable!)

✓ RAG chain created successfully!
✓ Ready to answer questions!
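
The `"stuff"` chain type places all retrieved chunks into a single prompt. The sketch below imitates that step with plain string formatting so the final prompt text is visible. The function name `build_stuffed_prompt`, the shortened template, and the `"\n\n"` separator are illustrative assumptions; in the notebook the actual prompt is assembled by `RetrievalQA`.

```python
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
{context}

Question: {question}
"""


def build_stuffed_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunk texts and substitute them into the template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


retrieved = [
    "RAG systems first retrieve relevant documents from a knowledge base.",
    "Vector databases store high-dimensional embeddings.",
]
print(build_stuffed_prompt(retrieved, "What is RAG and how does it work?"))
```

Because every chunk is included verbatim, the prompt grows with `k` and with `chunk_size`, which is worth keeping in mind for models with small context windows.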


## Step 8: Display Complete Results
Showing the query, retrieved context, and final generated answer together.

In [16]:
# Run the query through the RAG pipeline
result = qa_chain.invoke({"query": query})

# Display results
print("\n" + "="*80)
print("RAG PIPELINE - COMPLETE RESULTS")
print("="*80)

print(f"\n📝 QUERY:\n{query}")

print("\n" + "="*80)
print("📚 RETRIEVED CONTEXT (Top 3 Chunks)")
print("="*80)

for i, doc in enumerate(result['source_documents'], 1):
    print(f"\n--- Context Chunk {i} ---")
    print(f"Source: {doc.metadata['source']}")
    print(f"Topic: {doc.metadata['topic']}")
    print(f"\nContent:\n{doc.page_content}")
    print("-" * 80)

print("\n" + "="*80)
print("🤖 GENERATED ANSWER")
print("="*80)
print(f"\n{result['result']}")

print("\n" + "="*80)
print("✅ RAG Pipeline Execution Completed Successfully!")
print("="*80)


RAG PIPELINE - COMPLETE RESULTS

📝 QUERY:
What is RAG and how does it work?

📚 RETRIEVED CONTEXT (Top 3 Chunks)

--- Context Chunk 1 ---
Source: vector_db.txt
Topic: Vector Databases

Content:
essential components of modern RAG systems and semantic search applications.
--------------------------------------------------------------------------------

--- Context Chunk 2 ---
Source: rag.txt
Topic: RAG

Content:
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with 
        text generation. RAG systems first retrieve relevant documents from a knowledge base, then use those documents 
        as context for a language model to generate accurate and informed responses. This approach helps reduce hallucinations 
        and provides more factual, grounded answers. RAG is particularly useful for building AI systems that need to answer
--------------------------------------------------------------------------------

--- Context Chunk 3 ---
Source: nlp.tx

## Additional Example Queries
Try the RAG system with different questions!

In [17]:
# Try additional queries
example_queries = [
    "What are the three types of machine learning?",
    "Explain what vector databases are used for",
    "What is the difference between AI and deep learning?"
]

print("Testing RAG with additional queries...\n")

for i, test_query in enumerate(example_queries, 1):
    print(f"\n{'='*80}")
    print(f"Example Query {i}: {test_query}")
    print("="*80)
    
    test_result = qa_chain.invoke({"query": test_query})
    
    print(f"\nAnswer:\n{test_result['result']}")
    print(f"\nSources used: {', '.join([doc.metadata['source'] for doc in test_result['source_documents']])}")

Testing RAG with additional queries...


Example Query 1: What are the three types of machine learning?

Answer:
According to the context, the three main types of machine learning are:

1. Supervised learning (which uses labeled data)
2. Unsupervised learning (which finds patterns in data)
3. Reinforcement learning

Note: While the context was cut off mid-sentence for both supervised and unsupervised learning descriptions, these three types are explicitly stated in the first paragraph about Machine Learning.

Sources used: machine_learning.txt, ai_basics.txt, deep_learning.txt

Example Query 2: Explain what vector databases are used for

Answer:
Based on the context provided, vector databases are specialized databases designed to store and efficiently search through high-dimensional vector embeddings. These embeddings are numerical representations of various types of data like text, images, or audio.

The primary purpose of vector databases is to enable similarity search using algorith
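
When several retrieved chunks come from the same file, the "Sources used" line repeats that file name. A small sketch of deduplicating the list while keeping retrieval order is shown below. The helper name `unique_sources` and the sample metadata are hypothetical.

```python
def unique_sources(metadatas):
    """Return source names in first-seen order, without duplicates."""
    return list(dict.fromkeys(m["source"] for m in metadatas))


# Hypothetical metadata for three retrieved chunks, two from the same file.
retrieved_metadata = [
    {"source": "machine_learning.txt"},
    {"source": "machine_learning.txt"},
    {"source": "ai_basics.txt"},
]
print(", ".join(unique_sources(retrieved_metadata)))
```

In the notebook the input would be `[doc.metadata for doc in test_result['source_documents']]`.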

## Summary

This notebook demonstrated a complete RAG pipeline:

✅ **Step 1**: Verified required libraries (pre-installed in virtual environment)  
✅ **Step 2**: Configured API keys for LLM access  
✅ **Step 3**: Loaded 6 sample documents about AI topics  
✅ **Step 4**: Split documents into manageable chunks  
✅ **Step 5**: Created vector embeddings and stored in FAISS  
✅ **Step 6**: Retrieved top-k relevant chunks using similarity search  
✅ **Step 7**: Generated answers using OpenAI/Claude LLM  
✅ **Step 8**: Displayed complete results with context and answer  

**Key Components:**
- **Vector Store**: FAISS (local, no cloud required)
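- **Embeddings**: HuggingFace `sentence-transformers/all-MiniLM-L6-v2` (default), or OpenAI `text-embedding-ada-002` if `USE_OPENAI_EMBEDDINGS` is set
- **LLM**: Anthropic Claude 3.5 Haiku (default) or OpenAI GPT-3.5-turbo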
- **Framework**: LangChain

**Screenshot Checklist for Assignment:**
1. ✓ Libraries verified (Cell 1)
2. ✓ Documents loaded and split (Cells 3-4)
3. ✓ Embeddings created (Cell 5)
4. ✓ FAISS vector store created (Cell 5)
5. ✓ Query retrieval results (Cell 6)
6. ✓ Final generated answer (Cell 8)