# Day 22: Basic RAG Implementation

In this notebook, we'll build a simple Retrieval-Augmented Generation (RAG) pipeline from scratch. This will help solidify the concepts of chunking, embeddings, and vector stores.

## Overview

We will cover the following steps:
1.  **Setup**: Install necessary libraries.
2.  **Document Loading & Chunking**: Prepare our knowledge base.
3.  **Embedding**: Convert text chunks into vector embeddings.
4.  **Indexing**: Store the embeddings in a local vector store (FAISS).
5.  **Retrieval & Generation**: Build the pipeline to answer questions using our knowledge base.

## 1. Setup

First, we need to install the required libraries. We'll use `sentence-transformers` for creating embeddings, `faiss-cpu` for our local vector store, `openai` for the generation step, and `python-dotenv` to load the API key from a `.env` file.

In [None]:
!pip install sentence-transformers faiss-cpu openai python-dotenv

In [None]:
import os
import openai
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
from dotenv import load_dotenv

# Load environment variables for OpenAI API key
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Helper function for LLM calls
def get_llm_response(prompt):
    if not openai.api_key:
        print("OpenAI API key not found. Returning a placeholder response.")
        return "Based on the provided context, the answer to your question is... [Simulated Response]"
    try:
        # openai>=1.0 removed openai.ChatCompletion; use the chat.completions interface
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

## 2. Document Loading & Chunking

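For simplicity, we'll use a single string as our document. We'll then write a function to split it into smaller, overlapping chunks. Chunk sizes are measured in words (not characters or tokens), and consecutive chunks share `overlap` words so that text near a boundary appears in both neighbours. Note that with the defaults (150 words, 30 overlapping), a document this short yields only two chunks, so retrieving the top 2 returns the whole document; pass a smaller `chunk_size` to see retrieval become selective.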

In [None]:
# Our knowledge base: A simple text about the fictional planet 'Zoltar'
document = (
    "The planet Zoltar is a marvel of the Andromeda galaxy, known for its twin suns, Helios Prime and Helios Beta, which create a perpetual twilight. "
    "The surface of Zoltar is covered in crystalline forests that hum with a low-frequency energy. This energy is harnessed by the native Zoltarians, a species of sentient, silicon-based lifeforms. "
    "Zoltarians communicate through a complex series of light patterns, a language known as 'Luminar'. Their society is structured around the 'Great Crystal', a massive geological formation at the planet's north pole that is believed to be the source of all life. "
    "The Zoltarian diet consists of absorbing geothermal energy from volcanic vents scattered across the planet. They reproduce asexually, budding off smaller versions of themselves once every 'Great Cycle', which corresponds to 50 Earth years. "
    "Zoltar's atmosphere is composed mainly of nitrogen and argon, and is unbreathable for carbon-based lifeforms. The planet has a strong magnetic field, which protects it from the intense solar winds of its twin suns. The average temperature on Zoltar is a cool -30 degrees Celsius."
)

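def chunk_text(text, chunk_size=150, overlap=30):
    """Splits text into overlapping chunks of up to `chunk_size` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for i in range(0, len(words), step):
        chunks.append(' '.join(words[i:i + chunk_size]))
        # Stop once a chunk reaches the end, so the last chunk isn't a redundant tail
        if i + chunk_size >= len(words):
            break
    return chunks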

# Create chunks
text_chunks = chunk_text(document)

print(f"Created {len(text_chunks)} chunks.")
for i, chunk in enumerate(text_chunks):
    print(f"--- Chunk {i+1} ---")
    print(chunk + '\n')
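
To see how `chunk_size` and `overlap` interact, here is a minimal sketch that computes only the word-index boundaries each chunk would cover. The helper name `chunk_spans` is hypothetical (it is not part of the notebook); it assumes a word-based sliding window with the same size and step as `chunk_text`, and it stops as soon as a window reaches the end of the text.

```python
def chunk_spans(n_words, chunk_size=150, overlap=30):
    """Return (start, end) word-index pairs for each chunk; end is exclusive.

    Hypothetical helper for illustration only.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    spans = []
    for start in range(0, n_words, step):
        end = min(start + chunk_size, n_words)
        spans.append((start, end))
        if end == n_words:  # this window already covers the tail of the text
            break
    return spans

# A 10-word text, 4-word chunks, each sharing 1 word with its neighbour
print(chunk_spans(10, chunk_size=4, overlap=1))  # [(0, 4), (3, 7), (6, 10)]
```

Each span starts `chunk_size - overlap` words after the previous one, which is why a larger overlap produces more chunks for the same text.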

## 3. Embedding

Next, we'll use a sentence-transformer model to convert our text chunks into numerical vectors (embeddings).

In [None]:
# Load a pre-trained embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings for each chunk
chunk_embeddings = embedding_model.encode(text_chunks, convert_to_tensor=False)

print(f"Shape of the embedding matrix: {chunk_embeddings.shape}")
print(f"Embedding dimension: {chunk_embeddings.shape[1]}")
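
Embeddings are useful because texts with similar meaning map to nearby vectors. The sketch below illustrates that idea with cosine similarity on tiny hand-made 3-dimensional vectors. The vectors and the `cosine_similarity` helper are made up for illustration and assume non-zero inputs; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings"; not produced by any real model
light_language = [0.9, 0.1, 0.0]
light_signals = [0.8, 0.2, 0.1]
volcanic_diet = [0.0, 0.2, 0.9]

print(cosine_similarity(light_language, light_signals))  # close to 1: similar direction
print(cosine_similarity(light_language, volcanic_diet))  # close to 0: nearly orthogonal
```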

## 4. Indexing

Now, we'll store these embeddings in a FAISS index for efficient similarity search.

In [None]:
# Get the dimension of the embeddings
d = chunk_embeddings.shape[1]

# Create a FAISS index
index = faiss.IndexFlatL2(d)

# Add the chunk embeddings to the index
index.add(chunk_embeddings.astype('float32'))

print(f"FAISS index created with {index.ntotal} vectors.")
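
`IndexFlatL2` performs an exact, brute-force search: it compares the query against every stored vector and ranks them by squared Euclidean distance. Below is a minimal pure-Python sketch of that idea, using made-up 2-D vectors and a hypothetical `flat_l2_search` helper. It shows what the index computes, not how FAISS implements it internally.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors nearest to query.

    Distances are squared Euclidean distances, smallest first.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((v - q) ** 2 for v, q in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # ascending by distance, ties broken by index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Made-up 2-D vectors standing in for chunk embeddings
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
distances, indices = flat_l2_search(stored, query=[0.9, 1.2], k=2)
print(indices)  # [1, 0]
print(distances)
```

Because every vector is visited, the cost grows linearly with the number of chunks, which is fine for a notebook-sized knowledge base.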

## 5. Retrieval & Generation

Finally, let's build the core RAG logic. We'll take a user query, find relevant chunks, and use an LLM to generate an answer based on that context.

In [None]:
def answer_question(query, k=2):
    """The main RAG pipeline function."""
    # 1. Embed the user query
    query_embedding = embedding_model.encode([query])
    
    # 2. Retrieve relevant chunks from the vector store
    # Clamp k: FAISS pads missing results with index -1, which would silently select the last chunk
    k = min(k, index.ntotal)
    distances, indices = index.search(query_embedding.astype('float32'), k)
    
    # Get the actual text chunks
    retrieved_chunks = [text_chunks[i] for i in indices[0]]
    context = "\n\n---\n\n".join(retrieved_chunks)
    
    print("--- Retrieved Context ---")
    print(context + '\n')
    
    # 3. Build the augmented prompt
    prompt = (
        f"Based on the following context, please answer the question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        f"Answer:"
    )
    
    # 4. Generate the final answer using the LLM
    final_answer = get_llm_response(prompt)
    return final_answer

# --- Let's test our RAG system! ---
query1 = "How do Zoltarians communicate?"
print(f'\n>>> Query: {query1}')
answer1 = answer_question(query1)
print(f'\n>>> Final Answer: {answer1}')

print('\n' + '='*50 + '\n')

query2 = "What do Zoltarians eat?"
print(f'>>> Query: {query2}')
answer2 = answer_question(query2)
print(f'\n>>> Final Answer: {answer2}')
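
`index.search` returns two arrays with one row per query, and FAISS fills unused slots with the index `-1` when fewer than `k` results are available. Here is a small sketch of turning one result row into `(chunk, distance)` pairs while skipping those placeholders. The `pair_results` helper and the sample values are hypothetical, and plain lists stand in for the NumPy arrays FAISS returns.

```python
def pair_results(chunks, distances_row, indices_row):
    """Pair each retrieved chunk with its distance, ignoring -1 placeholders."""
    return [(chunks[i], d)
            for i, d in zip(indices_row, distances_row)
            if i != -1]

chunks = ["chunk about language", "chunk about diet"]
# Simulated search output for k=3 against an index holding only 2 vectors;
# the last slot is an empty placeholder (its distance value is illustrative)
distances_row = [0.42, 1.10, float("inf")]
indices_row = [0, 1, -1]

for text, dist in pair_results(chunks, distances_row, indices_row):
    print(f"{dist:.2f}  {text}")
```

Keeping the distances alongside the text makes it easy to inspect how close each retrieved chunk actually was to the query.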

## 6. Conclusion

Congratulations! You've built a complete, albeit simple, RAG pipeline from scratch. We have successfully:

- Loaded and chunked a source document.
- Generated vector embeddings for each chunk.
- Stored and indexed these embeddings in a FAISS vector store.
- Implemented a retrieval-and-generation loop to answer questions based on the document.

This forms the foundation for more complex RAG systems. In the upcoming days, we'll explore how to improve each component of this pipeline, from enhancing retrieval quality with rerankers to optimizing the final prompt for the LLM.