# RAG-RARE System: Advanced Retrieval-Augmented Generation

This notebook implements a Retrieval-Augmented Generation (RAG) system that answers questions by retrieving relevant passages from your documents and passing them to a Large Language Model (LLM) as context. The pipeline loads PDF and plain-text documents, splits them into chunks, embeds the chunks, and indexes them in a vector store for semantic search.

## Key Features:

1.  **Multi-Document Support**: Loads PDF files with `PyPDFLoader` and plain-text files with `TextLoader`.
2.  **Context-Aware Chunking**: Utilizes `RecursiveCharacterTextSplitter` to maintain semantic coherence within document chunks.
3.  **Robust Embeddings**: Employs OpenAI's `text-embedding-ada-002` for high-quality vector representations.
4.  **Efficient Vector Store**: Uses `FAISS` (Facebook AI Similarity Search) for fast and scalable similarity search.
5.  **Advanced Retrieval**: Implements `MultiQueryRetriever` and `ContextualCompressionRetriever` for improved query understanding and relevant document selection.
6.  **Powerful LLM Integration**: Leverages `ChatOpenAI` for generating coherent and contextually rich responses.
7.  **Conversational Memory**: Integrates `ConversationBufferMemory` to maintain conversation history, enabling follow-up questions.
8.  **Chaining for Cohesion**: Utilizes `ConversationalRetrievalChain` to seamlessly integrate retrieval and generation components.

## Setup and Installation

Before running the notebook, ensure you have the necessary libraries installed and your environment variables configured.

In [None]:
# Install necessary libraries
# langchain is pinned below 1.0 on the assumption that the legacy chain, memory,
# and retriever imports used in this notebook are only guaranteed in the 0.x line.
!pip install -qU "langchain<1.0" langchain-openai langchain-community langchain-text-splitters pypdf faiss-cpu tiktoken python-dotenv

# Import required modules (unused and deprecated import paths removed)
import os

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


## 1. Environment Setup

Set your OpenAI API key. **Crucially, never hardcode your API keys directly in publicly shared notebooks.** It's best practice to load them from environment variables.

To do this:
1.  Create a `.env` file in your project directory (if not already present).
2.  Add your API key to it: `OPENAI_API_KEY="YOUR_OPENAI_API_KEY"`
3.  Load the environment variables at the start of your script or notebook.
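
Step 3 can be sketched as a small lookup that fails early with a readable message when the key is missing. This is a minimal sketch: the helper name `require_env` and the `sk-example` value are hypothetical placeholders, not part of any library.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Example with an explicit mapping so the sketch runs without a real key
print(require_env("OPENAI_API_KEY", env={"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_env("OPENAI_API_KEY")` without the `env` argument reads from the real process environment.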

In [None]:
# Read the OpenAI API key from the environment instead of hardcoding it.
# Optionally load it from a .env file first (requires python-dotenv):
# from dotenv import load_dotenv
# load_dotenv()

from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    # Prompt interactively so the key never appears in the notebook source
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

## 2. Document Loading and Preprocessing

This section handles loading documents. The cell below builds a few in-memory sample documents so the notebook runs without any files. To use your own data, uncomment one of the loader examples and replace `'path/to/your/document.pdf'` or `'path/to/your/textfile.txt'` with your actual document paths.

We use `PyPDFLoader` for PDF files and `TextLoader` for text files.
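
One possible way to choose between the two loaders is by file extension. The sketch below is an assumption about how you might organize this, not something the notebook does itself: `LOADER_BY_SUFFIX`, `loader_name_for`, and `load_any` are hypothetical names, and `load_any` requires `langchain_community` to be installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the loader class named in this section
LOADER_BY_SUFFIX = {".pdf": "PyPDFLoader", ".txt": "TextLoader"}


def loader_name_for(path: str) -> str:
    """Return the loader class name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return LOADER_BY_SUFFIX[suffix]


def load_any(path: str):
    """Load a file with the matching loader (needs langchain_community)."""
    import langchain_community.document_loaders as loaders  # imported lazily

    loader_cls = getattr(loaders, loader_name_for(path))
    return loader_cls(path).load()


print(loader_name_for("report.PDF"))  # PyPDFLoader
```

Rejecting unknown extensions up front avoids silently indexing files that no loader can parse.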

In [None]:
# Example document loading (replace with your actual document paths)

# For a PDF document:
# loader = PyPDFLoader("path/to/your/document.pdf") 
# documents = loader.load()

# For a text document:
# loader = TextLoader("path/to/your/textfile.txt")
# documents = loader.load()

# Placeholder: Creating dummy documents for demonstration if no files are loaded
from langchain_core.documents import Document

dummy_content = [
    "The quick brown fox jumps over the lazy dog. This is a common English pangram used for testing typewriters and computer keyboards.",
    "Artificial intelligence is rapidly advancing, with new breakthroughs in machine learning and neural networks. It has applications in various fields.",
    "Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. It turns data scripts into shareable web apps in minutes.",
    "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. It allows language models to access external knowledge bases, improving factual accuracy and reducing hallucinations.",
    "FAISS, developed by Facebook AI, is a library for efficient similarity search and clustering of dense vectors. It is widely used for building vector databases."
]

documents = [Document(page_content=content) for content in dummy_content]

print(f"Loaded {len(documents)} documents.")

## 3. Text Splitting (Chunking)

Documents are often too large to fit into an LLM's context window directly. We split them into smaller chunks using `RecursiveCharacterTextSplitter`. By default it tries to split on paragraph breaks (`\n\n`), then line breaks (`\n`), then spaces, and finally individual characters, moving to the next separator only when a piece is still longer than `chunk_size`. This keeps paragraphs and lines intact where possible.

- `chunk_size`: The maximum size of each text chunk, measured by `length_function` (here `len`, so characters).
- `chunk_overlap`: The number of characters to overlap between consecutive chunks. This helps maintain context across chunk boundaries.
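
To see how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sketch that uses a fixed-size sliding window. It is not the recursive separator logic of `RecursiveCharacterTextSplitter`; the function name `sliding_chunks` is hypothetical and only illustrates the overlap arithmetic.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is what lets a sentence that straddles a boundary appear whole in at least one chunk.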

In [None]:
# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

# Split the loaded documents into chunks
texts = text_splitter.split_documents(documents)

print(f"Split into {len(texts)} chunks.")
# Optional: print first chunk to verify
# print(texts[0].page_content)

## 4. Embeddings and Vector Store Creation

This step converts our text chunks into numerical vector representations (embeddings) and stores them in a vector database for efficient similarity search.

- **Embeddings**: We use `OpenAIEmbeddings` with the `text-embedding-ada-002` model, which maps each chunk to a 1536-dimensional vector.
- **Vector Store**: `FAISS` is chosen for its in-memory performance and suitability for local development. For production, you might consider persistent or cloud-based vector stores.

The embeddings model converts text into a dense vector, where semantically similar texts are closer in the vector space.
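
The idea of "closer in the vector space" can be illustrated with a toy nearest-neighbor search. This sketch uses cosine similarity and made-up 3-dimensional vectors purely for illustration; the metric a real vector store uses depends on how it is configured, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query vector."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional vectors; real embeddings have far more dimensions
docs = {
    "rag": [0.9, 0.1, 0.0],
    "faiss": [0.7, 0.3, 0.1],
    "pangram": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs))  # ['rag', 'faiss']
```

A library such as FAISS performs the same kind of ranking, but with index structures that avoid comparing the query against every stored vector.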

In [None]:
# Initialize OpenAI embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create a FAISS vector store from the document chunks and embeddings
vectorstore = FAISS.from_documents(texts, embeddings)

print("FAISS vector store created successfully.")

## 5. Initialize the Large Language Model (LLM)

We initialize the `ChatOpenAI` model, which will be responsible for generating answers based on the retrieved context.

- `model_name`: Specifies the OpenAI model to use (e.g., `gpt-3.5-turbo`, `gpt-4`).
- `temperature`: Controls the randomness of the output. Lower values (e.g., 0) make the output more deterministic and factual, while higher values make it more creative.
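
The effect of `temperature` can be illustrated with the standard temperature-scaled softmax. This is a sketch of the general technique; how the hosted model applies the parameter internally is an assumption here, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest score
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, probability mass moves from the top-scoring token toward the others, which is why higher values produce more varied output.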

In [None]:
# Initialize the ChatOpenAI model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0)

print("LLM initialized.")

## 6. Advanced Retrieval Setup (MultiQuery and Contextual Compression)

To improve retrieval quality, we employ two advanced techniques:

### a. MultiQueryRetriever
`MultiQueryRetriever` uses the LLM to generate several rephrasings of a single user query, runs each one against the base retriever, and returns the unique union of the results. This explores different facets of the user's intent and can retrieve more relevant documents, especially for ambiguous or broad questions.
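
The merge step behind multi-query retrieval can be sketched without an LLM. In this minimal sketch the query variants are written by hand and a toy keyword index stands in for the vector store; `multi_query_retrieve`, `toy_search`, and the index contents are all hypothetical.

```python
def multi_query_retrieve(query_variants, search):
    """Run every query variant and return the unique union of results, in first-seen order."""
    seen = set()
    merged = []
    for variant in query_variants:
        for doc in search(variant):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Toy keyword index standing in for a vector store
index = {
    "rag": ["doc_rag"],
    "retrieval": ["doc_rag", "doc_faiss"],
    "vector": ["doc_faiss"],
}


def toy_search(query):
    """Return every document listed under any word of the query."""
    hits = []
    for word in query.lower().split():
        hits.extend(index.get(word, []))
    return hits


variants = ["what is rag", "retrieval augmented generation", "vector search"]
print(multi_query_retrieve(variants, toy_search))  # ['doc_rag', 'doc_faiss']
```

The first variant alone finds only `doc_rag`; the rephrasings surface `doc_faiss` as well, and deduplication keeps each document once.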

### b. ContextualCompressionRetriever
This retriever compresses the retrieved documents to only include the most relevant parts. It uses an LLM (`LLMChainExtractor`) to identify and extract the passages most pertinent to the query, reducing noise and improving the LLM's focus.
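
To make the compression idea concrete, the sketch below keeps only the sentences of a document that share a word with the query. Plain keyword overlap is swapped in here for the LLM-based extraction that `LLMChainExtractor` performs, so treat it as an illustration of the filtering step only; the function name `compress` is hypothetical.

```python
import re


def compress(document: str, query: str) -> str:
    """Keep only sentences that share at least one word with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [
        s for s in sentences
        if query_words & set(re.findall(r"\w+", s.lower()))
    ]
    return " ".join(kept)


doc = (
    "FAISS is a library for similarity search. "
    "The weather was nice. "
    "It is used for vector databases."
)
print(compress(doc, "What is FAISS used for?"))
# FAISS is a library for similarity search. It is used for vector databases.
```

The unrelated sentence is dropped before the text reaches the generation step, which is the noise reduction the retriever aims for.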

In [None]:
# --- MultiQueryRetriever Setup ---
template = """You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user's original query, your goal is to help the user retrieve a broader set of relevant documents.
Provide these alternative questions separated by newlines.
The original question is: {question}
"""
prompt_perspectives = PromptTemplate(input_variables=["question"], template=template)

# Define the MultiQueryRetriever
# Note: vectorstore.as_retriever() creates a basic retriever from the FAISS index.
# Passing prompt= makes the retriever use the custom template above; without it,
# the library's default prompt is used and prompt_perspectives has no effect.
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm, prompt=prompt_perspectives
)

print("MultiQueryRetriever initialized.")

# --- ContextualCompressionRetriever Setup ---
# This uses an LLM to extract only the relevant parts of the retrieved documents
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever_from_llm # Using MultiQueryRetriever as the base
)

print("ContextualCompressionRetriever initialized.")

## 7. Conversational Memory

To enable a multi-turn conversation, we use `ConversationBufferMemory`. This stores previous messages (both user inputs and AI responses) and injects them into the context for subsequent queries, allowing the LLM to understand and respond to follow-up questions contextually.
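
Conceptually, a buffer memory is just an ordered list of messages that grows with every turn. The class below is a hypothetical stand-in written for illustration; it is not the `ConversationBufferMemory` implementation.

```python
class BufferMemory:
    """Keeps every turn of the conversation in order."""

    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def save_turn(self, user_text, ai_text):
        """Record one exchange: the user's message followed by the AI reply."""
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def as_prompt_text(self):
        """Render the full history as text that can be placed in a prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory_sketch = BufferMemory()
memory_sketch.save_turn("What is RAG?", "RAG combines retrieval with generation.")
memory_sketch.save_turn("How does it improve LLMs?", "It grounds answers in retrieved text.")
print(memory_sketch.as_prompt_text())
```

Because nothing is ever dropped, the history grows without bound, so long conversations can eventually exceed the model's context window.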

In [None]:
# Initialize conversation memory
memory = ConversationBufferMemory(
    memory_key='chat_history',
    return_messages=True,
    output_key='answer'
)

print("Conversation memory initialized.")

## 8. Conversational Retrieval Chain

The `ConversationalRetrievalChain` is the core of our RAG system. It orchestrates the entire process:

1.  **Question Condensing**: When there is prior conversation history, uses the LLM to rewrite the user's follow-up into a standalone question.
2.  **Retrieval**: Passes the standalone question to the retriever, which generates query variants (via `MultiQueryRetriever`), fetches documents (via `FAISS`), and compresses them (via `ContextualCompressionRetriever`).
3.  **Generation**: Passes the retrieved passages together with the question to the LLM (`ChatOpenAI`) to generate a coherent and informed answer.

- `llm`: The language model used for generation.
- `retriever`: The retriever responsible for fetching relevant documents. Here we use our enhanced `compression_retriever`.
- `memory`: The component that stores and manages conversation history.
- `return_source_documents`: If `True`, the chain will return the source documents used to generate the answer, which is useful for debugging and transparency.
- `return_generated_question`: If `True`, the chain will return the question it generated to query the retriever based on the chat history.
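
The control flow of such a chain can be sketched with stub functions in place of the LLM and the retriever. This is a simplified sketch under stated assumptions, not the library's implementation: `conversational_answer` and the three `stub_*` helpers are hypothetical names, and the returned keys simply mirror the ones used in this notebook.

```python
def conversational_answer(question, chat_history, condense, retrieve, generate):
    """Condense the follow-up, retrieve passages, then generate an answer."""
    # With no history the question is already standalone
    standalone = condense(question, chat_history) if chat_history else question
    docs = retrieve(standalone)
    answer = generate(standalone, docs)
    chat_history.append((question, answer))
    return {
        "answer": answer,
        "generated_question": standalone,
        "source_documents": docs,
    }


# Stub components so the control flow can run without an LLM or vector store
def stub_condense(question, history):
    last_question = history[-1][0]
    return f"{question} (in the context of: {last_question})"


def stub_retrieve(query):
    return [f"passage matching '{query}'"]


def stub_generate(query, docs):
    return f"Answer to '{query}' using {len(docs)} passage(s)"


history = []
first = conversational_answer(
    "What is RAG?", history, stub_condense, stub_retrieve, stub_generate
)
second = conversational_answer(
    "How does it improve LLMs?", history, stub_condense, stub_retrieve, stub_generate
)
print(second["generated_question"])
# How does it improve LLMs? (in the context of: What is RAG?)
```

The second call shows why the generated question is worth inspecting: the retriever never sees the ambiguous "it", only the rewritten standalone question.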

In [None]:
# Create the conversational retrieval chain
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=compression_retriever, # Use the advanced compression retriever
    memory=memory,
    return_source_documents=True,
    return_generated_question=True
)

print("ConversationalRetrievalChain initialized.")

## 9. Interacting with the RAG System

Now you can interact with your RAG system by asking questions. The `qa` object will process your query, retrieve relevant information, and generate a response. The `chat_history` will maintain context across turns.

**Example Usage:**

In [None]:
# Example questions; the second is a follow-up that relies on conversation memory
queries = [
    "What is RAG?",
    "How does it improve LLMs?",
    "Can you tell me about Streamlit and FAISS?",
]

for n, query in enumerate(queries, start=1):
    result = qa.invoke({"question": query})
    print(f"Question {n}: {query}")
    print(f"Answer {n}: {result['answer']}")
    print(f"--- Source Documents for Q{n} ---")
    for i, doc in enumerate(result['source_documents']):
        print(f"Document {i+1}:\n{doc.page_content[:200]}...")  # Print first 200 chars
    print("\n" + "="*50 + "\n")