## Vanilla RAG (Retrieval-Augmented Generation)

Vanilla RAG (Retrieval-Augmented Generation) is the basic form of an architecture that combines retrieval-based and generative approaches for natural language understanding and generation tasks. The model first retrieves relevant information from an external knowledge base (such as a document store or search index) and then uses that retrieved text as context when generating a response, which makes answers more accurate and better grounded in the source material. This setup is especially useful for open-domain questions and other tasks that depend on a large external knowledge source.

Vanilla RAG typically consists of two key components:

1. **Retriever**: Retrieves relevant documents or passages based on the input query.
2. **Generator**: Uses the retrieved documents to generate a response, often leveraging a transformer-based language model.

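This hybrid approach addresses limitations of purely generative models, which can only draw on what they learned during training, and of purely retrieval-based systems, which return passages rather than composed answers. That makes it well suited to tasks that require both precise information retrieval and coherent text generation.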
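The two components can be sketched in plain Python. This is a toy illustration, not the pipeline built in this notebook: a word-overlap score stands in for a real retriever, and `generate` is a hypothetical stub that only assembles the prompt a language model would receive.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Retriever (toy): rank passages by how many query words they share."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def generate(query: str, passages: list) -> str:
    """Generator (stub): a real system would send this prompt to an LLM."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


corpus = [
    "Orca is a 13-billion parameter model.",
    "Chroma is a vector database.",
    "Orca learns from GPT-4 explanation traces.",
]
query = "What does Orca learn from?"
print(generate(query, retrieve(query, corpus)))
```

In a real RAG system the retriever is usually an embedding index and the generator an LLM, but the data flow (query in, passages retrieved, passages plus query sent to the generator) is the same.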

## Overview

This notebook builds a Retrieval-Augmented Generation (RAG) pipeline using **LangChain**. It walks through the entire process, from loading a PDF to assembling a custom RAG chain that generates answers.

### Steps:

1. Required Package Installation
2. Imports
3. Loading and Splitting
4. Post-Processing
5. Getting Embeddings
6. Database Creation and DB Retriever
7. LLM Setup (Groq)
8. System Prompt
9. RAG Chain
10. Testing

### Step-1: Required Package Installation

These dependencies set up a complete environment for building a RAG system with LangChain, covering embeddings, document retrieval, and generative models.

In [None]:
!pip install transformers==4.44.2 langchain==0.3.3 \
    langchain-community==0.3.0 \
    langchain-core==0.3.10 \
    langchain-text-splitters==0.3.0 \
    chroma-hnswlib==0.7.6 \
    chromadb==0.5.11 \
    accelerate==1.0.1 \
    pypdf \
    ipywidgets \
    langchain-groq \
    huggingface-hub==0.25.1 \
    langchain-huggingface==0.1.0 \
    InstructorEmbedding==1.0.1 \
    rank-bm25==0.2.2
!pip install sentence-transformers==2.2.2
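To confirm the pins took effect, a small check with the standard-library `importlib.metadata` can compare installed versions against the expected ones. This is a sketch: `PINNED` lists only a subset of the pins above, and `find_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# A subset of the pins from the install cell (distribution names as on PyPI).
PINNED = {
    "langchain": "0.3.3",
    "langchain-core": "0.3.10",
    "chromadb": "0.5.11",
    "sentence-transformers": "2.2.2",
}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for each unmet pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


for name, installed in find_mismatches(PINNED).items():
    print(f"{name}: expected {PINNED[name]}, found {installed}")
```

An empty printout means every listed pin matches the installed version.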

### Step-2: Imports

These imports cover document loading, embeddings, vector stores, and interaction with large language models (LLMs): the building blocks of the RAG pipeline in this notebook.

In [2]:
# Standard library imports
import os
from pathlib import Path
import warnings
warnings.filterwarnings("ignore")

# Notebook widgets
import ipywidgets as widgets

# Chroma and related imports
from chromadb.config import Settings
from langchain_community.vectorstores import Chroma

# Langchain related imports
import langchain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

### Step-3: Loading & Splitting

1. **PDF Loading**: Loads the PDF file located at `pdf_path`; `PyPDFLoader` returns one `Document` per page.

2. **Document Splitting**: Splits the loaded pages into chunks of at most 1000 characters, with an overlap of 300 characters between consecutive chunks so that text near a chunk boundary keeps some of its surrounding context. Smaller chunks keep each retrieved passage focused and fit more easily into the model's prompt.

In [None]:
# PDF Loading
pdf_path = "./Orca_paper.pdf"
loader = PyPDFLoader(pdf_path)

docs = loader.load()
if not docs:
    raise ValueError("No documents loaded from the PDF.")

# Splitting docs into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=300)
document_chunks = text_splitter.split_documents(docs)

print(document_chunks[0])

page_content='Orca: Progressive Learning from Complex
Explanation Traces of GPT-4
Subhabrata Mukherjee∗†, Arindam Mitra∗
Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
Microsoft Research
Abstract
Recent research has focused on enhancing the capability of smaller models
through imitation learning, drawing on the outputs generated by large
foundation models (LFMs). A number of issues impact the quality of these
models, ranging from limited imitation signals from shallow LFM outputs;
small scale homogeneous training data; and most notably a lack of rigorous
evaluation resulting in overestimating the small model’s capability as they
tend to learn to imitate the style, but not the reasoning process of LFMs . To
address these challenges, we develop Orca, a 13-billion parameter model
that learns to imitate the reasoning process of LFMs. Orca learns from
rich signals from GPT-4 including explanation traces; step-by-step thought' metadata={'source': 'data/Orca_paper.pdf', 'page':
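The effect of `chunk_size` and `chunk_overlap` is easiest to see with a fixed-width window. This is a simplified stand-in, not `RecursiveCharacterTextSplitter`'s algorithm: the real splitter first tries to break on separators such as paragraph breaks, newlines, and spaces, so its chunks are often shorter than `chunk_size` and the overlap is approximate. `fixed_window_chunks` is a hypothetical helper.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 300) -> list:
    """Cut text into windows of chunk_size chars; neighbors share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = fixed_window_chunks(text)
print([len(c) for c in chunks])  # [1000, 1000, 1000, 400]
```

With the notebook's settings each window advances 700 characters, so in this simplified model a 2,500-character page yields four chunks, and the last 300 characters of one chunk are the first 300 of the next.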

### Step-4: Post-Processing

1. **`format_docs(docs)`**: The function takes a list of documents (`docs`) and joins their `page_content` into a single string. The content is separated by two newline characters (`\n\n`) for better readability.


In [4]:
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

### Step-5: Getting Embeddings

1. **`EMBEDDING_MODEL_NAME`**: Specifies the pre-trained embedding model, `"hkunlp/instructor-large"`, an instruction-tuned (INSTRUCTOR) model that embeds each text together with a short task instruction.

2. **`get_embeddings()`**: Initializes `HuggingFaceInstructEmbeddings` with this model and sets separate instructions for embedding documents and queries for retrieval.


In [5]:
# Pre-trained Embedding Model.
EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"

# Embeddings
def get_embeddings():
    embeddings = HuggingFaceInstructEmbeddings(
        model_name=EMBEDDING_MODEL_NAME,
        embed_instruction="Represent the document for retrieval:",
        query_instruction="Represent the question for retrieving supporting documents:",
    )
    return embeddings
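Because INSTRUCTOR models embed a text together with an instruction, documents and queries are embedded asymmetrically here. LangChain's `HuggingFaceInstructEmbeddings` pairs each text with the matching instruction before calling the model's `encode`; the sketch below only builds those `[instruction, text]` pairs so the difference is visible. `build_instructor_inputs` is a hypothetical helper, not part of either library, and no model is loaded.

```python
DOC_INSTRUCTION = "Represent the document for retrieval:"
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents:"


def build_instructor_inputs(texts, instruction):
    """Pair each text with an instruction in the [instruction, text] form INSTRUCTOR takes."""
    return [[instruction, text] for text in texts]


doc_inputs = build_instructor_inputs(["Orca is a 13-billion parameter model."], DOC_INSTRUCTION)
query_inputs = build_instructor_inputs(["How large is Orca?"], QUERY_INSTRUCTION)
print(doc_inputs[0])
print(query_inputs[0])
# With the real model, such pairs would be passed to encode(), e.g.
#   INSTRUCTOR("hkunlp/instructor-large").encode(doc_inputs)
```

The same sentence embedded under the two instructions lands at different points in vector space, which is the intended effect: queries are steered toward the documents that answer them.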

### Step-6: Database Creation and DB Retriever

1. **Database Creation** (`Chroma.from_documents()`): Creates a **Chroma** vector store that embeds each document chunk and stores the vectors in a collection named `db_collection`.
   
2. **DB Retriever** (`vector_store.as_retriever()`): Wraps the vector store as a retriever that, given a query, returns the most similar chunks by vector similarity (4 by default).


In [None]:
# Database Creation
vector_store = Chroma.from_documents(
    documents=document_chunks,       
    embedding=get_embeddings(),    
    collection_name="db_collection"  
)

# DB Retriever
retriever = vector_store.as_retriever()
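Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The brute-force sketch below shows that idea with cosine similarity over toy 3-dimensional vectors; it is an illustration only. Chroma uses an approximate HNSW index (hence the `chroma-hnswlib` dependency) and its distance metric is a collection setting. `top_k_by_cosine` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_by_cosine(query_vec, store, k=4):
    """Return the k (text, score) pairs in store most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real instructor-large vectors have hundreds of dimensions.
store = [
    ("chunk about system messages", [0.9, 0.1, 0.0]),
    ("chunk about evaluation", [0.1, 0.9, 0.1]),
    ("chunk about training data", [0.7, 0.3, 0.1]),
]
for text, score in top_k_by_cosine([1.0, 0.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```

Brute force compares the query against every stored vector; an HNSW index trades a little accuracy for much faster search on large collections.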

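### Step-7: LLM Setup

Initializes the chat model served through Groq (`ChatGroq`, model `Gemma2-9b-It`). The API key is read from the `GROQ_API_KEY` environment variable instead of being hard-coded in the notebook.

In [12]:
# LLM
# Read the Groq API key from the environment and fail early if it is missing.
api_key = os.environ.get("GROQ_API_KEY")
if not api_key:
    raise ValueError("Set the GROQ_API_KEY environment variable before running this cell.")
llm = ChatGroq(groq_api_key=api_key, model_name="Gemma2-9b-It")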

### Step-8: System Prompt

1. **System Prompt**: Defines the behavior of the assistant for question-answering tasks. It instructs the assistant to answer strictly from the provided context and to say so when the answer is unknown.

2. **Prompt Template** (`ChatPromptTemplate.from_template()`): Builds a chat prompt from the string above. `from_template` wraps the whole string in a single human message, so the "system prompt" reaches the model as the user turn rather than as a separate system message. The `{context}` and `{question}` placeholders are filled with actual values at runtime.


In [13]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "You must answer questions strictly using the provided context. If you don't know the answer, say that you don't know."
    "\n\n"
    "{context}"
    "\n\n"  # separate the retrieved context from the question
    "Question: {question}"
)

prompt = ChatPromptTemplate.from_template(system_prompt)
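At runtime the placeholders are filled much like ordinary string formatting. The sketch below applies plain `str.format` to a shortened copy of the prompt to show the text the model receives; `PROMPT` and the sample values are illustrative, and it assumes a blank line between the context and the question. In the notebook, `prompt.invoke({"context": ..., "question": ...})` performs this substitution and returns a prompt value holding one human message.

```python
PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question."
    "\n\n"
    "{context}"
    "\n\n"
    "Question: {question}"
)

# Context as format_docs would produce it: chunks joined by blank lines.
filled = PROMPT.format(
    context="Orca is a 13-billion parameter model.\n\nOrca learns from GPT-4 explanation traces.",
    question="How large is Orca?",
)
print(filled)
```

Without a separator, the last retrieved chunk would run straight into `Question:`, which makes it harder for the model to tell where the context ends.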

### Step-9: RAG Chain

1. **Context and Question Preparation**: The input question is sent to both keys of the dictionary: under `context` it goes through the retriever and `format_docs`, and under `question` it is passed through unchanged by `RunnablePassthrough()`.

2. **Prompt Application**: Applies the system prompt, which integrates the context and question.

3. **Language Model**: Passes the structured prompt to the language model for generating a response.

4. **Output Parsing**: Parses the language model's output into a clean string format using `StrOutputParser()`.

In [14]:
# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
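The `|` operator composes runnables left to right, and the leading dict fans the same input out to each of its branches (LangChain coerces it into a `RunnableParallel`). The plain-Python sketch below mimics that data flow with ordinary functions so it runs without an API key. `Step`, `parallel`, and the stub retriever, prompt, model, and parser are hypothetical stand-ins, not LangChain classes.

```python
class Step:
    """Wrap a function so steps can be chained with `|`, left to right."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


def parallel(branches):
    """Send one input to every branch and collect the results in a dict."""
    return Step(lambda x: {key: branch.invoke(x) for key, branch in branches.items()})


# Stand-ins for retriever, format_docs, RunnablePassthrough, prompt, llm, StrOutputParser.
fake_retriever = Step(lambda q: ["Orca is a 13-billion parameter model.", "Orca imitates GPT-4."])
format_passages = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda q: q)
fill_prompt = Step(lambda d: f"{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda text: {"content": f"Answer based on {text.count('Orca')} mentions of Orca. "})
to_string = Step(lambda msg: msg["content"].strip())

chain = (
    parallel({"context": fake_retriever | format_passages, "question": passthrough})
    | fill_prompt
    | fake_llm
    | to_string
)
print(chain.invoke("How large is Orca?"))  # Answer based on 3 mentions of Orca.
```

The stub model counts three mentions because the filled prompt contains both retrieved chunks plus the question, which shows that both dict branches received the same input.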

### Step-10: Testing

In [15]:
# Question
from pprint import pprint

pprint(rag_chain.invoke("What is this document about?"))

('This document describes the AGIEval benchmark and provides information about '
 'the tasks included in it. \n'
 '\n'
 '\n')


In [16]:
pprint(rag_chain.invoke("Explain about System messages?"))

('System messages are hand-crafted prompts designed to elicit specific types '
 'of responses from the language model. \n'
 '\n'
 'There are 16 different system messages used in this research, each designed '
 'to encourage the model to perform different tasks, such as:\n'
 '\n'
 '* Generating long and short answers\n'
 '* Following instructions and formatting guidelines\n'
 '* Creating creative content\n'
 '* Answering information-seeking queries\n'
 '* Providing explanations and step-by-step reasoning \n'
 '\n'
 '\n'
 'These system messages are used to train the Orca model and influence the '
 'types of responses it produces. \n')


In [17]:
pprint(rag_chain.invoke("Explain about Experiment Setup?"))

('The experiment setup included baselines and tasks. \n'
 '\n'
 'The tasks were further divided into:\n'
 '\n'
 "* **Open-ended Generation Capabilities:** This assessed the model's ability "
 'to generate creative and coherent text.\n'
 "* **Reasoning Capabilities:** This evaluated the model's ability to solve "
 'problems and understand logical relationships. \n'
 '\n'
 '\n'
 '\n')


In [18]:
# Out-of-context question

pprint(rag_chain.invoke("Who is the Prime Minister"))

'The provided context does not state who is to the right of P.  \n'
