### **Installation**

In [22]:
!pip install -qU bitsandbytes
!pip install -qU accelerate
!pip install -qU cohere llama-index
!pip install -qU sentence-transformers
!pip install -qU langchain
!pip install -qU pypdf
!pip install -qU transformers
!pip install -qU langchain-community
!pip install -qU tiktoken
!pip install -qU llama-index-llms-huggingface
!pip install -qU llama-index-embeddings-langchain
!pip install -qU pymupdf
!pip install -qU faiss-cpu
!pip install -qU langchain-google-genai

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-readers-file 0.2.2 requires pypdf<5.0.0,>=4.0.1, but you have pypdf 5.0.1 which is incompatible.

### **Imports**

In [23]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextStreamer
from llama_index.llms.huggingface import HuggingFaceLLM
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.vectorstores.utils import DistanceStrategy
from llama_index.core import Settings

In [25]:
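import os

from google.generativeai import GenerativeModel, configure
import google.generativeai as genai

# Prefer a key from the environment; the placeholder is only a fallback to be replaced.
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY", "Your-gemini-api-key-here")
genai.configure(api_key=GOOGLE_API_KEY)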

In [26]:
system_prompt = """<<SYS>>
You are an AI assistant specialized in analyzing and explaining academic papers, particularly in the field of machine learning and natural language processing. When answering questions:
1. Provide clear, concise, and accurate information based on the given context.
2. If the context doesn't contain enough information to fully answer the question, state this clearly and provide the best possible answer with the available information.
3. Use academic language and technical terms where appropriate, but also explain complex concepts in simpler terms when necessary.
4. If asked about specific sections of the paper (e.g., methodology, results, conclusions), focus your answer on those areas.
5. When discussing the paper's contributions or impact, try to place them in the broader context of the field.
6. If you're unsure about any information, express your uncertainty rather than making unfounded claims.
Your goal is to help users understand the key points, methodologies, and implications of the academic paper in question.
<</SYS>>
"""

In [27]:
def gemini_inference(question: str, context: str, history: list) -> str:
    """Answer `question` from `context` with Gemini and record the turn in `history`.

    Assumes `genai.configure(api_key=...)` has already been called.
    `history` is a list of (question, answer) tuples and is updated in place.
    """
    if history:
        # Prepend earlier turns as a Q/A dialogue so follow-up questions keep their thread
        history_prompt = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        full_prompt = f"{history_prompt}\n\nContext: {context}\nQ: {question}"
    else:
        # No history yet: send only the context and the current question
        full_prompt = f"Context: {context}\nQ: {question}"

    # Create the model handle (swap in another Gemini model name if needed)
    model = genai.GenerativeModel('gemini-1.5-flash')

    # Call the API and extract the generated text from the result
    answer = model.generate_content(full_prompt).text

    history.append((question, answer))

    return answer
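
The `system_prompt` defined earlier is not passed to `gemini_inference` in the cells shown here. One way to fold it in is sketched below with a hypothetical helper, `build_prompt`, which is not part of the original notebook. It reuses the same Q/A layout as `gemini_inference` and makes no API call, so it can be checked offline. The demo strings are illustrative only.

```python
from typing import List, Tuple


def build_prompt(
    question: str,
    context: str,
    history: List[Tuple[str, str]],
    system_prompt: str = "",
) -> str:
    """Assemble one text prompt: instructions, past turns, context, question."""
    parts = []
    if system_prompt:
        # Hypothetical addition: put the instructions ahead of everything else.
        parts.append(system_prompt.strip())
    if history:
        # Same "Q: ... / A: ..." dialogue format that gemini_inference uses.
        parts.append("\n".join(f"Q: {q}\nA: {a}" for q, a in history))
    parts.append(f"Context: {context}\nQ: {question}")
    return "\n\n".join(parts)


demo = build_prompt(
    question="What is HyDE?",
    context="(retrieved chunks go here)",
    history=[("What is RAG?", "Retrieval-augmented generation.")],
    system_prompt="Answer only from the given context.",
)
print(demo)
```

The returned string could then be passed to `model.generate_content(...)` in place of `full_prompt`. Whether this improves the answers would need to be checked against the paper.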

In [30]:
embed_model = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L12-v2',
    model_kwargs={'device': 'cpu'},
)



In [51]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

loader = PyMuPDFLoader("/content/Searching For Best Practices in RAG.pdf")
raw_documents = loader.load()

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_documents = text_splitter.split_documents(raw_documents)

# Keep only the text of each chunk; the loader's metadata (e.g. source, page) is dropped
documents = [
    Document(
        page_content=doc.page_content
    ) for doc in split_documents
]
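
To make `chunk_size=1000` and `chunk_overlap=200` concrete, here is a simplified fixed-window sketch. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph and line boundaries, so its real chunks are not exact windows like these. The helper `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each window repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap.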


In [52]:
index = FAISS.from_documents(
    documents,
    embedding=embed_model,
    distance_strategy=DistanceStrategy.COSINE,
)
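
As a conceptual aid, the sketch below shows what a nearest-neighbour lookup by cosine similarity computes. It uses plain Python and toy vectors. It is not how FAISS or the LangChain wrapper is implemented, and the helpers `cosine_similarity` and `top_k` are hypothetical names introduced for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones from all-MiniLM-L12-v2 have 384 dimensions.
doc_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.1, 0.0]
for idx, score in top_k(query_vec, doc_vecs, k=2):
    print(idx, round(score, 3))
```

A vector index exists to avoid this exhaustive loop on large collections. For the few hundred chunks of a single paper, the brute-force version gives the same ranking idea.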

In [53]:
history = []

In [56]:
question = "Give me all the best practices about RAG mentioned in the paper. Make sure you include all best practices about Query, Indexing, Retrieval and Generation."
retrieved_docs = index.similarity_search(
    question,
    k=3
)
print(retrieved_docs)
context = "".join(doc.page_content + "\n" for doc in retrieved_docs)

[Document(metadata={}, page_content='queries. The repacking and summarization modules further refine the system’s output, ensuring\nhigh-quality responses across different tasks.\n5\nDiscussion\n5.1\nBest Practices for Implementing RAG\nAccording to our experimental findings, we suggest two distinct recipes or practices for implementing\nRAG systems, each customized to address specific requirements: one focusing on maximizing\nperformance, and the other on striking a balance between efficiency and efficacy.\nBest Performance Practice: To achieve the highest performance, it is recommended to incorporate\nquery classification module, use the “Hybrid with HyDE” method for retrieval, employ monoT5 for\nreranking, opt for Reverse for repacking, and leverage Recomp for summarization. This configuration\nyielded the highest average score of 0.483, albeit with a computationally-intensive process.\nBalanced Efficiency Practice: In order to achieve a balance between performance and efficiency,')

In [57]:
print(context)

queries. The repacking and summarization modules further refine the system’s output, ensuring
high-quality responses across different tasks.
5
Discussion
5.1
Best Practices for Implementing RAG
According to our experimental findings, we suggest two distinct recipes or practices for implementing
RAG systems, each customized to address specific requirements: one focusing on maximizing
performance, and the other on striking a balance between efficiency and efficacy.
Best Performance Practice: To achieve the highest performance, it is recommended to incorporate
query classification module, use the “Hybrid with HyDE” method for retrieval, employ monoT5 for
reranking, opt for Reverse for repacking, and leverage Recomp for summarization. This configuration
yielded the highest average score of 0.483, albeit with a computationally-intensive process.
Balanced Efficiency Practice: In order to achieve a balance between performance and efficiency,
query-dependent retrievals [6–8]. A typical RAG wor

In [58]:
print(gemini_inference(question, context, history))

The paper outlines two best practices for implementing RAG (Retrieval-Augmented Generation) systems:

**1. Best Performance Practice (Maximizing Performance):**

* **Query Classification:**  Use a dedicated query classification module to determine whether retrieval is necessary for a given input query.
* **Indexing:**  Employ "Hybrid with HyDE" method for retrieval. (The paper does not explicitly explain what "HyDE" is, but it implies it is a specific method of indexing.)
* **Retrieval:**  Not explicitly mentioned in the text provided. It focuses on reranking, not retrieval itself. 
* **Reranking:**  Utilize monoT5 for reranking retrieved documents. 
* **Repacking:**  Opt for "Reverse" repacking. 
* **Summarization:**  Leverage "Recomp" for summarization.

**2. Balanced Efficiency Practice (Balancing Performance and Efficiency):**

* **Query Classification:** Utilize a query classification module to determine retrieval necessity.
* **Indexing:**  The paper recommends considering "query