## What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) pairs a retrieval model with a generative model so that generated text is grounded in relevant source material. A retriever fetches the most relevant documents from a large corpus, and a generative model then produces an answer conditioned on those documents.
<br><br>
The typical workflow for RAG involves the following steps: <br>
<ul>
<li>Query Processing: A query is given to the system.</li>
<li>Retrieval: The retriever searches a large corpus to find the most relevant documents related to the query.</li>
<li>Generation: The generator uses the retrieved documents to generate a coherent and relevant response.</li>
</ul>
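
The three steps above can be sketched without any model at all. The snippet below is a minimal illustration, not the implementation used later in this notebook: `tokenize`, `retrieve`, and `generate` are hypothetical helpers, retrieval is plain word overlap, and the "generator" only formats the prompt a real generative model would receive.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!:;").lower() for word in text.split()} - {""}


def retrieve(query, corpus, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def generate(query, documents):
    """Stand-in for a generative model: builds the prompt it would see."""
    context = " ".join(documents)
    return f"Question: {query}\nContext: {context}"


corpus = [
    "RAG stands for Retrieval-Augmented Generation.",
    "BM25 is a ranking function used by search engines.",
]
query = "What does RAG stand for?"          # 1. query processing
documents = retrieve(query, corpus)         # 2. retrieval
prompt = generate(query, documents)         # 3. generation (placeholder)
print(prompt)
```
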

## Implementing RAG

Implementing a RAG model involves two main components:

<ul>
<li>Retriever: This component fetches relevant documents from a large dataset. Common retrieval models include BM25, TF-IDF, and dense vector retrievers like those based on BERT embeddings.</li>
<li>Generator: This component generates the final output using the retrieved documents. Generative models like GPT-3, T5, or BART can be used.</li>
</ul>
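
To make the sparse retrievers mentioned above more concrete, here is a rough TF-IDF scorer in pure Python. It is a simplified sketch: `tokenize` and `tfidf_scores` are hypothetical names, and the `+ 1.0` smoothing on the IDF term is one common choice among several, not a canonical formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_scores(query, documents):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in term_freq:
                # Smoothed IDF so terms present in every document still count
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += (term_freq[term] / len(tokens)) * idf
        scores.append(score)
    return scores


documents = [
    "RAG stands for Retrieval-Augmented Generation.",
    "RAG works by combining retrieval-based and generative models.",
    "RAG has two main components: a retriever and a generator.",
]
scores = tfidf_scores("What are the components of RAG?", documents)
best_index = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best_index])
```

Rare query terms such as "components" dominate the score, which is why the third document ranks first here even though all three mention "RAG".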

In [92]:
import torch
import pandas as pd

## Prepare the Data

In [93]:
# Load your DataFrame
data = {'Questions': ["What is RAG?", "How does RAG work?", "What are the components of RAG?"],
        'Answers': ["RAG stands for Retrieval-Augmented Generation.",
                    "RAG works by combining retrieval-based and generative models.",
                    "RAG has two main components: a retriever and a generator."]}
df = pd.DataFrame(data)
df.head()

|   | Questions | Answers |
|---|---|---|
| 0 | What is RAG? | RAG stands for Retrieval-Augmented Generation. |
| 1 | How does RAG work? | RAG works by combining retrieval-based and gen... |
| 2 | What are the components of RAG? | RAG has two main components: a retriever and a... |


## Load the FAQ Chatbot Question-Answer Data

In [2]:
df = pd.read_csv('Question_Variations_Embeddings_Updated.csv')
df = df[['Question', 'Answer']]
df.head()

## Encode Questions for Retrieval

Use a Sentence Transformers model to encode the questions into dense vectors.

In [95]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
retriever = SentenceTransformer('all-MiniLM-L6-v2')

# Run the following once to encode the questions and save the embeddings,
# then leave it commented out and load the saved file instead:
# question_embeddings = retriever.encode(df['Question'].tolist(), convert_to_tensor=True)
# torch.save(question_embeddings, 'question_embeddings.pt')



## Load the Saved Corpus Embeddings

In [96]:
question_embeddings = torch.load('question_embeddings.pt')
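
Loading from disk fails on a fresh checkout where the embeddings file has not been created yet. One way around that is a compute-once, load-later helper. The sketch below is an assumption about how this could be structured, not part of the original notebook: `load_or_compute` is a hypothetical helper, and it uses `pickle` with a plain list of vectors in place of `torch.save` and real embeddings so that it runs without any model.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the cached value at `path`, computing and saving it if missing."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def expensive_encode():
    """Placeholder for an expensive encoding step such as retriever.encode(...)."""
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "embeddings.pkl")
    first = load_or_compute(cache_path, expensive_encode)   # computes and saves
    second = load_or_compute(cache_path, expensive_encode)  # loads from disk

print(len(calls))
```
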

## Set Up the Generator

For generating answers, you can use a model like T5 or GPT-3. Here we will use T5.

In [97]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load T5 model and tokenizer
model_name = 't5-base'
tokenizer = T5Tokenizer.from_pretrained(model_name)
generator = T5ForConditionalGeneration.from_pretrained(model_name)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Define the Retrieval and Generation Function

Combine the retriever and generator into a pipeline to handle user queries.

In [139]:
from sentence_transformers import util

def retrieve_answer(query, retriever, question_embeddings, df, top_k=1):
    # Encode the query
    query_embedding = retriever.encode(query, convert_to_tensor=True)

    # Retrieve top_k most similar questions
    hits = util.semantic_search(query_embedding, question_embeddings, top_k=top_k)
    print("hits", hits)
    hits = hits[0]  # Get the list of hits for the first query

    # Get the corresponding answer(s) from the DataFrame
    retrieved_answers = [df.iloc[hit['corpus_id']]['Answer'] for hit in hits]
    retrieved_questions = [df.iloc[hit['corpus_id']]['Question'] for hit in hits]

    return retrieved_questions, retrieved_answers
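
For intuition about what the retrieval step computes, the following sketch ranks corpus vectors by cosine similarity (the default scoring in `util.semantic_search`) and returns hits in the same `corpus_id`/`score` shape. It is a pure-Python illustration on tiny hand-made vectors: `cosine_similarity` and `top_k_hits` are hypothetical helpers, not library functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_hits(query_vec, corpus_vecs, top_k=1):
    """Return the top_k corpus entries, best first, as corpus_id/score dicts."""
    scored = [
        {"corpus_id": i, "score": cosine_similarity(query_vec, vec)}
        for i, vec in enumerate(corpus_vecs)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions
corpus_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
hits = top_k_hits([0.0, 2.0], corpus_vecs, top_k=2)
print([hit["corpus_id"] for hit in hits])
```

Because cosine similarity ignores vector length, the query `[0.0, 2.0]` matches `[0.0, 1.0]` best even though their magnitudes differ.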

## Use T5 to Generate the Answer

In [140]:
def generate_answer_using_t5(query, retrieved_answers, tokenizer, generator):
    # If generative step is needed, combine the retrieved answers
    input_text = query + " " + " ".join(retrieved_answers)
    inputs = tokenizer(input_text, return_tensors='pt', max_length=512, truncation=True)
    # Generate the answer
    output_sequences = generator.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_length=150,
        num_return_sequences=1
    )
    # Decode the generated text
    generated_answer = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
    return generated_answer
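
The function above concatenates the query and the retrieved answers with no task prefix. T5 was trained on prefixed inputs, and its question-answering examples use a `question: ... context: ...` layout, so formatting the input that way may give the model a clearer signal. The helper below is a hypothetical sketch (`build_t5_input` is not part of the notebook), and whether it improves output for this dataset is untested.

```python
def build_t5_input(query, retrieved_answers):
    """Format a query and retrieved passages in a 'question: ... context: ...' layout."""
    context = " ".join(answer.strip() for answer in retrieved_answers)
    return f"question: {query.strip()} context: {context}"


input_text = build_t5_input(
    "What is RAG?",
    ["RAG stands for Retrieval-Augmented Generation."],
)
print(input_text)
```

The resulting string could be passed to the tokenizer in place of `input_text` in the function above.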

## Use Perplexity.ai to Generate the Answer

In [141]:
from openai import OpenAI
def generate_answer_using_prplx(retrieved_answers):
    YOUR_API_KEY = ""
    messages = [
        {
            "role": "system",
            "content": (
                "You are an artificial intelligence assistant and you need to "
                "engage in a helpful, detailed, polite conversation with a user."
            ),
        },
        {
            "role": "user",
            "content": (
                "Rephrase the following text: " + " ".join(retrieved_answers)
            ),
        },
    ]
    client = OpenAI(api_key=YOUR_API_KEY, base_url="https://api.perplexity.ai")
    response = client.chat.completions.create(
        model="llama-3-sonar-large-32k-online",
        messages=messages,
    )
    return response.choices[0].message.content

## Use Llama 3 8B to Generate the Answer

In [142]:
import requests
import json

def llama3(retrieved_answers):
    url = "http://localhost:11434/api/chat"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": "llama3",
        "messages": [
            {
                "role": "user",
                "content": f"Rephrase the following text: {' '.join(retrieved_answers)}"
            }
        ],
        "stream": False
    }
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    if response.status_code == 200:
        return response.json().get("message", {}).get("content", "")
    else:
        return f"Error: {response.status_code}, {response.text}"
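
Both chat-based generators above only rephrase the retrieved answers and never see the user's query. A prompt that carries both the query and the retrieved context lets the model answer the question itself. The sketch below shows one possible message layout: `build_rag_messages` and the system prompt wording are assumptions for illustration, not something either API requires.

```python
def build_rag_messages(query, retrieved_answers):
    """Build chat messages that pass both the query and the retrieved context."""
    context = "\n".join(f"- {answer}" for answer in retrieved_answers)
    return [
        {
            "role": "system",
            "content": (
                "Answer the user's question using only the provided context. "
                "If the context does not contain the answer, say so."
            ),
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        },
    ]


messages = build_rag_messages(
    "what is the weather outside?",
    ["Hi! How can I assist you today?"],
)
print(messages[1]["content"])
```

The returned list has the same role/content structure as the `messages` values used in the two functions above, so it could be dropped into either request.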

## Example Query

In [143]:
query = "what is the weather outside?"
# Retrieve answer
retrieved_questions, retrieved_answers = retrieve_answer(query, retriever, question_embeddings, df)

hits [[{'corpus_id': 6719, 'score': 0.2754690647125244}]]


In [145]:
# Generate the answer (uncomment exactly one line to choose the generator)
# answer = generate_answer_using_t5(query, retrieved_answers, tokenizer, generator)
# answer = generate_answer_using_prplx(retrieved_answers)
answer = llama3(retrieved_answers)


print(f"Query: {query} \n")
print(f"Retrieved Questions: {retrieved_questions} \n")
print(f"Retrieved Answers: {retrieved_answers} \n")

print("********************************************************************************************************************")

print(f"Generated Answer: {answer}")

Query: what is the weather outside? 

Retrieved Questions: ['What is up, bro?'] 

Retrieved Answers: ['Hi! How can I assist you today?'] 

********************************************************************************************************************
Generated Answer: Here's a rephrased version:

"Welcome! What brings you here today? I'm happy to help with whatever you need."

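
In the run above, the best hit scored only about 0.28, and the retrieved question is unrelated to the weather query, so the generator rephrased an irrelevant answer. A simple guard is to discard hits below a similarity threshold and return a fallback message instead. The sketch below assumes a cutoff of 0.5, which is an arbitrary illustrative value that would need tuning on real data; `answer_or_fallback`, `FALLBACK`, and the `answers_by_id` lookup are hypothetical.

```python
FALLBACK = "Sorry, I don't have an answer for that."


def answer_or_fallback(hits, answers_by_id, min_score=0.5, fallback=FALLBACK):
    """Return answers for hits scoring at least min_score, else a fallback."""
    confident = [hit for hit in hits if hit["score"] >= min_score]
    if not confident:
        return [fallback]
    return [answers_by_id[hit["corpus_id"]] for hit in confident]


# The hit from the example query above, with its answer keyed by corpus_id
hits = [{"corpus_id": 6719, "score": 0.2754690647125244}]
answers_by_id = {6719: "Hi! How can I assist you today?"}

print(answer_or_fallback(hits, answers_by_id))
```
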