# Retrieval Augmented Generation 

Retrieval Augmented Generation (RAG) is a technique that retrieves relevant contextual information from a data source and passes it to a large language model alongside the user's prompt. This information improves the model's output (generated text or images) by augmenting the model's base knowledge.

RAG is valuable for use cases where the model needs knowledge that is not part of its training data. For example, consider using a GPT model as a chatbot for an e-commerce website, where users interact with the model to get information about products. The model has no knowledge of the company's products and services, so when a user asks about a product it would either be unable to answer or would provide incorrect information (called hallucination). We implement RAG to prevent such scenarios: it augments the model's knowledge so that the model can generate relevant responses.

### RAG Pipeline

![RAG Pipeline](Rag_pipeline.png)

* A corpus of documents that were not used to train the model is stored as vector representations (embeddings) in a vector store for fast retrieval. These documents augment the model's context when it answers user queries about information it has no knowledge of.

* The user query is converted to a vector representation, and a similarity search is performed on the vector store to retrieve the documents related to the query.

* These documents are provided as context to the large language model for response generation.
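
The three steps above can be sketched in miniature without any external libraries. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, cosine similarity stands in for the vector store search, and the documents, function names, and prompt layout are all made up for this example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, top_k=1):
    """Step 2: rank the stored documents by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_context_prompt(query, context_docs):
    """Step 3: place the retrieved documents in the prompt as context."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Step 1: a (hypothetical) corpus the model was never trained on
toy_docs = [
    "the store sells red running shoes",
    "shipping takes three business days",
]

top = retrieve("how long does shipping take", toy_docs)
prompt = build_context_prompt("how long does shipping take", top)
print(prompt)
```

A real pipeline replaces `embed` with a learned embedding model and the sorted scan with a vector index, which is what the rest of this notebook does.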


## Abstract

This notebook demonstrates the effectiveness of Retrieval Augmented Generation. We create a vector store from a small corpus of recent news snippets, use a sentence transformer to convert the news text to embeddings, index the embeddings with FAISS (Facebook AI Similarity Search), and perform a similarity search based on the user query.

In [1]:
from openai import OpenAI
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss



In [2]:
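# Read the OpenAI API key from a local file (keep this file out of version control)
with open('api_key.txt') as ap:
    api_key = ap.read().strip()  # strip surrounding whitespace such as a trailing newline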

client = OpenAI(api_key=api_key)

In [3]:
# Corpus of recent news snippets (15 entries)

corpus = [
    "An apparent assassination attempt was made on former President Donald Trump at his golf course in West Palm Beach, Florida on September 16, 2024.",
    "The suspect, identified as Ryan Wesley Routh, 58, allegedly aimed an AR-style rifle with a scope at Trump from bushes about 400 yards away.",
    "Secret Service agents noticed Routh and fired shots, causing him to flee before being apprehended later.",
    "This incident follows a previous assassination attempt on Trump during a campaign rally in Butler, Pennsylvania two months earlier.",
    "Trump thanked the Secret Service and supporters, calling it an interesting day.",
    "The FBI is investigating the incident as an apparent assassination attempt.",
    "Vice President Kamala Harris, Trump's opponent in the current election, stated that Violence has no place in America.",
    "Recent polls show Harris with a narrow lead over Trump in the presidential race.",
    "Key issues in the campaign include immigration, the economy, abortion rights, and concerns about democracy.",
    "Trump has promised measures like mass deportation of undocumented immigrants and ending birthright citizenship.",
    "Harris is distancing herself from President Biden, promising a new generation of leadership.",
    "The economy remains the most important issue for American voters, despite positive economic indicators.",
    "Apple released watchOS 11 on September 16, 2024, introducing new health and fitness features.",
    "Hyzon announced the start of production for its Class 8 200kW Fuel Cell Electric Truck on September 16, 2024.",
    "Science Translational Medicine published new research on various medical topics in its May 1, 2024 issue.",
]


### Generate Embeddings

In [4]:
# Create embeddings and store them in vector store

# Sentence Transformer model to generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode the corpus into vector embeddings
corpus_embeddings = model.encode(corpus)
corpus_embeddings = np.array(corpus_embeddings)

# FAISS index for similarity search
dimension = corpus_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # exact (brute-force) search using L2 (Euclidean) distance

# Add corpus embeddings to the FAISS index
index.add(corpus_embeddings)
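
For intuition, a flat L2 index performs an exhaustive search: the query is compared against every stored vector and the closest ones are returned. The pure-Python sketch below mimics that behaviour on made-up 2-D vectors. The function names are hypothetical and this is not how FAISS is implemented internally. As far as we know, FAISS reports squared Euclidean distances for this index type, so the sketch does the same; treat that detail as an assumption and check the FAISS documentation.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query_vec, vectors, top_k=2):
    """Return (distances, indices) of the top_k nearest vectors, closest first."""
    scored = sorted((squared_l2(query_vec, v), i) for i, v in enumerate(vectors))
    nearest = scored[:top_k]
    distances = [dist for dist, _ in nearest]
    indices = [idx for _, idx in nearest]
    return distances, indices

# Made-up 2-D "embeddings" standing in for the corpus vectors
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]

distances, indices = brute_force_search([1.0, 1.0], toy_vectors, top_k=2)
print(distances, indices)  # [1.0, 2.0] [1, 0]
```

The cost of this scan grows linearly with the number of stored vectors, which is acceptable for a corpus this small.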


In [5]:
# Retrieve similar documents based on L2 distance

def retrieve_documents(query, top_k=2):
    query_embedding = model.encode([query])
    distances, indices = index.search(np.array(query_embedding), top_k)
    retrieved_docs = [corpus[idx] for idx in indices[0]]
    return retrieved_docs


In [6]:
# Generate response using RAG

def generate_response(query, retrieved_docs):
    # prompt with retrieved documents from similarity search
    prompt = f"Answer the following question using the context provided.\n\nContext:\n{retrieved_docs}\n\nQuestion: {query}\nAnswer:"
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                'role': 'user',
                'content':prompt
            }],
        max_tokens=150,
        temperature=0.7,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    
    return response.choices
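
Note that the f-string above interpolates `retrieved_docs` directly, so the model sees the Python list representation, brackets and quotes included. One possible refinement, sketched here with a hypothetical `build_prompt` helper that is not part of the original notebook, is to join the documents into a plain bulleted context block before building the prompt.

```python
def build_prompt(query, retrieved_docs):
    """Format retrieved documents as a bulleted context block inside the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the following question using the context provided.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Placeholder documents, used only to show the resulting layout
example_prompt = build_prompt("Who leads the polls?", ["Doc one.", "Doc two."])
print(example_prompt)
```

Whether this changes the quality of the answers is not something this notebook measures; the responses shown below were produced with the original prompt.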


### Response with RAG

In [7]:
query = "Tell me about the presidential race"
retrieved_docs = retrieve_documents(query)

response = generate_response(query, retrieved_docs)

In [8]:
print(f"Relevant Documents :\n {retrieved_docs}")

Relevant Documents :
 ['Key issues in the campaign include immigration, the economy, abortion rights, and concerns about democracy.', 'Recent polls show Harris with a narrow lead over Trump in the presidential race.']


### Response without RAG

In [9]:
query = "Tell me about the presidential race"

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": query}
    ],
    max_tokens=150,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)


#### Query 1

In [10]:
print(f"Response without RAG: {completion.choices[0].message.content}")

Response without RAG: As of October 2023, the presidential race in the United States is gearing up for the 2024 election. Key aspects of the race include:

1. **Major Candidates**: Both the Democratic and Republican parties are preparing for primaries. Incumbent President Joe Biden is expected to seek re-election for the Democrats, while several prominent Republicans, including former President Donald Trump, Florida Governor Ron DeSantis, and former U.N. Ambassador Nikki Haley, are vying for the Republican nomination.

2. **Key Issues**: The campaign is likely to focus on several major issues, including the economy, healthcare, immigration, climate change, and social justice. Each candidate's stance on these topics will be critical in shaping their campaigns and appealing


In [11]:
print(f"Response with RAG: {response[0].message.content}")

Response with RAG: The presidential race is currently focused on key issues such as immigration, the economy, abortion rights, and concerns about democracy. Recent polls indicate that Kamala Harris has a narrow lead over Donald Trump in the race.


#### Query 2

In [18]:
query = "Need info about Apple 2024 event"
retrieved_docs = retrieve_documents(query)

response = generate_response(query, retrieved_docs)

In [19]:
print(f"Relevant Documents :\n {retrieved_docs}")

Relevant Documents :
 ['Apple released watchOS 11 on September 16, 2024, introducing new health and fitness features.', 'An apparent assassination attempt was made on former President Donald Trump at his golf course in West Palm Beach, Florida on September 16, 2024.']
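
The second retrieved document is unrelated to the query: with `top_k=2` the search always returns two results, however distant the second one is. One possible mitigation is to drop results whose distance exceeds a cutoff. The sketch below assumes that `retrieve_documents` is changed to also return the distances from `index.search`. The helper name, the distance values, and the threshold are all made up for illustration, and a usable threshold would have to be tuned on real queries.

```python
def filter_by_distance(docs, distances, max_distance):
    """Keep only the documents whose distance to the query is within the cutoff."""
    return [doc for doc, dist in zip(docs, distances) if dist <= max_distance]

# Made-up documents and distances, for illustration only
toy_docs = ["watchOS 11 release", "unrelated news item"]
toy_distances = [0.9, 1.7]

kept = filter_by_distance(toy_docs, toy_distances, max_distance=1.2)
print(kept)  # ['watchOS 11 release']
```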


In [20]:
print(f"Response with RAG: {response[0].message.content}")

Response with RAG: Apple released watchOS 11 on September 16, 2024, which introduced new health and fitness features.


In [21]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": query}
    ],
    max_tokens=150,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)


In [22]:
print(f"Response without RAG: \n {completion.choices[0].message.content}")

Response without RAG: 
 As of my last knowledge update in October 2023, I don't have specific details about any Apple event scheduled for 2024. Apple typically holds several key events throughout the year, including their Worldwide Developers Conference (WWDC) in June and product launch events in September. 

To get the most accurate and updated information regarding any Apple events in 2024, including product announcements and keynotes, I recommend checking Apple's official website or following reputable tech news sources. These platforms will provide the latest announcements and insights regarding upcoming events.
