# Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) combines information retrieval (IR) with text generation. A RAG system first retrieves relevant information (using IR techniques such as dense embeddings) and then generates a response conditioned on that information, typically with a language model such as GPT.

IR is a component of RAG, but IR alone is not RAG because it lacks the generation step. RAG extends traditional IR: instead of returning the retrieved passages as-is, the model synthesizes them into a single answer to the query.

<img src="../figures/rag-lewis.png" >

# PART I : Retrieve

In [10]:
# !pip install transformers
# !pip install torch

In [2]:
import torch
import transformers

## 1. Load dataset

In [3]:
dataset = []
with open('datasets/cat-facts.txt', 'r') as file:
  dataset = file.readlines()
  print(f'Loaded {len(dataset)} entries')

Loaded 150 entries
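
Note that `readlines()` keeps each line's trailing newline, and a blank line in the file would become an empty chunk. A minimal sketch of a stricter loader follows, assuming the file holds one fact per line; `load_chunks` is a hypothetical helper name, not part of the original notebook, and the demo writes a throwaway file so it runs without the real dataset.

```python
import os
import tempfile

def load_chunks(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    with open(path, 'r', encoding='utf-8') as file:
        return [line.strip() for line in file if line.strip()]

# Demonstrate on a temporary file (hypothetical content, not the cat-facts dataset).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write('Cats sleep a lot.\n\nCats purr.\n')
    demo_path = tmp.name

demo_chunks = load_chunks(demo_path)
os.remove(demo_path)
print(demo_chunks)
```

With the real data this would be called as `load_chunks('datasets/cat-facts.txt')`.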


In [4]:
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")

## 2. Build the Vector Database

In [5]:
# Each element in the VECTOR_DB will be a tuple (chunk, embedding)
# The embedding is a list of floats, for example: [0.1, 0.04, -0.34, 0.21, ...]
VECTOR_DB = []

def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze().tolist()  # Use CLS token embedding

def add_chunk_to_database(chunk):
    embedding = get_embedding(chunk)
    VECTOR_DB.append((chunk, embedding))
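
One optional refinement, sketched here rather than taken from the notebook: if every embedding is scaled to unit length before it is stored, cosine similarity between two stored vectors reduces to a plain dot product, so the norms do not have to be recomputed for every query. The helper names `l2_normalize` and `dot` are hypothetical, and the vectors are toy values rather than real model output.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy vectors standing in for embeddings.
unit_a = l2_normalize([3.0, 4.0])
unit_b = l2_normalize([4.0, 3.0])

# For unit-length vectors the dot product equals the cosine similarity.
similarity = dot(unit_a, unit_b)
print(similarity)
```

In this notebook the normalization would be applied to the list returned by `get_embedding` before appending it to `VECTOR_DB`, and to the query embedding as well.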

In [6]:
for i, chunk in enumerate(dataset):
    add_chunk_to_database(chunk)
    
    if (i + 1) % 30 == 0 or i + 1 == len(dataset):  # Print every 30 chunks and at the last chunk
        print(f'Added chunk {i+1}/{len(dataset)} to the database')

Added chunk 30/150 to the database
Added chunk 60/150 to the database
Added chunk 90/150 to the database
Added chunk 120/150 to the database
Added chunk 150/150 to the database


## 3. Retrieve Function

In [7]:
def cosine_similarity(a, b):
    dot_product = sum([x * y for x, y in zip(a, b)])
    norm_a = sum([x ** 2 for x in a]) ** 0.5
    norm_b = sum([x ** 2 for x in b]) ** 0.5
    return dot_product / (norm_a * norm_b)
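
The function above divides by the product of the two norms, so it raises `ZeroDivisionError` if either vector is all zeros. The sketch below is a guarded variant with a few toy inputs to show the expected behaviour; `safe_cosine_similarity` is a hypothetical name, and returning `0.0` for a zero vector is an assumption about what is convenient for ranking, not a mathematical definition.

```python
def safe_cosine_similarity(a, b):
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x ** 2 for x in a) ** 0.5
    norm_b = sum(x ** 2 for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an undefined similarity as "not similar"
    return dot_product / (norm_a * norm_b)

# Same direction, different magnitude: similarity close to 1.
same_direction = safe_cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity 0.
orthogonal = safe_cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Zero vector: guarded instead of raising ZeroDivisionError.
zero_case = safe_cosine_similarity([0.0, 0.0], [1.0, 1.0])
print(same_direction, orthogonal, zero_case)
```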

In [8]:
def retrieve(query, top_n=5):
    query_embedding = get_embedding(query)
    # temporary list to store (chunk, similarity) pairs
    similarities = []
    for chunk, embedding in VECTOR_DB:
        similarity = cosine_similarity(query_embedding, embedding)
        similarities.append((chunk, similarity))
    # sort by similarity in descending order, because higher similarity means more relevant chunks
    similarities.sort(key=lambda x: x[1], reverse=True)
    # finally, return the top N most relevant chunks
    return similarities[:top_n]
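
Sorting every (chunk, similarity) pair is fine for 150 entries. For a larger database, one possible alternative is `heapq.nlargest`, which selects the top N without sorting the full list. The sketch below uses made-up scores and a hypothetical helper name, `top_n_by_similarity`; it is not how the notebook's `retrieve` is written.

```python
import heapq

def top_n_by_similarity(scored_chunks, top_n=5):
    """Return the top_n (chunk, similarity) pairs, highest similarity first."""
    return heapq.nlargest(top_n, scored_chunks, key=lambda pair: pair[1])

# Toy (chunk, similarity) pairs standing in for real retrieval scores.
scored = [('a', 0.2), ('b', 0.9), ('c', 0.5), ('d', 0.7)]
best = top_n_by_similarity(scored, top_n=2)
print(best)
```

The result has the same shape as the list returned by `retrieve`, so it could replace the `sort` and slice steps there.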

## 4. Testing

In [9]:
input_query = input('Ask me a question: ') #Do cat love fish?
retrieved_knowledge = retrieve(input_query)

print('Retrieved knowledge:')
for chunk, similarity in retrieved_knowledge:
    print(f' - (similarity: {similarity:.2f}) {chunk}')

Ask me a question:  Do cat love fish?


Retrieved knowledge:
 - (similarity: 0.71) Unlike dogs, cats do not have a sweet tooth. Scientists believe this is due to a mutation in a key taste receptor.

 - (similarity: 0.68) Contrary to popular belief, the cat is a social animal. A pet cat will respond and answer to speech, and seems to enjoy human companionship.

 - (similarity: 0.67) A cat lover is called an Ailurophilia (Greek: cat+lover).

 - (similarity: 0.65) Cats hate the water because their fur does not insulate well when it’s wet. The Turkish Van, however, is one cat that likes swimming. Bred in central Asia, its coat has a unique texture that makes it water resistant.

 - (similarity: 0.65) Cats, especially older cats, do get cancer. Many times this disease can be treated successfully.
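
The BGE model card suggests prefixing short queries with a fixed instruction when retrieving longer passages, while passages themselves are embedded without a prefix. The sketch below shows that formatting step only; `format_query` is a hypothetical helper, and whether the prefix changes the ranking for this dataset has not been checked here.

```python
# Instruction string given in the BGE model card for English query-to-passage retrieval.
QUERY_INSTRUCTION = 'Represent this sentence for searching relevant passages: '

def format_query(query):
    """Prepend the retrieval instruction to a user query (passages stay unprefixed)."""
    return QUERY_INSTRUCTION + query.strip()

formatted = format_query(' Do cat love fish? ')
print(formatted)
```

To try it, the query would be embedded as `get_embedding(format_query(input_query))` inside `retrieve`.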



# PART II : Read

In [None]:
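# Build the context first: a backslash inside an f-string expression
# is a SyntaxError before Python 3.12
context = '\n'.join(f' - {chunk.strip()}' for chunk, similarity in retrieved_knowledge)
instruction_prompt = f'''You are a helpful chatbot.
Use only the following pieces of context to answer the question. Don't make up any new information:
{context}
'''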

In [None]:
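# Load the generator model under separate names so the embedding
# tokenizer/model used by get_embedding() are not overwritten
from transformers import AutoTokenizer, AutoModelForCausalLM

gen_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
gen_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")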
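
The notebook stops after loading the generator, so the following is only a sketch of how the read step might continue. It assumes a plain completion-style prompt ending in `Answer:`, since `meta-llama/Llama-3.2-1B` is a base checkpoint rather than an instruction-tuned one. `build_prompt` and `generate_answer` are hypothetical helpers; the latter takes the causal LM and its tokenizer as arguments and is defined but not called here, so the block runs without downloading any model.

```python
def build_prompt(instruction_prompt, question):
    """Append the user question to the instruction-plus-context prompt."""
    return f'{instruction_prompt.rstrip()}\n\nQuestion: {question.strip()}\nAnswer:'

def generate_answer(prompt, tokenizer, model, max_new_tokens=128):
    """Generate a continuation of the prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors='pt')
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Toy context and question, used only to show the prompt layout.
demo_prompt = build_prompt('Use only this context:\n - Cats purr.\n', 'Do cats purr?')
print(demo_prompt)
```

With the objects from this notebook, the call would pass `build_prompt(instruction_prompt, input_query)` together with the causal LM and its tokenizer loaded in the previous cell.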