# RAG Implementation

By Alberto Valdés

**Mail 1:** anvaldes@uc.cl

**Mail 2:** alberto.valdes.gonzalez.96@gmail.com

This notebook was executed in Google Colab using an A100 GPU.

As we know, **RAG** is very useful for questions with a **temporal dependency**, that is, questions whose correct answer changes over time. For this reason it is also **VERY IMPORTANT** to keep the repository from which we retrieve context up to date (for example, refreshed daily or monthly).
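As a minimal sketch of how such a refresh could be monitored, the helper below lists the files in a folder whose modification time is older than a chosen threshold. The function name `find_stale_files` and the 30-day threshold are assumptions made for illustration, and file modification time is only a rough proxy for how current the content is.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(folder, max_age_days, now=None):
    """Return the names of files in `folder` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


# Demo on a temporary folder: one file is back-dated by 40 days.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old_article.pdf"
    new_file = Path(tmp) / "new_article.pdf"
    old_file.write_text("old")
    new_file.write_text("new")
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(old_file, (forty_days_ago, forty_days_ago))
    stale = find_stale_files(tmp, max_age_days=30)

print(stale)  # ['old_article.pdf']
```

Whenever documents are replaced, the index has to be rebuilt (or updated) so that retrieval reflects the new content.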

### Start of execution

In [1]:
import time

In [2]:
start = time.time()

# 1. Setting the Environment

In [3]:
!pip install -q peft==0.11.1

In [4]:
!pip install -q bitsandbytes==0.43.1

In [5]:
!pip install -q llama-index==0.11.5

In [6]:
!pip install -q llama-index-embeddings-huggingface==0.3.1

In [7]:
!pip install -q auto-gptq==0.7.1

In [8]:
!pip install -q optimum==1.21.4

# 2. Import Libraries

In [9]:
import torch

from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          BitsAndBytesConfig,
                          TrainingArguments,
                          pipeline,
                          logging)

In [10]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# 3. Preparation

In [11]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [12]:
from google.colab import userdata

In [13]:
HUGGING_FACE_TOKEN = userdata.get('HUGGING_FACE_TOKEN')

In [14]:
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = None
Settings.chunk_size = 256
Settings.chunk_overlap = 25

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


LLM is explicitly disabled. Using MockLLM.
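To make the role of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the splitter that llama-index uses (which, as far as we understand, also takes sentence boundaries into account and counts tokens with a real tokenizer); the function `chunk_tokens` and the toy token list are assumptions for illustration only.

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=25):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# Toy "document" of 600 tokens.
tokens = [f"tok{i}" for i in range(600)]
chunks = chunk_tokens(tokens)

print([len(c) for c in chunks])  # [256, 256, 138]
```

With these settings each chunk starts 231 tokens after the previous one, so the last 25 tokens of a chunk are repeated at the start of the next one. The overlap reduces the chance that a relevant passage is cut in half at a chunk boundary.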


In [15]:
path_articles = 'drive/MyDrive/Profesional_Academico/Github_Personal/ML_AI_Contents/09.Deep_Learning/69.RAG_Implementation/articles'

In [16]:
documents = SimpleDirectoryReader(path_articles).load_data()

In [17]:
index = VectorStoreIndex.from_documents(documents)

In [18]:
query = "What is fat-tailedness?"

# 4. Functions

In [19]:
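def predict_response(prompt, pipe):
  """Run the text-generation pipeline and return the concatenated generated text."""

  # Use the pipeline's own tokenizer instead of relying on a global variable.
  result = pipe(prompt, pad_token_id = pipe.tokenizer.eos_token_id)

  # The pipeline returns one dict per generated sequence.
  return "".join(seq['generated_text'] for seq in result)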

In [20]:
def print_text(text, h):
  """Print text wrapped at h words per line."""

  words = text.split()

  # Walk through the word list in steps of h and print each slice on its own line.
  for i in range(0, len(words), h):
    print(' '.join(words[i:i + h]))

# 5. Retriever and Engine

In [21]:
top_k = 1

retriever = VectorIndexRetriever(
    index = index,
    similarity_top_k = top_k,
)

In [22]:
query_engine = RetrieverQueryEngine(
    retriever = retriever,
    node_postprocessors = [SimilarityPostprocessor(similarity_cutoff=0.5)],
)
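Conceptually, the retriever ranks the stored chunks by similarity to the query embedding and keeps the best `similarity_top_k`, and the postprocessor then discards any result whose score falls below `similarity_cutoff`. The sketch below reproduces that logic with plain Python lists and cosine similarity; the function names and the toy two-dimensional vectors are assumptions for illustration, not the llama-index implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunk_vecs, top_k=1, similarity_cutoff=0.5):
    """Return (chunk_index, score) pairs: top_k by similarity, then filtered by the cutoff."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    top = scored[:top_k]
    return [(idx, score) for score, idx in top if score >= similarity_cutoff]


# Toy embeddings for three chunks and one query.
toy_chunks = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
hits = retrieve([0.8, 0.6], toy_chunks, top_k=2)
no_hits = retrieve([1.0, 0.0], [[0.0, 1.0]], top_k=1)

print([idx for idx, _ in hits])  # [1, 0]
print(no_hits)                   # []
```

Because the cutoff is applied after the top-k selection, the engine can return fewer than `top_k` nodes, or none at all when nothing is similar enough.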

# 6. Extract context

In [23]:
response = query_engine.query(query)

In [24]:
context = "Context:\n"

# The similarity cutoff can drop nodes, so iterate over what was actually returned.
for node in response.source_nodes:
    context = context + node.text + "\n\n"

print(context)

Context:
Some of the controversy might be explained by the observation that log-
normal distributions behave like Gaussian for low sigma and like Power Law
at high sigma [2].
However, to avoid controversy, we can depart (for now) from whether some
given data fits a Power Law or not and focus instead on fat tails.
Fat-tailedness — measuring the space between Mediocristan
and Extremistan
Fat Tails are a more general idea than Pareto and Power Law distributions.
One way we can think about it is that “fat-tailedness” is the degree to which
rare events drive the aggregate statistics of a distribution. From this point of
view, fat-tailedness lives on a spectrum from not fat-tailed (i.e. a Gaussian) to
very fat-tailed (i.e. Pareto 80 – 20).




# 7. Load LLM

In [25]:
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

In [26]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map = "auto",
    token = HUGGING_FACE_TOKEN
    )


In [27]:
model.config.use_cache = False
model.config.pretraining_tp = 1

In [28]:
tokenizer = AutoTokenizer.from_pretrained(model_id,
                                          trust_remote_code = True,
                                          padding_side = 'left',
                                          add_bos_token = True,
                                          add_eos_token = True,
                                          token = HUGGING_FACE_TOKEN
                                          )


In [29]:
tokenizer.pad_token = tokenizer.eos_token

In [30]:
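# Greedy decoding: do_sample = False makes the output deterministic (temperature only applies when sampling).
pipe = pipeline(task = "text-generation", model = model, tokenizer = tokenizer, max_new_tokens = 300, do_sample = False)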

# 8. Normal Prompt

In [31]:
normal_prompt = f"""
  [INST] Think you are functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
  It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
  ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
  thus keeping the interaction natural and engaging. \n

  Please respond to the following question. \n

  {query} \n [/INST]
  """

In [32]:
print(normal_prompt)


  [INST] Think you are functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request.   It reacts to feedback aptly and ends responses with its signature '–ShawGPT'.   ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback,   thus keeping the interaction natural and engaging. 

  
  Please respond to the following question. 


  What is fat-tailedness? 
 [/INST]
  


In [33]:
normal_response = predict_response(normal_prompt, pipe)



In [34]:
normal_response = normal_response.split('[/INST]\n')[1]

In [35]:
print_text(normal_response, 20)

Hello there, I'm ShawGPT, your friendly virtual data science consultant. I'm here to help answer your questions in a clear
and accessible way. Let's talk about fat-tailedness. Fat-tailedness is a statistical property of certain distributions, meaning the tails of the
distribution are fatter or wider than those of a normal distribution. In simpler terms, it means that the occurrence of
extreme values is more likely than in a normal distribution. For example, consider the distribution of heights in a population.
A normal distribution would imply that most people are of average height, with fewer people being very tall or very
short. However, in real life, there are often more extremely tall or short people than a normal distribution would suggest.
This is an example of a fat-tailed distribution. Fat-tailedness is important in fields like finance, where extreme events like market
crashes or financial crises can have significant consequences. Models that assume normal distributions can und

# 9. RAG Prompt

In [36]:
rag_prompt = f"""
  [INST] Think you are functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
  It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
  ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
  thus keeping the interaction natural and engaging. \n

  {context} \n

  Please respond to the following question. Use the context above if it is helpful. \n

  {query} \n [/INST]
  """

In [37]:
print(rag_prompt)


  [INST] Think you are functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request.   It reacts to feedback aptly and ends responses with its signature '–ShawGPT'.   ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback,   thus keeping the interaction natural and engaging. 

  
  Context:
Some of the controversy might be explained by the observation that log-
normal distributions behave like Gaussian for low sigma and like Power Law
at high sigma [2].
However, to avoid controversy, we can depart (for now) from whether some
given data fits a Power Law or not and focus instead on fat tails.
Fat-tailedness — measuring the space between Mediocristan
and Extremistan
Fat Tails are a more general idea than Pareto and Power Law distributions.
One way we can think about it is that “fat-tailedness” is the
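The normal prompt and the RAG prompt differ only in the retrieved context and in one instruction sentence, so both could be produced by a single helper. The sketch below is one possible way to do that; `build_prompt` and the shortened persona text are assumptions for illustration, and the whitespace it produces is not identical to the prompts used in this notebook.

```python
def build_prompt(query, persona, context=None):
    """Assemble a Mistral-style [INST] prompt, with or without retrieved context."""
    parts = [f"[INST] {persona}"]
    if context:
        parts.append(context.strip())
        parts.append("Please respond to the following question. "
                     "Use the context above if it is helpful.")
    else:
        parts.append("Please respond to the following question.")
    # Keep a newline after the closing tag so the answer starts on its own line.
    parts.append(f"{query} [/INST]\n")
    return "\n\n".join(parts)


persona = "You are ShawGPT, a virtual data science consultant."  # shortened stand-in
normal = build_prompt("What is fat-tailedness?", persona)
rag = build_prompt("What is fat-tailedness?", persona,
                   context="Context:\nFat tails are a more general idea ...")

print(rag)
```

Keeping a single template makes the comparison between the two responses fairer, since the only difference between the prompts is the context itself.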

In [38]:
rag_response = predict_response(rag_prompt, pipe)

In [39]:
rag_response = rag_response.split('[/INST]\n')[1]
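Splitting on `'[/INST]\n'` and taking element `[1]` works here, but it raises an `IndexError` if the marker is missing and returns only part of the answer if the model repeats the marker. A slightly more defensive variant, sketched below with a hypothetical helper `extract_answer`, keeps everything after the last occurrence of the marker and falls back to the full text when it is absent.

```python
def extract_answer(generated_text, marker="[/INST]"):
    """Return the text after the last `marker`, or the whole text if the marker is absent."""
    _, found, answer = generated_text.rpartition(marker)
    if not found:
        return generated_text.strip()
    return answer.strip()


full_text = "[INST] What is fat-tailedness? [/INST]\n  Fat-tailedness is ... -ShawGPT"
answer = extract_answer(full_text)
fallback = extract_answer("No marker in this text.")

print(answer)  # Fat-tailedness is ... -ShawGPT
```

Another option is to pass `return_full_text = False` when calling the text-generation pipeline, so that only the newly generated text is returned and no splitting is needed.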

In [40]:
print_text(rag_response, 20)

Fat-tailedness is a property of probability distributions that describes the extent to which rare events significantly influence the overall statistics
of the distribution. It's a more general concept than specific distributions like Pareto or Power Law. A Gaussian (normal) distribution
has no fat tails, meaning that extreme events do not significantly impact the aggregate statistics. In contrast, distributions with fat
tails, like Pareto or Power Law distributions, have a higher probability of extreme events, which can have a substantial impact
on the overall statistics. –ShawGPT.


### End of execution

In [41]:
end = time.time()

delta = (end - start)

hours = int(delta/3_600)
mins = int((delta - hours*3_600)/60)
secs = int(delta - hours*3_600 - mins*60)

print(f'Hours: {hours}, Minutes: {mins}, Seconds: {secs}')

Hours: 0, Minutes: 14, Seconds: 40
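The same breakdown into hours, minutes, and seconds can be written more compactly with `divmod`. The helper name `format_elapsed` is an assumption for illustration; 880 seconds corresponds to the run time reported above.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as hours, minutes, and whole seconds."""
    hours, remainder = divmod(int(seconds), 3_600)
    mins, secs = divmod(remainder, 60)
    return f'Hours: {hours}, Minutes: {mins}, Seconds: {secs}'


elapsed_text = format_elapsed(880)
print(elapsed_text)  # Hours: 0, Minutes: 14, Seconds: 40
```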
