# Improving a Fine-tuned Model Using RAG

Code authored by: Shaw Talebi <br>
Article link: https://towardsdatascience.com/how-to-improve-llms-with-rag-abdc132f76ac <br>
Video link: https://youtu.be/Ylz779Op9Pw?si=iOvBETQDrgoK_sO6 <br>
<br>
Colab: https://colab.research.google.com/drive/1peJukr-9E1zCo1iAalbgDPJmNMydvQms?usp=sharing

### Imports

In [3]:
!pip3.10 install llama-index-embeddings-huggingface
!pip3.10 install peft
!pip3.10 install auto-gptq
!pip3.10 install llama-index
!pip3.10 install optimum
!pip3.10 install bitsandbytes


In [1]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor




### Define Settings

In [2]:
# import any embedding model on HF hub (https://huggingface.co/spaces/mteb/leaderboard)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large") # alternative model

Settings.llm = None
Settings.chunk_size = 256
Settings.chunk_overlap = 25

LLM is explicitly disabled. Using MockLLM.
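
To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a minimal sketch of sliding-window chunking. It is only an illustration: it splits on whitespace, whereas LlamaIndex's own splitter counts tokens and tries to respect sentence boundaries. `chunk_words` is a hypothetical helper, not a LlamaIndex API.

```python
def chunk_words(text, chunk_size=256, chunk_overlap=25):
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper for illustration; not how LlamaIndex splits text.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Small numbers so the overlap is easy to see
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk starts with the last word of the previous one, so a sentence that straddles a boundary is less likely to be cut off from its neighbors at retrieval time.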


### Read and Store Docs into Vector DB

In [3]:
# articles available here: {add GitHub repo}

import os

# List all files in the current directory
print(os.listdir('./'))


# pdf_path = "
# !pip install PyPDF2  # Install PyPDF2 if not already installed

# from PyPDF2 import PdfReader

# book_content = ""
# # Open and read the PDF
# reader = PdfReader(pdf_path)
# for page in reader.pages:
#     book_content += page.extract_text()

# print(book_content)

documents = SimpleDirectoryReader("data").load_data()

['rag_example.ipynb', 'NLP-DL_Project.ipynb', 'data']


In [4]:
# some ad hoc document refinement: drop chunks that contain web-page boilerplate
print(len(documents))

# build a new list instead of calling remove() while iterating, which skips items
unwanted_markers = ["Member-only story", "The Data Entrepreneurs", " min read"]
documents = [
    doc for doc in documents
    if not any(marker in doc.text for marker in unwanted_markers)
]

print(len(documents))

324
324


In [5]:
# store docs into vector DB
index = VectorStoreIndex.from_documents(documents)

### Set Up Search Function

In [6]:
# set number of docs to retrieve
top_k = 5

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)

In [7]:
# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
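
The retriever and postprocessor above combine two filters: keep the `top_k` most similar chunks, then drop any whose similarity falls below the cutoff. The sketch below illustrates that logic with plain Python and made-up 2-D vectors. `cosine_similarity` and `retrieve` are hypothetical helpers, and the real index uses the embedding model's vectors rather than these toy numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=5, similarity_cutoff=0.5):
    """Return (doc_id, score) pairs: the top_k by score, then filtered by cutoff."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:top_k]
    return [(doc_id, score) for doc_id, score in top if score >= similarity_cutoff]


# Toy 2-D "embeddings" (made-up numbers, not real model output)
docs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
results = retrieve([1.0, 0.0], docs, top_k=2, similarity_cutoff=0.5)
print(results)
```

Because the cutoff is applied after the top-k selection, the number of results can be smaller than `top_k`, so downstream code should not assume exactly `top_k` nodes come back.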

### Retrieve Relevant Docs

In [8]:
# query documents
query = "What were the learnings of the author from When he started BrainQUICKEN LLC"
response = query_engine.query(query)

In [9]:
# reformat response
context = "Context:\n"
# the similarity cutoff can leave fewer than top_k nodes, so iterate over what was returned
for node in response.source_nodes[:top_k]:
    context = context + node.text + "\n\n"

print(context)

Context:
It just takes a dif-
ferent form. 
When I started BrainQUICKEN LLC in 2001, it was with a clear 
goal in mind: Make $1,000 per day whether I was banging my head 
on a laptop or cutting my toenails on the beach. It was to be an auto-
mated source of cash flow. If you look at my chronology, it is obvi-
ous that this didn't happen until a meltdown forced it, despite the 
requisite income. Why? The goal wasn't specific enough. I hadn't 
defined alternate activities that would replace the initial workload. 
Therefore, I just continued working, even though there was no fi-
nancial need. I needed to feel productive and had no other vehicles. 
This is how most people work until death: "I'll just work until I 
have X dollars and then do what I want."

148 STEP III: A IS FOR AUTOMATION 
It is said that if everyone is your customer, then no one is your 
customer. If you start off aiming to sell a product to dog- or car-
lovers, stop. It's expensive to advertise to such a broad market, an

### Import LLM

In [None]:
# load fine-tuned model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer


model_name = "winglian/Llama-2-3b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
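
The `ValueError` above says that `device_map="auto"` placed some weights on disk and needs an `offload_folder`. One possible workaround, sketched below, is to pass that argument to `from_pretrained`. `build_load_kwargs` and the directory name `offload` are hypothetical choices for illustration, and this was not run against the model in this notebook, so treat it as a starting point rather than a verified fix.

```python
import os


def build_load_kwargs(offload_dir="offload"):
    """Keyword arguments for from_pretrained that allow disk offloading.

    Hypothetical helper; the directory name is an arbitrary choice.
    """
    os.makedirs(offload_dir, exist_ok=True)  # create the folder up front so the path is valid
    return {
        "device_map": "auto",
        "trust_remote_code": False,
        "revision": "main",
        "offload_folder": offload_dir,
    }


# Intended use (left commented out because it downloads the model):
# model = AutoModelForCausalLM.from_pretrained(model_name, **build_load_kwargs())
```

Offloading to disk trades speed for memory, so generation may be slow. Installing `safetensors`, as the error message also suggests, is another option to try.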

### Use LLM

In [None]:
# prompt (no context)
instructions_string = """I am a student that is learning from a book called the 4 hour work week

Please respond to the following comment.
"""
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment} \n[/INST]'''

In [None]:
comment = "What were the learnings of the author from When he started BrainQUICKEN LLC"

prompt = prompt_template(comment)
print(prompt)

[INST] I am a student that is learning from a book called the 4 hour work week 

Please respond to the following comment.
 
What were the learnings of the author from When he started BrainQUICKEN LLC 
[/INST]
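
The prompt printed above carries no retrieved context. As a sketch of how both cases could be handled by one function, the hypothetical `build_prompt` below accepts a list of retrieved passages and leaves the context block out when the list is empty, which can happen if the similarity cutoff filters out every node. The exact wording and spacing are assumptions modeled on the prompts in this notebook.

```python
INSTRUCTIONS = "I am a student that is learning from a book called the 4 hour work week"


def build_prompt(comment, chunks=()):
    """Build an [INST] prompt; add a context block only when chunks exist."""
    lines = [f"[INST] {INSTRUCTIONS}", ""]
    if chunks:
        lines += ["Context:", "\n\n".join(chunks), "",
                  "Please respond to the following comment. "
                  "Use the context above if it is helpful."]
    else:
        lines.append("Please respond to the following comment.")
    lines += ["", comment, "[/INST]"]
    return "\n".join(lines)


print(build_prompt("What did the author learn?", ["First passage.", "Second passage."]))
```

With retrieval results in hand, the passages could be supplied as `[node.text for node in response.source_nodes]`.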


In [None]:
model.eval()

# tokenize and move both input_ids and attention_mask to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# unpacking inputs passes the attention mask; pad_token_id is set explicitly for open-ended generation
outputs = model.generate(**inputs, max_new_tokens=280, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.batch_decode(outputs)[0])


<s> [INST] I am a student that is learning from a book called the 4 hour work week 

Please respond to the following comment.
 
What were the learnings of the author from When he started BrainQUICKEN LLC 
[/INST] The author, Tim Ferriss, started BrainQUICKEN LLC as a side business while he was still working his day job. The main learnings from this experience were:

1. Outsourcing: He discovered the power of outsourcing tasks to virtual assistants (VAs) in the Philippines. This allowed him to focus on the high-level tasks that only he could do, while the VAs handled the administrative and customer service tasks.
2. Automation: He automated as many processes as possible, such as setting up an autoresponder for email inquiries and creating a sales funnel for his product.
3. Delegation: He learned to delegate tasks to others, even if it meant paying for it. He realized that his time was worth more than the cost of outsourcing.
4. Focusing on the core business: He learned to focus on the c
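
The decoded output above repeats the whole prompt before the model's answer. A minimal sketch for keeping only the answer follows, assuming the `[/INST]` marker appears in the decoded string exactly as it does in the prompt template. `extract_answer` is a hypothetical helper, not part of `transformers`.

```python
def extract_answer(decoded, marker="[/INST]"):
    """Return the text after the last occurrence of marker.

    If the marker is missing, the whole string is used. The <s> and </s>
    tags and surrounding whitespace are stripped in both cases.
    """
    tail = decoded.rpartition(marker)[2]  # whole string when marker is absent
    return tail.replace("<s>", "").replace("</s>", "").strip()


decoded = "<s> [INST] Some instructions \nA question \n[/INST] The answer.</s>"
print(extract_answer(decoded))
```

An alternative is to slice the generated token ids past the prompt length before decoding, which avoids relying on the marker text.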

In [None]:
# prompt (with context)
prompt_template_w_context = lambda context, comment: f"""[INST]I am a student that is learning from a book called the 4 hour work week

{context}
Please respond to the following comment. Use the context above if it is helpful.

{comment}
[/INST]
"""

In [None]:
prompt = prompt_template_w_context(context, query)

# tokenize and move both input_ids and attention_mask to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=280, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.batch_decode(outputs)[0])


<s> [INST]I am a student that is learning from a book called the 4 hour work week 

Context:
Turns 
out that you can outsource everything from manufacturing to ad 
design. Two weeks and $5,000 of credit card debt later, I have my 
first batch in production and a live website. Good thing, too, as I'm 
fired exactly one week later. 
2002-2003 BrainQUICKEN LLC has taken off, and I'm now 
making more than $4oK per month instead of $4oK per year. The 
only problem is that I hate life and now work 12-hour-plus days 7 
days a week. Kinda painted myself into a corner. I take a one-week 
"vacation" to Florence, Italy, with my family and spend 10 hours a 
day in an Internet cafe freaking out. Sh*t balls. I begin teaching 
Princeton students how to build "successful" (i.e., profitable) 
companies. 
Winter 2004 The impossible happens and I'm approached by an 
infomercial production company and an Israeli conglomerate (huh?) 
interested in buying my baby BrainQUICKEN.

148 STEP III: A IS FOR AUTOMA