# Improving a Fine-tuned Model Using RAG

Code authored by: Shaw Talebi <br>
Article link: https://towardsdatascience.com/how-to-improve-llms-with-rag-abdc132f76ac <br>
Video link: https://youtu.be/Ylz779Op9Pw?si=iOvBETQDrgoK_sO6 <br>
<br>
Colab: https://colab.research.google.com/drive/1peJukr-9E1zCo1iAalbgDPJmNMydvQms?usp=sharing

### Imports

In [1]:
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes

In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

### Define Settings

In [3]:
# import any embedding model on HF hub (https://huggingface.co/spaces/mteb/leaderboard)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large") # alternative model

Settings.llm = None
Settings.chunk_size = 256
Settings.chunk_overlap = 25

LLM is explicitly disabled. Using MockLLM.
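
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of sliding-window chunking. It is a simplification: it splits on whitespace words, whereas LlamaIndex's default splitter, to my understanding, counts tokens and tries to respect sentence boundaries. The function name `chunk_words` is hypothetical and not part of LlamaIndex.

```python
def chunk_words(text, chunk_size=256, chunk_overlap=25):
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# toy input: 600 numbered "words" (illustrative data only)
sample = " ".join(f"w{i}" for i in range(600))
chunks = chunk_words(sample)
print(len(chunks), chunks[1].split()[0])
```

With these settings, each chunk starts 231 words after the previous one, so neighbouring chunks share 25 words. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.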


### Read and Store Docs into Vector DB

In [5]:
# articles available here: {add GitHub repo}
documents = SimpleDirectoryReader("/content").load_data()

In [23]:
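# some ad hoc document refinement
print(len(documents))

# inspect a single document (index 20) to check the extracted text
print(documents[20].text)
# print(documents[20].text[-100:])

# optional filter, disabled for this run: drop pages containing boilerplate phrases.
# It builds a new list because removing items from a list while looping over it skips elements.
# unwanted = ["Member-only story", "The Data Entrepreneurs", " min read"]
# documents = [doc for doc in documents if not any(s in doc.text for s in unwanted)]

print(len(documents))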

114
 THE RE GIONS  17 we activated emergency contingency measures worth $47 million and $150 million, 
respectively, to support disaster response. In Malawi, this support helped purchase 65,000 metric tons of maize and medical supplies to address food insecurity and 
the spread of cholera ( see Spotlight on page 19 ). In Mozambique, this financing is 
helping rehabilitate roads, bridges, schools, health centers, power lines, water sup -
ply, and drainage structures. It is also helping restore rural livelihoods by distributing 
agriculture seeds and tools to affected farmers. 
Increasing resilience to climate shocks
Crippling droughts, devastating floods, and rapidly rising temperatures are severely hitting African economies. We provided $385 million to help countries in the Horn of Africa better adapt to climate change. This project will foster cooperation between 
Ethiopia, Kenya, and Somalia to draw on the region’s largely untapped groundwater 
resources. It will reach more than 3 mi
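
The extracted text above contains words split by a hyphen at a line break (for example `water sup -` followed by `ply`). As an optional refinement step, such breaks could be rejoined before indexing. The sketch below is an assumption about what a cleanup might look like, not part of the original notebook; `join_hyphenated_breaks` is a hypothetical helper. Note that it would also merge genuine compounds that happen to break across lines (such as `long-` / `term`), so it trades one kind of error for another.

```python
import re


def join_hyphenated_breaks(text):
    """Rejoin words that were split by a hyphen followed by a newline."""
    # a word character, an optional space, a hyphen, a newline, then a word character
    return re.sub(r"(\w) ?-\n(\w)", r"\1\2", text)


raw = "power lines, water sup -\nply, and drainage structures."
cleaned = join_hyphenated_breaks(raw)
print(cleaned)
```

If used, it could be applied to each `doc.text` before building the index. Hyphens that appear mid-line (such as `peo-ple`) are left untouched by this pattern.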

In [24]:
# store docs into vector DB
index = VectorStoreIndex.from_documents(documents)

### Set Up Search Function

In [25]:
# set number of docs to retrieve
top_k = 3

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)

In [26]:
# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
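
Conceptually, the retriever ranks chunks by embedding similarity and keeps the `top_k` best, and the postprocessor then discards any whose score falls below `similarity_cutoff`. The sketch below illustrates that idea with toy 2-D vectors and plain Python. It is not LlamaIndex's implementation, and the names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, doc_vecs, top_k=3, cutoff=0.5):
    """Return (index, score) pairs: top_k by similarity, then filtered by cutoff."""
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs)),
        reverse=True,
    )
    return [(i, score) for score, i in scored[:top_k] if score >= cutoff]


# toy embeddings (made up for illustration)
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
hits = retrieve([1.0, 0.0], docs)
print(hits)
```

Here only two of the three top-ranked vectors survive the cutoff, which shows why a query can return fewer than `top_k` results.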

### Retrieve Relevant Docs

In [27]:
# query documents
query = "How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure?"
response = query_engine.query(query)

In [28]:
# reformat response
context = "Context:\n"
# loop over the nodes actually returned: the similarity cutoff can leave fewer than top_k
for node in response.source_nodes:
    context = context + node.text + "\n\n"

print(context)

Context:
THE RE GIONS  37
Promoting private sector engagement and financial inclusion
The World Bank works with countries to help strengthen the private sector and 
expand access to finance. In T unisia, a $120 million project is helping small and 
medium enterprises access long-term lines of credit from the Ministry of Finance 
through participating financial institutions.
In Morocco, we are providing $450 million—the third round in a series of financ -
ing—to help the government implement reforms to improve financial inclusion,  
digital entrepreneurship, and access to digital infrastructure and services for peo-ple and businesses. The series has enabled the country to increase access to finan -
cial services to 44 percent in 2023 from 29 percent in 2017 and digital payments to 30 percent from 17 over the same period. It has also improved the infrastructure 
for digital payments, mobile payment networks, microinsurance, collateral regis -
tries, and women’s access to finance and econ

### Import LLM

In [29]:
# load fine-tuned model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.2-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/8.40M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

### Use LLM

In [37]:
# prompt (no context)
instructions_string = """ZAIMYGPT, functioning as a data scientist and AI consultant. \n
He did this notebook to show its ability of using RAG solution. \n
Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.

Please respond to the following comment.
"""
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment} \n[/INST]'''

In [38]:
comment = "How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure?"

prompt = prompt_template(comment)
print(prompt)

[INST] ZAIMYGPT, functioning as a data scientist and AI consultant. 

He did this notebook to show its ability of using RAG solution. 

Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.

Please respond to the following comment.
 
How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure? 
[/INST]


In [40]:
model.eval()

# tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# pass the attention mask and an explicit pad token id so generation does not warn
outputs = model.generate(**inputs, max_new_tokens=500, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.batch_decode(outputs)[0])

<s> [INST] ZAIMYGPT, functioning as a data scientist and AI consultant. 

He did this notebook to show its ability of using RAG solution. 

Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.

Please respond to the following comment.
 
How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure? 
[/INST] I'm glad you're interested in Morocco's role in improving financial inclusion, digital entrepreneurship, and access to digital infrastructure.

Morocco has been making significant strides in these areas. According to the World Bank, Morocco's financial inclusion rate was 63.3% in 2014, and it has been steadily increasing since then. The country's National Strategy for Digital Transition (SNTD) aims to increase financial inclusion to 75% by 2025.

In terms of digital entrepreneurship, Morocco has been investing in various initiatives to support startups and small bu
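
The decoded output above repeats the whole prompt before the answer, because for decoder-only models `generate` returns the prompt tokens followed by the newly generated ones. One way to show only the answer is to slice off the prompt tokens before decoding. The sketch below demonstrates the slicing on plain lists with made-up token ids; `strip_prompt_tokens` is a hypothetical helper, and the commented lines showing its use with this notebook's objects are untested here.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    return output_ids[prompt_length:]


# toy example: plain lists with arbitrary ids standing in for token id tensors
prompt_ids = [1, 733, 16289]
output_ids = prompt_ids + [315, 28742, 2]
new_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_ids)

# With the objects from this notebook, the equivalent would be roughly:
# answer_ids = strip_prompt_tokens(outputs[0], inputs["input_ids"].shape[1])
# print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```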

In [53]:
# prompt (with context)
prompt_template_w_context = lambda context, comment: f"""[INST]ZAIMYGPT, functioning as a data scientist and AI consultant.
He did this notebook to show its ability of using RAG solution.
Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.


Please respond to the following: \n
{comment} \n
Use the relevant information from the context below \n
{context}

[/INST]
"""
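
The two prompt templates in this notebook differ only in whether retrieved context is appended. As a sketch of one possible consolidation, a single function could cover both cases and fall back to the plain prompt when no context is passed. The function `build_prompt` and its placeholder instructions are hypothetical, and the exact whitespace differs from the templates used in this notebook.

```python
def build_prompt(instructions, comment, context=None):
    """Wrap instructions and a comment in [INST] tags, adding context only if present."""
    parts = [instructions.strip(), "Please respond to the following:", comment.strip()]
    if context and context.strip():
        parts.append("Use the relevant information from the context below")
        parts.append(context.strip())
    return "[INST] " + "\n\n".join(parts) + " [/INST]"


demo = build_prompt(
    instructions="You are a helpful assistant.",  # placeholder instructions
    comment="What is RAG?",
    context="Context:\nRAG stands for retrieval-augmented generation.",
)
print(demo)
```

Calling `build_prompt(instructions, comment)` without a context yields the no-context variant, so the same function can be used to compare answers with and without retrieval.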

In [54]:
prompt = prompt_template_w_context(context, comment)
print(prompt)

[INST]ZAIMYGPT, functioning as a data scientist and AI consultant. 
He did this notebook to show its ability of using RAG solution. 
Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.


Please respond to the following: 

How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure? 

Use the relevant information from the context below 

Context:
THE RE GIONS  37
Promoting private sector engagement and financial inclusion
The World Bank works with countries to help strengthen the private sector and 
expand access to finance. In T unisia, a $120 million project is helping small and 
medium enterprises access long-term lines of credit from the Ministry of Finance 
through participating financial institutions.
In Morocco, we are providing $450 million—the third round in a series of financ -
ing—to help the government implement reforms to improve financial inclusion,  


In [55]:
# tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# pass the attention mask and an explicit pad token id so generation does not warn
outputs = model.generate(**inputs, max_new_tokens=500, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.batch_decode(outputs)[0])

<s> [INST]ZAIMYGPT, functioning as a data scientist and AI consultant. 
He did this notebook to show its ability of using RAG solution. 
Its GPT improved by RAG solution will reacts to feedback aptly and ends responses with its signature '–ZAIMYGPT'.


Please respond to the following: 

How much Morocco provides to improve financial inclusion, digital entrepreneurship, and access to digital infrastructure? 

Use the relevant information from the context below 

Context:
THE RE GIONS  37
Promoting private sector engagement and financial inclusion
The World Bank works with countries to help strengthen the private sector and 
expand access to finance. In T unisia, a $120 million project is helping small and 
medium enterprises access long-term lines of credit from the Ministry of Finance 
through participating financial institutions.
In Morocco, we are providing $450 million—the third round in a series of financ -
ing—to help the government implement reforms to improve financial inclusion