# Article Series QA Assistant with RAG
## ABB #1 - Session 4

Code authored by: Shaw Talebi

### imports

In [1]:
import json
from sentence_transformers import SentenceTransformer
import torch
from IPython.display import display, Markdown
# helper functions used below (defined in the local functions.py)
from functions import semantic_search, answer_query

from openai import OpenAI
from top_secret import my_sk

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [2]:
# setup api client
client = OpenAI(api_key=my_sk)

### load data & model

In [3]:
# load chunks
filename = 'data/chunk_list.json'
with open(filename, 'r', encoding='utf-8') as f:
    chunk_list = json.load(f)

# load embeddings
chunk_embeddings = torch.load('data/chunk_embeddings.pt', weights_only=False)

In [4]:
print("Num chunks:", len(chunk_list))
print(chunk_embeddings.shape)

Num chunks: 778
(778, 384)
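
The printed sizes agree (778 chunks, 778 embedding rows of width 384). Retrieval presumably relies on row `i` of `chunk_embeddings` describing `chunk_list[i]`, so it can be worth asserting that pairing before searching. Below is a minimal sketch of such a check; `check_alignment` and its `expected_dim` parameter are hypothetical names, not part of the notebook's `functions` module.

```python
def check_alignment(chunks, embeddings, expected_dim=None):
    """Return the number of rows if chunks and embeddings pair up row by row.

    Hypothetical helper: works with any sequence of rows that supports len(),
    e.g. nested lists, a 2D NumPy array, or a 2D torch tensor.
    """
    n_rows = len(embeddings)
    if n_rows != len(chunks):
        raise ValueError(
            f"{len(chunks)} chunks but {n_rows} embedding rows"
        )
    if expected_dim is not None:
        # every embedding row should have the same width as the model output
        for i, row in enumerate(embeddings):
            if len(row) != expected_dim:
                raise ValueError(
                    f"row {i} has {len(row)} dimensions, expected {expected_dim}"
                )
    return n_rows
```

In this notebook the call would look like `check_alignment(chunk_list, chunk_embeddings, expected_dim=384)`.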


In [5]:
# load model
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

### 1) define query

In [6]:
# define query
query = "When does it make sense to use RAG vs fine-tuning?"

### 2) context retrieval

In [7]:
results_markdown = semantic_search(query, model, chunk_embeddings, chunk_list, temp=0.1, k=10, threshold=0.01)

In [8]:
display(Markdown(results_markdown))

1. **Article title:** LLM Fine-tuning — FAQs  
   **Section:** RAG vs Fine-tuning?  
   **Snippet:** We’ve already mentioned situations where RAG and fine-tuning perform well. However, since this is such a common question, it’s worth reemphasizing when each approach works best.  

2. **Article title:** LLM Fine-tuning — FAQs  
   **Section:** RAG vs Fine-tuning?  
   **Snippet:** Here’s high-level guidance on when to use each.  

3. **Article title:** How to Improve LLMs with RAG  
   **Section:** Why we care  
   **Snippet:** Previous articles in this series discussed fine-tuning, which adapts an existing model for a particular use case. While this is an alternative way to endow an LLM with specialized knowledge, empirically, fine-tuning seems to be less effective than RAG at doing this [1].  

4. **Article title:** LLM Fine-tuning — FAQs  
   **Section:** RAG vs Fine-tuning?  
   **Snippet:** RAG is when we inject relevant context into an LLM’s input prompt so that it can generate more helpful responses. For example, if we have a domain-specific knowledge base (e.g., internal company documents and emails), we might identify the items most relevant to the user’s query so that an LLM can synthesize information in an accurate and digestible way.  

5. **Article title:** LLM Fine-tuning — FAQs  
   **Section:** RAG vs Fine-tuning?  
   **Snippet:** Notice that these approaches are not mutually exclusive. In fact, the original RAG system proposed by Facebook researchers used fine-tuning to better use retrieved information for generating responses [4].  

6. **Article title:** How to Improve LLMs with RAG  
   **Section:** Some Nuances  
   **Snippet:** Document preparation—The quality of a RAG system is driven by how well useful information can be extracted from source documents. For example, if a document is unformatted and full of images and tables, it will be more difficult to parse than a well-formatted text file.  

7. **Article title:** How to Improve LLMs with RAG  
   **Section:** Some Nuances  
   **Snippet:** While the steps for building a RAG system are conceptually simple, several nuances can make building one (in the real world) more complicated.  

8. **Article title:** LLM Fine-tuning — FAQs  
   **Section:** When NOT to Fine-tune  
   **Snippet:** The effectiveness of any approach will depend on the details of the use case. For example, fine-tuning is less effective than retrieval augmented generation (RAG) to provide LLMs with specialized knowledge [1].  

9. **Article title:** How to Improve LLMs with RAG  
   **Section:** How it works  
   **Snippet:** There are 2 key elements of a RAG system: a retriever and a knowledge base.  

10. **Article title:** How to Improve LLMs with RAG  
   **Section:** Why we care  
   **Snippet:** Notice that RAG does not fundamentally change how we use an LLM; it's still prompt-in and response-out. RAG simply augments this process (hence the name).  
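
The `semantic_search` function comes from `functions` and its body is not shown here. One plausible reading of its `temp`, `k`, and `threshold` arguments is: score each chunk by cosine similarity to the query, turn the scores into weights with a temperature-scaled softmax, keep the top `k`, and drop anything whose weight falls below `threshold`. The sketch below illustrates that idea with the standard library only. It is an assumption, not the author's implementation, and the names `rank_chunks` and `format_results` and the chunk keys `article_title`, `section`, and `text` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax_with_temperature(scores, temp):
    """Softmax of scores / temp; a smaller temp sharpens the distribution."""
    if not scores:
        return []
    scaled = [s / temp for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_chunks(query_embedding, chunk_embeddings, temp=0.1, k=10, threshold=0.01):
    """Return indices of the top-k chunks whose softmax weight >= threshold."""
    sims = [cosine_similarity(query_embedding, row) for row in chunk_embeddings]
    weights = softmax_with_temperature(sims, temp)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [i for i in ranked[:k] if weights[i] >= threshold]


def format_results(indices, chunks):
    """Render the selected chunks as a numbered markdown list."""
    lines = []
    for rank, i in enumerate(indices, start=1):
        chunk = chunks[i]  # assumed keys: article_title, section, text
        lines.append(f"{rank}. **Article title:** {chunk['article_title']}  ")
        lines.append(f"   **Section:** {chunk['section']}  ")
        lines.append(f"   **Snippet:** {chunk['text']}  ")
        lines.append("")
    return "\n".join(lines)
```

With a real model, the query embedding would come from something like `model.encode(query)`. Under this reading, the threshold is applied after the top-`k` cut, so fewer than `k` results can come back when the weights are concentrated on a few chunks.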



### 3) prompt engineering

In [9]:
def prompt_template(query, results_markdown):
    return f"""You are an AI assistant tasked with answering user questions based on excerpts from blog posts. Use the following snippets to \
provide accurate, concise, and synthesized answers. If the snippets don’t provide enough information, let the user know and suggest further exploration.

## Question:
{query}

## Relevant Snippets:
{results_markdown}

---

## Response:
Provide a clear and concise response below, synthesizing information from the snippets and referencing them directly. If additional information is \
required, suggest further follow-ups or note what’s missing.
"""

In [10]:
prompt = prompt_template(query, results_markdown)
# print(prompt)

### 4) prompt GPT-4o-mini

In [11]:
# make api call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt}
    ],
    temperature=0.5,
)

# extract response
answer = response.choices[0].message.content

### 5) display results

In [12]:
print()
print(query)
print()
display(Markdown(answer))


When does it make sense to use RAG vs fine-tuning?



When deciding between Retrieval-Augmented Generation (RAG) and fine-tuning for enhancing large language models (LLMs), consider the following:

1. **RAG** is ideal when you need to inject relevant context into the model's input to improve response quality. It works well with domain-specific knowledge bases, allowing the model to synthesize information from relevant documents effectively (Snippet 4). This method is particularly useful when fine-tuning is less effective at providing specialized knowledge (Snippet 8).

2. **Fine-tuning** adapts an existing model for a specific use case but is generally considered less effective than RAG for embedding specialized knowledge (Snippet 3). It can be beneficial when you have a well-defined dataset and the goal is to customize the model's behavior more fundamentally.

3. Both approaches can be used together; for instance, the original RAG system utilized fine-tuning to enhance how retrieved information is employed in generating responses (Snippet 5).

In summary, use RAG when you need to leverage external knowledge sources for better context in responses. Opt for fine-tuning when you want to fundamentally adjust the model's capabilities for a specific task, keeping in mind that it may be less effective for specialized knowledge compared to RAG. If you need more detailed guidance on specific use cases, further exploration of the topic may be beneficial.

In [13]:
# bringing it all together
query = "What are the benefits of LLM fine-tuning?"
results_markdown = semantic_search(query, model, chunk_embeddings, chunk_list, temp=0.1, k=10, threshold=0.01)
answer = answer_query(query, results_markdown, prompt_template, client)
display(Markdown(answer))

The benefits of fine-tuning large language models (LLMs) include:

1. **Improved Performance for Specific Tasks**: Fine-tuned models can outperform larger pre-trained models for particular use cases, even when clever prompt engineering is applied (Snippet 6).

2. **Lower Inference Costs**: Fine-tuning can lead to reduced inference costs, making it a practical choice for deploying AI assistants (Snippet 9).

3. **Customization**: Fine-tuning allows for the adaptation of a model to specialized knowledge or tasks, enhancing its relevance and effectiveness (Snippet 2).

4. **Quality of Training Data**: The performance of a fine-tuned model is heavily influenced by the quality of the training dataset used, emphasizing the importance of data preparation (Snippet 7).

However, it is important to note that fine-tuning is not a one-size-fits-all solution. It may not be as effective as other techniques like retrieval augmented generation (RAG) for certain applications (Snippet 1), and it can incur an "alignment tax," where performance may drop in some tasks (Snippet 5). 

For further exploration, consider looking into specific use cases where fine-tuning has shown significant benefits or challenges.
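
The `answer_query` helper called in the last cell is imported from `functions`, so its body does not appear in this notebook. Judging by its arguments, it likely wraps steps 3 and 4 (fill the prompt template, call the chat completions endpoint, return the message text). The sketch below shows one way that could look. It is an assumption about the helper, not its actual code; `answer_query_sketch`, its `model` and `temperature` defaults, and the stub client used for the offline demo are all hypothetical.

```python
from types import SimpleNamespace


def answer_query_sketch(query, results_markdown, prompt_template, client,
                        model="gpt-4o-mini", temperature=0.5):
    """Build the prompt, send it as a single user message, return the reply text."""
    prompt = prompt_template(query, results_markdown)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content


class _StubCompletions:
    """Offline stand-in that mimics the shape of client.chat.completions."""

    def __init__(self):
        self.last_kwargs = None

    def create(self, **kwargs):
        # record the request and return a canned response of the same shape
        self.last_kwargs = kwargs
        message = SimpleNamespace(content="stub answer")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# demo without network access: swap in the stub where an OpenAI client would go
stub_completions = _StubCompletions()
stub_client = SimpleNamespace(chat=SimpleNamespace(completions=stub_completions))
demo_answer = answer_query_sketch(
    "What is RAG?",
    "1. snippet",
    lambda q, r: f"Q: {q}\n{r}",
    stub_client,
)
```

Passing the client and template in as arguments keeps the helper easy to test, since a stub like the one above can replace the real `OpenAI` client.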