# RAG vs Finetuning: Which Is the Best Tool to Boost Your LLM Application?

## RAG 
- This approach integrates the power of retrieval (or searching) into LLM text generation. 
- It combines a retriever system, which fetches relevant document snippets from a large corpus, and an LLM, which produces answers using the information from those snippets. 
- In essence, RAG helps the model to "look up" external information to improve its responses. 

![1_Jq9bEbitg1Pv4oASwEQwJg.png](attachment:1_Jq9bEbitg1Pv4oASwEQwJg.png)
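
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the word-overlap scoring stands in for a real embedding-based retriever, `CORPUS` is made-up data, and `build_prompt` only prepares the text that would be sent to an LLM (the generation call itself is omitted).

```python
import re

# Hypothetical mini-corpus; a real system would index many documents.
CORPUS = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available by email from Monday to Friday.",
    "Premium plans include priority support and extra storage.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k snippets sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, snippets: list[str]) -> str:
    """Combine retrieved snippets and the question into one LLM prompt."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


question = "How many days is the refund window?"
prompt = build_prompt(question, retrieve(question, CORPUS))
print(prompt)
```

The retriever and the prompt builder are kept as separate functions on purpose: in a RAG system either piece can be swapped or tuned without touching the other.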

## Finetuning
- This is the process of taking a pre-trained LLM and further training it on a smaller, specific dataset to adapt it for a particular task or to improve its performance. 
- By finetuning, we are adjusting the model's weights based on our data, making it more tailored to our application's unique needs. 

![1_JSJBBnslBE9S5i77Rz9r_g.png](attachment:1_JSJBBnslBE9S5i77Rz9r_g.png)
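
To make "adjusting the model's weights" concrete, here is a deliberately tiny analogy. A two-parameter linear model stands in for the LLM, its starting weights stand in for pre-training, and a few hundred gradient-descent steps on a small dataset stand in for finetuning. All numbers are invented for illustration; real finetuning uses a deep-learning framework and far larger models.

```python
# "Pre-trained" weights for y = w * x + b (hypothetical starting point).
w, b = 2.0, 0.0

# Small task-specific dataset, generated from y = 3x + 1.
data = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0), (3.0, 10.0)]


def mse(weight: float, bias: float) -> float:
    """Mean squared error of the model on the task dataset."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


loss_before = mse(w, b)

learning_rate = 0.05
for _ in range(500):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Nudge the existing weights toward the new data.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss_after = mse(w, b)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.6f}")
print(f"finetuned weights: w={w:.2f}, b={b:.2f}")
```

The key point is that training starts from existing weights rather than from scratch, and the new data pulls those weights toward the target task.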


## Why should you care? 
- Choosing the right technique for adapting large language models can have a major impact on the success of your NLP applications. 
- Selecting the wrong approach can lead to:
    - Poor model performance on your specific task, resulting in inaccurate outputs. 
    - Increased compute costs for model training and inference if the technique is not optimized for your use case. 
    - Additional development and iteration time if you need to pivot to a different technique later on. 
    - Delays in deploying your application and getting it in front of users. 
    - A lack of model interpretability if you choose an overly complex adaptation approach. 
    - Difficulty deploying the model to production due to size or computational constraints. 
- The nuances between RAG and finetuning span model architecture, data requirements, computational complexity, and more. 
- Overlooking these details can derail your project timeline and budget. 

## Does our use case require access to external data sources? 
- When choosing between finetuning an LLM or using RAG, one key consideration is whether the application requires access to external data sources. If the answer is yes, RAG is likely the better option. 
- RAG systems are, by definition, designed to augment an LLM's capabilities by retrieving relevant information from knowledge sources before generating a response.  
 - This makes this technique well-suited for applications that need to query databases, documents or other structured/unstructured data repositories. The retriever and generator components can be optimised to leverage these external sources. 

- In contrast, while it is possible to finetune an LLM to learn some external knowledge, doing so requires a large labelled dataset of question-answer pairs from the target domain. 
 - This dataset must be updated as the underlying data changes, making it impractical for frequently changing data sources. 
 - The finetuning process also does not explicitly model the retrieval and reasoning steps involved in querying external knowledge. 

- So in summary, if our application needs to leverage external data sources, using a RAG system will likely be more effective and scalable than attempting to "bake in" the required knowledge through finetuning alone.  


## Do we need to modify the model's behavior, writing style, or domain-specific knowledge? 
- Finetuning excels in its ability to adapt an LLM's behaviour to specific nuances, tones, or terminologies. 
- RAG, while powerful in incorporating external knowledge, primarily focuses on information retrieval and doesn't inherently adapt its linguistic style or domain-specificity based on the retrieved information. 

![1_KFwNQ1Xh6ZvNC5_di4wFcQ.png](attachment:1_KFwNQ1Xh6ZvNC5_di4wFcQ.png)

## How crucial is it to suppress hallucinations? 
- A well-known downside of LLMs is their tendency to hallucinate, that is, to make up facts or details that have no basis in reality. This can be highly problematic in applications where accuracy and truthfulness are critical. 
 - Finetuning can help reduce hallucinations to some extent by grounding the model in a specific domain's training data. 
 - However, the model may still fabricate responses when faced with unfamiliar inputs. 
 - Continuous retraining on new data is required to keep fabrications to a minimum. 

- In contrast, RAG systems are inherently less prone to hallucination because they ground each response in retrieved evidence.  
 - The retriever identifies relevant facts from the external knowledge source before the generator constructs the answer. 
 - This retrieval step acts as a fact-checking mechanism, reducing the model's ability to confabulate. 
 - The generator is constrained to synthesise a response supported by the retrieved context. 

- So in applications where suppressing falsehoods and imaginative fabrications is vital, RAG systems provide in-built mechanisms to minimise hallucinations. 
- The retrieval of supporting evidence prior to response generation gives RAG an advantage in ensuring factually accurate and truthful outputs.
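
One way to picture the grounding step is a guard that refuses to answer when retrieval finds no sufficiently relevant evidence. The sketch below assumes a toy word-overlap score and a hand-picked threshold (`min_overlap`); both are placeholders for whatever relevance measure a real retriever provides, and the knowledge base is invented.

```python
import re
from typing import Optional

# Hypothetical knowledge base.
KNOWLEDGE_BASE = [
    "The warranty covers manufacturing defects for two years.",
    "Orders ship within three business days.",
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_evidence(query: str, docs: list[str]) -> tuple[str, int]:
    """Return the best-matching document and its word-overlap score."""
    q = tokens(query)
    scored = [(doc, len(q & tokens(doc))) for doc in docs]
    return max(scored, key=lambda pair: pair[1])


def grounded_prompt(
    query: str, docs: list[str], min_overlap: int = 2
) -> Optional[str]:
    """Build a prompt only if supporting evidence was found.

    Returning None signals the caller to reply "I don't know"
    instead of letting the model improvise an answer.
    """
    doc, score = best_evidence(query, docs)
    if score < min_overlap:
        return None
    return f"Using only this evidence: '{doc}'\nAnswer the question: {query}"


in_scope = grounded_prompt("How many years does the warranty last?", KNOWLEDGE_BASE)
out_of_scope = grounded_prompt("Who won the 1998 World Cup?", KNOWLEDGE_BASE)
print(in_scope)
print(out_of_scope)
```

The first question overlaps with the warranty document and yields a prompt; the second finds no supporting evidence and yields `None`, so the application can decline rather than fabricate.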

## How much labelled training data is available? 
- Finetuning an LLM to adapt to specific tasks or domains is heavily dependent on the quality and quantity of the labelled data available. 
- A rich dataset can help the model deeply understand the nuances, intricacies, and unique patterns of a particular domain, allowing it to generate more accurate and contextually relevant responses. 
- However, if we are working with a limited dataset, the improvements from finetuning might be marginal. 
- In some cases, a scant dataset might even lead to overfitting, where the model performs well on the training data but struggles with unseen or real-world inputs. 

- In contrast, RAG systems do not depend on labelled training data in the same way, because they leverage external knowledge sources to retrieve relevant information. 
- Even if we don't have an extensive labelled dataset, a RAG system can still perform competently by accessing and incorporating insights from its external data sources. 

- In essence, if we have a wealth of labelled data that captures the domain's intricacies, finetuning can offer a more tailored and refined model behaviour. 
- But in scenarios where such data is limited, a RAG system provides a robust alternative, ensuring the application remains data-informed and contextually aware through its retrieval capabilities. 
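
The overfitting risk mentioned above is usually checked by holding out part of the labelled data and comparing training loss with validation loss. The helper names, the 20% split, and the tolerance below are illustrative assumptions rather than recommended values.

```python
import random


def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split off a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def looks_overfit(train_loss, validation_loss, tolerance=0.2):
    """Flag a model whose validation loss is much worse than its training loss."""
    return validation_loss > train_loss * (1 + tolerance)


examples = list(range(10))  # stand-in for labelled examples
train, validation = train_validation_split(examples)
print(len(train), len(validation))

# Hypothetical loss values, as they might be reported after a finetuning run.
print(looks_overfit(train_loss=0.10, validation_loss=0.50))
print(looks_overfit(train_loss=0.40, validation_loss=0.42))
```

With very small datasets the validation set itself becomes tiny, which is one more reason scant labelled data makes finetuning results hard to trust.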

## How static/dynamic is the data? 
- Finetuning an LLM on a specific dataset means the model's knowledge becomes a static snapshot of that data at the time of training. 
 - If the data undergoes frequent updates, changes, or expansions, this can quickly render the model outdated. 
 - To keep the LLM current in such dynamic environments, we would have to retrain it frequently, a process that can be both time-consuming and resource-intensive. 
 - Additionally, each iteration requires careful monitoring to ensure that the updated model still performs well across different scenarios and hasn't developed new biases or gaps in understanding. 
 
- In contrast, RAG systems inherently possess an advantage in environments with dynamic data. 
 - Their retrieval mechanism queries external sources at request time, ensuring that the information they pull in for generating responses is up to date. 
 - As the external knowledge bases or databases update, the RAG system seamlessly integrates these changes, maintaining its relevance without the need for frequent model retraining. 

- In summary, if we are grappling with a rapidly evolving data landscape, RAG offers an agility that's hard to match with traditional finetuning. 
 - By always staying connected to the most recent data, RAG ensures that the responses generated are in tune with the current state of information, making it an ideal choice for dynamic data scenarios. 
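
The difference can be shown with a toy index: adding a document changes what the next query retrieves, with no training step involved. `KeywordIndex` is a hypothetical in-memory stand-in for a real document store or vector database, and the documents are made up.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordIndex:
    """Toy in-memory document store with word-overlap search."""

    def __init__(self) -> None:
        self.documents: list[str] = []

    def add(self, document: str) -> None:
        """Make a new document available to all later searches."""
        self.documents.append(document)

    def search(self, query: str) -> str:
        """Return the stored document sharing the most words with the query."""
        q = words(query)
        return max(self.documents, key=lambda doc: len(q & words(doc)))


index = KeywordIndex()
index.add("The standard plan costs 10 dollars per month.")

query = "What is the price of the enterprise plan?"
before = index.search(query)

# The knowledge base changes; the model itself is untouched.
index.add("The enterprise plan costs 50 dollars per month.")
after = index.search(query)

print("before update:", before)
print("after update: ", after)
```

A finetuned model would need another training run to learn about the new plan, whereas here the update is a single `add` call.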

## How transparent/interpretable does our LLM app need to be? 
- Finetuning an LLM, while incredibly powerful, operates like a black box, making the reasoning behind its responses more opaque. 
 - As the model internalises the information from the dataset, it becomes challenging to discern the exact source or reasoning behind each response. 
 - This can make it difficult for developers or users to trust the model's outputs, especially in critical applications where understanding the "why" behind an answer is vital. 

- RAG systems, on the other hand, offer a level of transparency that's not typically found in solely finetuned models. 
 - Given the two-step nature of RAG -- retrieval and then generation -- users can peek into the process. 
 - They can inspect which retrieved documents or data points were selected as relevant. 
 - This provides a tangible trail of evidence or reference that can be evaluated to understand the foundation upon which a response is built. 
 - The ability to trace back a model's answer to specific data sources can be invaluable in applications that demand a high degree of accountability or when there's a need to validate the accuracy of the generated content. 

- In essence, if transparency and the ability to interpret the underpinnings of a model's responses are priorities, RAG offers a clear advantage. 
 - By breaking down the response generation into distinct stages and allowing insight into its data retrieval, RAG fosters greater trust and understanding in its outputs.  
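
The evidence trail described above can be kept by returning document identifiers alongside the prompt or answer. In this sketch the `Document` and `TracedPrompt` types, the IDs, and the policy texts are all hypothetical, and word overlap again stands in for a real retriever.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class TracedPrompt:
    prompt: str
    source_ids: list[str]  # which documents the prompt was built from


DOCS = [
    Document("policy-001", "Employees accrue 25 days of annual leave per year."),
    Document("policy-002", "Remote work requires approval from a line manager."),
    Document("policy-003", "Expense claims must be filed within 60 days."),
]


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_with_sources(query: str, docs: list[Document], k: int = 1) -> TracedPrompt:
    """Pick the top-k documents and record their IDs next to the prompt."""
    q = words(query)
    top = sorted(docs, key=lambda d: len(q & words(d.text)), reverse=True)[:k]
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in top)
    return TracedPrompt(
        prompt=f"Context:\n{context}\nQuestion: {query}",
        source_ids=[d.doc_id for d in top],
    )


traced = retrieve_with_sources("How many days of annual leave do employees get?", DOCS)
print(traced.prompt)
print("sources:", traced.source_ids)
```

Showing `source_ids` to the user (or logging them) is what lets a reviewer check an answer against the documents it was based on.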

# Summary
- Choosing between RAG and finetuning becomes more intuitive when considering these dimensions. 
- If we lean towards accessing external knowledge and value transparency, RAG is our go-to. 
- On the other hand, if we are working with stable labelled data and aim to adapt the model more closely to specific needs, finetuning is the better choice. 
- ![1_To-PwvmU47tqyxPzhar6vg.png](attachment:1_To-PwvmU47tqyxPzhar6vg.png)
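
The dimensions discussed above can be turned into a rough checklist. The function below is a crude heuristic written for illustration only: the field names, the equal weighting of signals, and the returned labels are assumptions, and a real decision would weigh these factors against cost and team constraints.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_external_data: bool
    needs_style_or_behaviour_change: bool
    hallucinations_critical: bool
    has_ample_labelled_data: bool
    data_changes_often: bool
    needs_source_traceability: bool


def recommend(req: Requirements) -> str:
    """Map the six questions to a rough suggestion."""
    rag_signal = any(
        [
            req.needs_external_data,
            req.hallucinations_critical,
            req.data_changes_often,
            req.needs_source_traceability,
        ]
    )
    # Finetuning for style only pays off when there is data to learn from.
    finetune_signal = (
        req.needs_style_or_behaviour_change and req.has_ample_labelled_data
    )
    if rag_signal and finetune_signal:
        return "RAG + finetuning"
    if rag_signal:
        return "RAG"
    if finetune_signal:
        return "finetuning"
    return "no clear preference"


style_summariser = Requirements(False, True, False, True, False, False)
support_bot = Requirements(True, False, True, False, True, True)
print(recommend(style_summariser))
print(recommend(support_bot))
```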

# Use Cases
- Let's look at some popular use cases and how the above framework can be used to choose the right method.

## Summarisation
1. External Knowledge Required? 
    - For the task of summarising in the style of previous summaries, the primary data source would be the previous summaries themselves. 
    - If these summaries are contained within a static dataset, there's little need for continuous external data retrieval. 
    - However, if there's a dynamic database of summaries that is frequently updated and the goal is to continually align the style with the newest entries, RAG might be useful here. 

2. Model adaptation required? 
    - The core of this use case revolves around adapting to a specialised domain and/or a specific writing style. 
    - Finetuning is particularly adept at capturing stylistic nuances, tonal variations, and specific domain vocabularies, making it an optimal choice for this dimension. 

3. Crucial to minimise hallucinations? 
    - Hallucinations are problematic in most LLM applications, including summarisation. 
    - However, in this case, the text to be summarised is typically provided as context. 
    - This makes hallucinations less of a concern compared to other use cases. 
    - The source text constrains the model, reducing imaginative fabrications.
    - So while factual accuracy is always desirable, suppressing hallucinations is a lower priority for summarisation given the contextual grounding. 

4. Training data available? 
    - If there is a substantial collection of previous summaries that are labelled or structured in a way that the model can learn from them, finetuning becomes a very attractive option. 
    - On the other hand, if the dataset is limited, and we are leaning on external databases for stylistic alignment, RAG could play a role, although its primary strength isn't style adaptation. 

5. How dynamic is the data? 
    - If the database of previous summaries is static or updates infrequently, the finetuned model's knowledge will likely remain relevant for a longer time. 
    - However, if the summaries update frequently and there is a need for the model to align with the newest stylistic changes continuously, RAG might have an edge due to its dynamic data retrieval capabilities. 

6. Transparency/Interpretability required? 
    - The primary goal here is stylistic alignment, so the "why" behind a particular summarisation style might be less critical than in other use cases. 
    - That said, if there is a need to trace back and understand which previous summaries influenced a particular output, RAG offers a bit more transparency. 
    - Still, this might be a secondary concern for this use case. 
    - Recommendation: For this use case, finetuning appears to be the more fitting choice.  





# Additional Aspects to Consider

## Scalability 
- As an organisation grows and its needs evolve, how scalable are the methods in question? 
- RAG systems, given their modular nature, might offer more straightforward scalability, especially if the knowledge base grows. 
- On the other hand, frequently finetuning a model to cater to expanding datasets can be computationally demanding. 

## Latency and Real-Time Requirements
- If the application requires real-time or near-real-time responses, consider the latency introduced by each method. 
- RAG systems, which involve retrieving data before generating a response, might introduce more latency compared to a finetuned LLM that generates responses based on internalised knowledge. 
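
A simple way to see where the extra latency comes from is to time the retrieval and generation stages separately. In the sketch below both stages are fake functions whose `time.sleep` durations are arbitrary placeholders, so the numbers say nothing about real systems; only the measurement pattern is the point.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any) -> tuple[Any, float]:
    """Run fn(*args) and return its result plus elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fake_retrieve(query: str) -> list[str]:
    time.sleep(0.02)  # stand-in for a database or vector-store lookup
    return [f"snippet relevant to: {query}"]


def fake_generate(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for an LLM call
    return "generated answer"


query = "What is our refund policy?"
snippets, retrieval_seconds = timed(fake_retrieve, query)
answer, generation_seconds = timed(fake_generate, query + " " + " ".join(snippets))

print(f"retrieval:  {retrieval_seconds * 1000:.1f} ms")
print(f"generation: {generation_seconds * 1000:.1f} ms")
print(f"RAG total:  {(retrieval_seconds + generation_seconds) * 1000:.1f} ms")
```

A finetuned model without retrieval skips the first stage entirely, which is why measuring the stages separately helps when a latency budget is tight.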

## Maintenance and Support 
- Think about the long-term. Which system aligns better with the organisation's ability to provide consistent maintenance and support? 
- RAG might require upkeep of the database and the retrieval mechanism, while finetuning would necessitate consistent retraining efforts, especially if the data or requirements change. 

## Robustness and Reliability 
- How robust is each method to different types of inputs? 
- RAG systems can pull from external knowledge sources and might handle a broad array of questions, while a well-finetuned model might offer more consistency in certain domains. 

## User Experience 
- Consider the end-users and their needs. 
- If they require detailed, reference-backed answers, RAG could be preferable. 
- If they value speed and domain-specific expertise, a finetuned model might be more suitable. 

## Cost 
- Finetuning can get expensive, especially for really large models. 
- But in the past few months costs have gone down significantly thanks to parameter-efficient techniques like QLoRA. 
- Setting up RAG can require a large initial investment (integration, database access, maybe even licensing fees), and there is also the regular maintenance of the external knowledge base to think about. 
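
A back-of-the-envelope count shows why parameter-efficient methods cut finetuning cost. LoRA, the adapter technique that QLoRA builds on, freezes a weight matrix of shape d × k and trains two small matrices of shapes d × r and r × k instead. The layer size and rank below are illustrative assumptions, not figures from any particular model.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for rank-r adapters: (d x r) plus (r x k)."""
    return r * (d + k)


d, k, r = 4096, 4096, 8  # hypothetical layer size and adapter rank
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
# prints: full: 16,777,216  LoRA: 65,536  ratio: 0.39%
```

For this one layer the adapters train well under 1% of the parameters, which is where much of the saving in memory and compute comes from.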

# Conclusion 
- As we have explored, choosing between RAG and finetuning requires a nuanced evaluation of an LLM application's unique needs and priorities. 
- There is no one-size-fits-all solution; success lies in aligning the optimisation method with the specific requirements of the task. 
- By weighing the dimensions discussed above, organisations can make an informed decision on the best path forward. 