# The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities

**Source**: [original paper](https://arxiv.org/pdf/2408.13296v1) and many online sources

**Note**: This is a summary of what I've learned and understood from the original paper. I've also included additional information collected from online sources.

## Chapter 1: Introduction

### Background of Large Language Models

Large Language Models (LLMs) represent a significant leap in computational systems capable of understanding and generating human language.

Notable examples, such as GPT-3 and GPT-4, leverage the self-attention mechanism within Transformer architectures to efficiently manage sequential data and understand long-range dependencies.
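To make "self-attention" concrete, here is a minimal, dependency-free sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V. It is an illustration under simplifying assumptions, not how GPT models are actually implemented: there is no causal mask, no multi-head split, and the toy embeddings and identity projection matrices are hypothetical stand-ins for learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (seq_len x seq_len): every token scores every token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of all value vectors


# Hypothetical toy input: 3 tokens with 2-dimensional embeddings and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens, identity, identity, identity)
```

Because every token attends to every other token in one step, distance in the sequence does not limit which positions can influence each other; this is what lets Transformers capture long-range dependencies.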

### Historical Development and Key Milestones

Language models are fundamental to natural language processing (NLP), leveraging mathematical techniques to generalise linguistic rules and knowledge for tasks involving prediction and generation.

Over several decades, language modelling has evolved from early statistical language models (SLMs) to today’s advanced large language models (LLMs).

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:400px">
    <img src="image/timeline.png" alt="timeline" />
</div>

### Evolution from Traditional NLP Models to State-of-the-Art LLM

#### Statistical Language Model (SLM)

Emerging in the 1990s, SLMs analyse natural language using probabilistic methods to determine the likelihood of sentences within texts.

+ Probability: SLMs assign probabilities to sequences of words or sentences.
+ N-gram models: The most common type, especially for earlier SLMs. They predict the next word based on the previous n-1 words.
+ Limitations: Traditional SLMs struggle with long-range dependencies and context.
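A minimal sketch of an n-gram SLM with n = 2 (a bigram model), assuming a tiny hypothetical corpus and unsmoothed maximum-likelihood estimates. Each word is predicted from only the previous n-1 = 1 word, so the sentence probability is a product of P(wᵢ | wᵢ₋₁) terms; practical SLMs add smoothing, which is omitted here.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count histories and bigrams, padding each sentence with <s> and </s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that serves as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def sentence_probability(words: List[str], unigrams: Counter, bigrams: Counter) -> float:
    """P(w_1..w_m) ~ product of P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # history never seen in training
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob


# Hypothetical two-sentence corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["the", "cat", "sat"], unigrams, bigrams)    # 0.5
p_unseen = sentence_probability(["the", "cat", "ran"], unigrams, bigrams)  # 0.0
```

The zero probability for the unseen "cat ran" and the one-word history both illustrate the limitations listed above: the model has no notion of context beyond n-1 words and no way to generalise to unseen word combinations.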

#### Neural Language Model (NLM)

NLMs leverage neural networks to predict word sequences, overcoming SLM limitations. Word vectors enable computers to understand word meanings.
 
Tools like Word2Vec represent words in a vector space where semantic relationships are reflected in the angles between vectors: words with similar meanings have vectors pointing in similar directions.
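The "angle" between word vectors is usually measured with cosine similarity. The sketch below assumes hand-written, hypothetical 3-dimensional vectors purely for illustration; real Word2Vec embeddings are learned from data and commonly have a few hundred dimensions.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT real Word2Vec output.
toy_vectors: Dict[str, List[float]] = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
}


def most_similar(word: str, vectors: Dict[str, List[float]]) -> str:
    """Return the other word whose vector forms the smallest angle with `word`'s vector."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


print(most_similar("king", toy_vectors))  # queen
```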

In a typical feed-forward NLM, the input layer concatenates the word vectors of the preceding context words, the hidden layer applies a non-linear activation function, and the output layer predicts the next word, using the Softmax function to transform its output values into a probability distribution over the vocabulary.
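A minimal forward pass of the three-layer architecture just described, written in plain Python. The sizes, the `tanh` activation, and the randomly initialised (untrained) weights are illustrative assumptions; in a real NLM these weights and the embeddings are learned by backpropagation.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nlm_next_word_distribution(context_ids, embeddings, w_hidden, b_hidden, w_out, b_out):
    """Concatenate context embeddings -> tanh hidden layer -> softmax over the vocabulary."""
    # Input layer: look up each context word's vector and concatenate them.
    x = [value for idx in context_ids for value in embeddings[idx]]
    # Hidden layer: h = tanh(W_h x + b_h).
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_hidden, b_hidden)]
    # Output layer: one logit per vocabulary word, turned into probabilities by softmax.
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w_out, b_out)]
    return softmax(logits)


# Hypothetical sizes and random (untrained) weights.
rng = random.Random(0)
vocab_size, embed_dim, context_size, hidden_dim = 5, 3, 2, 4
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(vocab_size)]
w_hidden = [[rng.uniform(-1, 1) for _ in range(context_size * embed_dim)] for _ in range(hidden_dim)]
b_hidden = [0.0] * hidden_dim
w_out = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(vocab_size)]
b_out = [0.0] * vocab_size

# Probability of each vocabulary word following the context (word 1, word 3).
probs = nlm_next_word_distribution([1, 3], embeddings, w_hidden, b_hidden, w_out, b_out)
```

Unlike an n-gram table, similar context words have similar vectors here, so the model can assign sensible probabilities to word combinations it never saw verbatim.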

#### Pretrained Language Model (PLM)

PLMs are initially trained on extensive volumes of unlabelled text to understand fundamental language structures (pre-training). They are then fine-tuned on a smaller, task-specific dataset. This "pre-training and fine-tuning" paradigm, exemplified by GPT-2 and BERT, has led to diverse and effective model architectures.

#### Large Language Models (LLM)

LLMs such as GPT-3, GPT-4, PaLM, and LLaMA are trained on massive text corpora and have tens of billions of parameters or more. They undergo a two-stage process: initial pre-training on a vast corpus, followed by alignment with human values.

### Overview of Current Leading LLMs

The rapid development of LLMs has spurred research into architectural innovations, training strategies, context-length extension, fine-tuning techniques, and multi-modal data integration.

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:500px">
    <img src="image/LLMdimension.png" alt="" />
</div>

### Types of LLM Fine-Tuning

#### Unsupervised Fine-Tuning

This method does not require labelled data. Instead, the LLM is exposed to a large corpus of unlabelled text from the target domain, refining its understanding of language. This approach is useful for new domains like legal or medical fields but is less precise for specific tasks such as classification or summarisation.
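Unsupervised fine-tuning needs no labels because the text supervises itself: the model learns to predict each next token. The sketch below shows how raw domain text can be turned into (input, target) pairs for that objective. The whitespace tokenizer and the legal-style sample sentence are hypothetical simplifications; real LLMs use subword tokenizers such as BPE.

```python
from typing import Dict, List, Tuple


def build_vocab(text: str) -> Dict[str, int]:
    """Toy whitespace vocabulary: assign each new word the next integer id."""
    vocab: Dict[str, int] = {}
    for word in text.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def causal_lm_examples(text: str, vocab: Dict[str, int], block_size: int) -> List[Tuple[List[int], List[int]]]:
    """Split unlabelled text into blocks; each target sequence is the input shifted one token left."""
    ids = [vocab[w] for w in text.split()]
    examples = []
    # Each block holds block_size + 1 tokens so that inputs and targets both have length block_size.
    for start in range(0, len(ids) - block_size, block_size):
        block = ids[start:start + block_size + 1]
        examples.append((block[:-1], block[1:]))
    return examples


# Hypothetical in-domain (legal-style) text.
domain_text = "the court granted the motion to dismiss the appeal"
vocab = build_vocab(domain_text)
examples = causal_lm_examples(domain_text, vocab, block_size=4)
# examples[0] == ([0, 1, 2, 0], [1, 2, 0, 3])
```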

#### Supervised Fine-Tuning (SFT)

SFT involves providing the LLM with labelled data tailored to the target task.

While effective, this method requires substantial labelled data, which can be costly and time-consuming to obtain.
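A common SFT practice (an assumption here, not something the source specifies) is to train on prompt + response sequences but compute the loss only on the response tokens. The sketch marks prompt positions with `-100`, which matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The token ids and the per-position probabilities are hypothetical, and the one-position shift used by causal LMs is omitted for clarity.

```python
import math
from typing import List, Tuple

IGNORE_INDEX = -100  # same value as PyTorch CrossEntropyLoss's default ignore_index


def build_sft_example(prompt_ids: List[int], response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; only response positions keep a real label."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


def masked_nll(target_probs: List[float], labels: List[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX.

    target_probs[i] is the (hypothetical) probability the model assigns to labels[i].
    """
    losses = [-math.log(p) for p, lab in zip(target_probs, labels) if lab != IGNORE_INDEX]
    return sum(losses) / len(losses)


# Hypothetical token ids for a labelled (prompt, response) pair.
input_ids, labels = build_sft_example([5, 6, 7], [8, 9])
loss = masked_nll([0.1, 0.1, 0.1, 0.5, 0.5], labels)  # ~0.693: the prompt positions are ignored
```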

#### Instruction Fine-Tuning via Prompt Engineering

This method relies on providing the LLM with natural language instructions, useful for creating specialised assistants. It reduces the need for vast amounts of labelled data but depends heavily on the quality of the prompts.
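Instruction data is usually rendered into a fixed prompt template so that every example has the same structure. The `### Instruction / ### Input / ### Response` layout below is a hypothetical template chosen for illustration, as are the example strings; any consistent format can be used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionExample:
    instruction: str
    response: str
    context: Optional[str] = None  # optional extra input the instruction refers to


def format_example(ex: InstructionExample) -> str:
    """Render one (instruction, optional context, response) triple with a hypothetical template."""
    parts = [f"### Instruction:\n{ex.instruction}"]
    if ex.context:
        parts.append(f"### Input:\n{ex.context}")
    parts.append(f"### Response:\n{ex.response}")
    return "\n\n".join(parts)


with_context = format_example(InstructionExample(
    instruction="Summarise the clause.",
    response="Rent is due monthly.",
    context="The tenant shall pay rent on the first day of each month.",
))
without_context = format_example(InstructionExample("Say hello.", "Hello!"))
```

At inference time, the same template would be filled in with the response left empty, so the model completes the `### Response:` section in the format it was tuned on.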

### Pre-training vs Fine-tuning

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:300px">
    <img src="image/table_pre_fine.png" alt="" />
</div>

### Importance of Fine-Tuning LLM

1. *Transfer Learning*: Fine-tuning leverages the knowledge acquired during pre-training, adapting it to specific tasks with reduced computation time and resources.
2. *Reduced Data Requirements*: Fine-tuning requires less labelled data, focusing on tailoring pre-trained features to the target task.
3. *Improved Generalisation*: Fine-tuning enhances the model’s ability to generalise within specific tasks or domains by building on the general language features learned during pre-training and customising them.
4. *Efficient Model Deployment*: Fine-tuned models are practical for real-world applications because they are computationally efficient and tailored to the specific tasks they serve.
5. *Adaptability to Various Tasks*: Fine-tuned LLMs can adapt to a broad range of tasks, performing well across various applications without task-specific architectures.
6. *Domain-Specific Performance*: Fine-tuning allows models to excel in domain-specific tasks by adjusting to the nuances and vocabulary of the target domain.
7. *Faster Convergence*: Fine-tuning usually achieves faster convergence, starting with weights that already capture general language features.

### Retrieval Augmented Generation (RAG)

A popular method for using your own data is to incorporate it into the prompt when querying the LLM.

This approach, known as Retrieval-Augmented Generation (RAG), involves retrieving relevant data and using it as additional context for the LLM. Instead of depending solely on knowledge from the training data, a RAG workflow pulls pertinent information, connecting static LLMs with real-time data retrieval.

With a RAG architecture, organisations can deploy any LLM and enhance it to return relevant results by providing a small amount of their own data. This process avoids the costs and time associated with fine-tuning or pre-training the model.

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:400px">
    <img src="image/rag.png" alt="" />
</div>

#### Traditional RAG Pipeline and Steps

1. *Data Indexing*: Organise data efficiently for quick retrieval. This involves processing, chunking, and storing data in a vector database using indexing strategies such as search indexing, vector indexing, and hybrid indexing.
2. *Input Query Processing*: Refine user queries to improve compatibility with the indexed data. This can include simplifying queries or transforming them into vectors for more efficient search.
3. *Searching and Ranking*: Retrieve and rank data by relevance using search algorithms such as TF-IDF and BM25, or deep learning models such as BERT that interpret the query’s intent and context.
4. *Prompt Augmentation*: Incorporate relevant information from the search results into the original query to give the LLM additional context, improving response accuracy and relevance.
5. *Response Generation*: Use the augmented prompt to generate responses that combine the LLM’s knowledge with current, specific data, producing high-quality, contextually grounded answers.
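A minimal, in-memory sketch of steps 1 to 4 using BM25 (one of the ranking algorithms named above). Assumptions to note: the fixed-size word chunking, the regex tokenizer, the common defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the prompt wording are all illustrative choices, and the documents are hypothetical. A production pipeline would typically use a vector database and embeddings instead. Step 5 (sending the prompt to an LLM) is not shown.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, chunk_size: int = 50) -> List[str]:
    """Step 1 (indexing): split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


class BM25Index:
    """Step 3 (searching and ranking): Okapi BM25 scoring over the chunks."""

    def __init__(self, chunks: List[str], k1: float = 1.5, b: float = 0.75):
        self.chunks = chunks
        self.docs = [tokenize(c) for c in chunks]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.df = Counter(term for d in self.docs for term in set(d))  # chunks containing each term
        self.tfs = [Counter(d) for d in self.docs]

    def idf(self, term: str) -> float:
        n, total = self.df[term], len(self.docs)
        return math.log((total - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query_terms: List[str], i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        result = 0.0
        for term in query_terms:
            f = tf[term]
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                result += self.idf(term) * f * (self.k1 + 1) / norm
        return result

    def search(self, query: str, top_k: int = 2) -> List[str]:
        terms = tokenize(query)  # Step 2: minimal query processing (lower-casing, tokenising)
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(terms, i), reverse=True)
        return [self.chunks[i] for i in ranked[:top_k]]


def augment_prompt(question: str, contexts: List[str]) -> str:
    """Step 4 (prompt augmentation): prepend the retrieved chunks to the user's question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n\nContext:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# Hypothetical organisational documents.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday to Friday from 9am to 5pm.",
    "Shipping to Europe usually takes five to seven business days.",
]
index = BM25Index([c for d in docs for c in chunk_words(d)])
question = "Can I get a refund on returns?"
top = index.search(question, top_k=1)
prompt = augment_prompt(question, top)  # would be sent to the LLM in step 5
```

Keeping retrieval separate from generation is what lets RAG use fresh or proprietary data without touching the model's weights: updating the knowledge means re-indexing documents, not retraining.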


#### Benefits of Using RAG

+ *Up-to-Date and Accurate Responses*: Enhances the LLM’s responses with current external data, improving accuracy and relevance.
+ *Reducing Inaccurate Responses*: Grounds the LLM’s output in relevant knowledge, reducing the risk of generating incorrect information.
+ *Domain-Specific Responses*: Delivers contextually relevant responses tailored to an organisation’s proprietary data.
+ *Efficiency and Cost-Effectiveness*: Offers a cost-effective method for customising LLMs without extensive model fine-tuning.

#### Challenges and Considerations in Serving RAG

1. *User Experience*: Ensuring rapid response times suitable for real-time applications.
2. *Cost Efficiency*: Managing the costs associated with serving millions of responses.
3. *Accuracy*: Ensuring outputs are accurate to avoid misinformation.
4. *Recency and Relevance*: Keeping responses and content current with the latest data.
5. *Business Context Awareness*: Aligning LLM responses with specific business contexts.
6. *Service Scalability*: Managing increased capacity while controlling costs.
7. *Security and Governance*: Implementing protocols for data security, privacy, and governance.

#### Considerations for Choosing Between RAG and Fine-Tuning

RAG is likely the better option for applications that need to access external data sources. Fine-tuning, on the other hand, is more suitable if you need the model to adjust its behaviour or writing style, or to incorporate domain-specific knowledge.

In terms of suppressing hallucinations and ensuring accuracy, RAG systems tend to perform better because grounding responses in retrieved documents makes them less prone to generating incorrect information.

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:450px">
    <img src="image/compare_rag_fine.png" alt="" />
</div>