# Fine-tuning Open-weight LLMs with Anyscale

**⏱️ Time to complete**: N/A

Fine-tuning LLMs is an easy and cost-effective way to tailor their capabilities to niche applications with high accuracy. While Ray and Ray Train offer generic primitives for building such workloads, at Anyscale we have created a higher-level library called _LLMForge_ that builds on top of Ray and other open-source libraries to provide an easy-to-use interface for fine-tuning and training LLMs.

This template is a guide on how to use LLMForge for fine-tuning LLMs.


## What is LLMForge?

LLMForge is a library that implements a collection of design patterns that use Ray, Ray Train, and Ray Data in combination with other open-source libraries (e.g., DeepSpeed, 🤗 Hugging Face Accelerate, Transformers) to provide an easy-to-use library for fine-tuning LLMs. In addition to these design patterns, it offers tight integrations with the Anyscale platform, such as the model registry, streamlined deployment, observability, and Anyscale's job submission.

### Configurations

LLMForge workloads are specified using YAML configurations ([documentation here](https://docs.anyscale.com/reference/finetuning-config-api)). The library offers two main modes: `auto` and `custom`.

#### Auto Mode
Similar to OpenAI's finetuning experience, the `auto` mode provides a minimal and efficient setup. It allows you to quickly start a finetuning job by setting just a few parameters (`model_id` and `train_path`). All other settings are optional and will be automatically selected based on dataset statistics and predefined configurations.
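
As a rough illustration of how little auto mode needs, the sketch below renders a minimal config from just `model_id` and `train_path`, with `valid_path` as an optional extra. The helper name `render_auto_config` and the example S3 path are hypothetical and not part of LLMForge; only the three key names come from this guide.

```python
import json


def render_auto_config(model_id, train_path, valid_path=None):
    """Return YAML text for a minimal auto-mode config.

    Hypothetical helper, not an LLMForge API. Values are written as JSON
    strings, which are also valid YAML double-quoted scalars.
    """
    if not model_id or not train_path:
        raise ValueError("auto mode needs both model_id and train_path")
    fields = {"model_id": model_id, "train_path": train_path}
    if valid_path is not None:
        fields["valid_path"] = valid_path
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())


if __name__ == "__main__":
    print(
        render_auto_config(
            "meta-llama/Meta-Llama-3-8B-Instruct",
            "s3://my-bucket/train.jsonl",  # placeholder path, replace with your own
        ),
        end="",
    )
```

The returned text can be written to a file such as `config.yaml` and passed to the `llmforge` command; everything else is left for auto mode to choose.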

#### Custom Mode
The `custom` mode offers more flexibility and control over the finetuning process, allowing for advanced optimizations and customizations. You need to provide more configuration to set up this mode (e.g., prompt format, hardware, batch size).

Here's a comparison of the two modes:

| Feature | Auto Mode | Custom Mode |
|---------|-----------|-------------|
| Ideal For | Prototyping what's possible, focusing on dataset cleaning, finetuning, and evaluation pipeline | Optimizing model quality by controlling more parameters, hardware control |
| Command | `llmforge anyscale finetune config.yaml --auto` | `llmforge anyscale finetune config.yaml` |
| Model Support | Popular chat-format models (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`) | Any HuggingFace model, any format (e.g., `meta-llama/Meta-Llama-Guard-2-8B`) |
| Task Support | Instruction tuning for multi-turn chat | CausalLM, instruction tuning, classification, preference tuning |
| Data Format | Fixed for chat-style data | Flexible, supports various prompt formats |
| Hardware | Automatically selected (limited by availability) | User-configurable |
| Fine-tuning type | Only supports LoRA (Rank-8, all linear layers) | User-defined LoRA and full-parameter |

Choose the mode that best fits your project requirements and level of customization needed.
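
On the command line, the two modes differ only in the `--auto` flag shown in the table. A minimal sketch of a wrapper that builds either invocation follows; the function name `build_finetune_command` is hypothetical, and the sketch only assembles the argument list without running anything.

```python
import shlex


def build_finetune_command(config_path, auto=False):
    """Build the `llmforge anyscale finetune` argument list for one mode."""
    command = ["llmforge", "anyscale", "finetune", config_path]
    if auto:
        # The only CLI difference between the two modes in the table above.
        command.append("--auto")
    return command


if __name__ == "__main__":
    for auto in (True, False):
        print(shlex.join(build_finetune_command("config.yaml", auto=auto)))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` from a script or notebook cell where the `llmforge` CLI is installed.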

### Models Supported in Auto Mode

Auto mode supports a select list of models, with a fixed cluster type of 8xA100-80G. Here are the supported models and their configurations:

| model_id | Supported context lengths |
|-------|-----------------|
| meta-llama/Meta-Llama-3-8B-Instruct |  512, 1024, 2048, 4096 |
| meta-llama/Meta-Llama-3-70B-Instruct  | 512, 1024, 2048, 4096 |
| mistralai/Mistral-7B-Instruct-v0.2 | 512, 1024, 2048, 4096 |

Note: 
- Cluster type for all models: 8xA100-80G
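
Before submitting an auto-mode job, it can help to check your choice against the table above. The sketch below is a hypothetical pre-flight check, not how LLMForge itself selects settings: it assumes you have already measured the token count of your longest training example, and it returns the smallest listed context length that fits.

```python
# Copied from the table above; update it if the supported list changes.
SUPPORTED_CONTEXT_LENGTHS = {
    "meta-llama/Meta-Llama-3-8B-Instruct": (512, 1024, 2048, 4096),
    "meta-llama/Meta-Llama-3-70B-Instruct": (512, 1024, 2048, 4096),
    "mistralai/Mistral-7B-Instruct-v0.2": (512, 1024, 2048, 4096),
}


def smallest_supported_context(model_id, longest_example_tokens):
    """Return the smallest listed context length that fits, or None if none does.

    Hypothetical helper: raises ValueError for models not in the table.
    """
    if model_id not in SUPPORTED_CONTEXT_LENGTHS:
        raise ValueError(f"{model_id} is not listed for auto mode")
    for length in sorted(SUPPORTED_CONTEXT_LENGTHS[model_id]):
        if longest_example_tokens <= length:
            return length
    return None  # longer than every listed context length


if __name__ == "__main__":
    print(smallest_supported_context("mistralai/Mistral-7B-Instruct-v0.2", 600))
```

A `None` result or a `ValueError` suggests the workload may be a better fit for custom mode, where the model and context length are user-configurable.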



## Highlight of features

* Support for both full-parameter and LoRA fine-tuning
* Flash Attention v2, full DeepSpeed support (ZeRO-DDP sharding)
* Gradient checkpointing, mixed-precision training, etc.
* Unified chat data format with flexible prompt format support
* Flexible task support: CausalLM, instruction tuning, classification, preference tuning
* Support for multi-stage continuous fine-tuning
* Support for context length extension
* Customization of hyperparameters
* Anyscale integrations: model registry, monitoring and observability, compatibility with Anyscale jobs

## Examples

### Auto

```bash
llmforge anyscale finetune training_configs/auto/llama-3-8b/simple.yaml --auto
```

The config is minimal:

```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
logger:
    wandb:
        project: ...
        entity: ...
```

### Custom

```bash
llmforge anyscale finetune training_configs/custom/llama-3-8b/lora.yaml
```

```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
num_epochs: 3
learning_rate: 1e-4
logger:
    wandb:
        project: ...
        entity: ...
```

More examples can be found in `./training_configs`. For specific features read [cookbooks](#cookbooks) and [end-to-end examples](#end-to-end-examples).

## Cookbooks

Once you are comfortable with the above, you can find recipes that extend the functionality of this template under the cookbooks folder:

* Serve fine-tuned models
* Setup WandB
* [Bring your own data](cookbooks/bring_your_own_data/README.md): Everything you need to know about using custom datasets for fine-tuning.
* [Bring any huggingface model and prompt format](cookbooks/bring_any_hf_model/README.md): Learn how you can finetune any 🤗Hugging Face model with a custom prompt format (chat template). 
* Deepdive on Auto mode
* Deepdive on custom mode
* [LoRA vs. full-parameter training](cookbooks/continue_from_checkpoint/README.md): Learn the differences between LoRA and full-parameter training and how to configure both.
* [Continue fine-tuning from a previous checkpoint](cookbooks/continue_from_checkpoint/README.md): A detailed guide on how you can use a previous checkpoint for another round of fine-tuning.
* [Modifying hyperparameters](cookbooks/modifying_hyperparameters/README.md): A brief guide on customization of your fine-tuning job.
* [Optimizing Cost and Performance for Finetuning](cookbooks/optimize_cost/README.md): A detailed guide on default performance-related parameters and how you can optimize throughput for training on your own data.
* [Run finetuning as an Anyscale Job](cookbooks/launch_as_anyscale_job/README.md): A detailed guide on how to submit a finetuning workflow as a job (outside the context of workspaces).

## End-to-end Examples

Here is a list of end-to-end examples that involve more steps, such as data preprocessing and evaluation, but whose main focus is improving model quality through fine-tuning.

* [Fine-tuning for Function calling on custom data](end-to-end-examples/fine-tune-function-calling/README.md)

## LLMForge Versions

Here is a list of LLMForge image versions:

| version | image_uri |
|---------|-----------|
| `0.5.2`  | `localhost:5555/anyscale/llm-forge:0.5.2` |
| `0.5.1`  | `localhost:5555/anyscale/llm-forge:0.5.1` |
| `0.5.0.1`  | `localhost:5555/anyscale/llm-forge:0.5.0.1-ngmM6BdcEdhWo0nvedP7janPLKS9Cdz2` |