# Fine-tuning with LoRA on a Remote Server

## Fine-tuning methods
- Full fine-tuning
- Freeze fine-tuning (Incremental)
- Low-Rank Adaptation (LoRA)

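## What is LoRA?
- LoRA fine-tunes only a small part of the model, which preserves the original model's performance while reducing the number of trainable parameters and the compute cost
- Instead of updating a large weight matrix of the original model directly, it learns the update as the product of two much smaller (low-rank) matrices
- The trained result is then **merged** into the original model to improve its performance on the target task
- Related techniques for combining or compacting models include knowledge distillation, model merging, and model compression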
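To make the low-rank idea concrete, the sketch below counts parameters and runs one forward pass in pure Python. The layer size (`d_out`, `d_in`), the random initialization, and the helper names `matmul` and `lora_forward` are illustrative assumptions; the rank of 8 and alpha of 16 mirror the values used later in this guide.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, W, A, B, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x) for a column vector x."""
    base = matmul(W, x)
    low_rank = matmul(B, matmul(A, x))
    scale = alpha / rank
    return [[b[0] + scale * l[0]] for b, l in zip(base, low_rank)]

d_out, d_in, rank, alpha = 64, 64, 8, 16  # illustrative layer size

random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weights
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]   # trainable, rank x d_in
B = [[0.0] * rank for _ in range(d_out)]                                   # trainable, zero-initialized

full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters trained by LoRA
print(f"full: {full_params}, LoRA: {lora_params}")

x = [[1.0] for _ in range(d_in)]
y = lora_forward(x, W, A, B, alpha, rank)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen base layer; training then moves only `A` and `B`.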

## LLaMA-Factory
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#installation)

### 1. Install LLaMA-Factory

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```

### 2. Launch LLaMA-Factory webUI

```bash
nohup llamafactory-cli webui &
```
- Set these environment variables before launching to allow remote access to the WebUI
```bash
export GRADIO_SHARE=True
export GRADIO_SERVER_NAME=Server_Public_IP
```

### 3. Download pre-trained model

```python
# !pip install modelscope
from modelscope import snapshot_download

# Try Meta Llama 3.2 1B Instruct
snapshot_download('LLM-Research/Llama-3.2-1B-Instruct', cache_dir='/root/autodl-tmp/models')
```

### 4. Upload training dataset to LLaMA-Factory project folder

```bash
# Copy the dataset to the remote server
scp ../local_dataset/Llama3Data/fintech.json llamafactory@Server_Public_IP:/path/to/LLaMA-Factory/data
```

- Modify `dataset_info.json` to include the new dataset

```json
{
  ... ...
  "fintech": {
    "file_name": "fintech.json"
  }
}
```
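The registration step can also be scripted. The following is a minimal sketch, assuming `fintech.json` is a JSON list of Alpaca-style records with `instruction`, `input`, and `output` fields; the helper names `register_dataset` and `check_records` and the sample record are hypothetical, and a temporary directory stands in for `LLaMA-Factory/data`.

```python
import json
import tempfile
from pathlib import Path

def register_dataset(data_dir, name, file_name):
    """Add or update one entry in dataset_info.json and return the full registry."""
    info_path = Path(data_dir) / "dataset_info.json"
    registry = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    registry[name] = {"file_name": file_name}
    info_path.write_text(json.dumps(registry, indent=2, ensure_ascii=False), encoding="utf-8")
    return registry

def check_records(path, required=("instruction", "output")):
    """Return the indexes of records that are missing any required field."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [i for i, rec in enumerate(records) if any(key not in rec for key in required)]

# Demo in a temporary directory standing in for LLaMA-Factory/data
with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical sample record, for illustration only
    sample = [{"instruction": "What is a bond?", "input": "", "output": "A bond is a debt security."}]
    (Path(data_dir) / "fintech.json").write_text(json.dumps(sample), encoding="utf-8")
    registry = register_dataset(data_dir, "fintech", "fintech.json")
    bad = check_records(Path(data_dir) / "fintech.json")
    print(registry, bad)
```

Running the check before training catches malformed records early, which is cheaper than discovering them after the job has started on the remote server.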

### 5. WebUI: Configure settings for LLaMA-Factory


<img src="./assets/llama-factory-lora-configuration-1.jpg" style="margin-left: 0px" width=1024px>
<img src="./assets/llama-factory-lora-configuration-2.jpg" style="margin-left: 0px" width=1024px>
<img src="./assets/llama-factory-lora-configuration-3.jpg" style="margin-left: 0px" width=1024px>


### 6. CLI equivalent of the LLaMA-Factory WebUI settings

#### 6.1 Command with parameters:

```bash
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template llama3 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset identity,fintech \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 1000.0 \
    --max_samples 10000 \
    --per_device_train_batch_size 10 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Llama-3.2-1B-Instruct/lora/train_2025-01-11-00-05-34 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --val_size 0.02 \
    --eval_strategy steps \
    --eval_steps 100 \
    --per_device_eval_batch_size 10
```
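The batch-related flags interact: the number of samples consumed per optimizer step is the per-device batch size times the gradient accumulation steps times the number of GPUs. The sketch below estimates the resulting schedule. The dataset size of 1000 samples and the single GPU are assumptions for illustration, and the trainer's own rounding may differ slightly from this estimate.

```python
import math

def training_schedule(train_samples, per_device_batch, grad_accum, num_gpus, epochs, save_steps):
    """Estimate optimizer steps and checkpoint count for a training run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = steps_per_epoch * int(epochs)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "checkpoints": total_steps // save_steps,
    }

schedule = training_schedule(
    train_samples=1000,   # hypothetical dataset size
    per_device_batch=10,  # --per_device_train_batch_size
    grad_accum=8,         # --gradient_accumulation_steps
    num_gpus=1,           # assumption: single GPU
    epochs=1000,          # --num_train_epochs
    save_steps=100,       # --save_steps
)
print(schedule)
```

An estimate like this helps judge whether `--num_train_epochs 1000.0` and `--save_steps 100` produce a manageable number of steps and checkpoints for the available disk space.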


#### 6.2 Command with configuration file `configuration.yaml`

```bash
llamafactory-cli train configuration.yaml
```

- YAML content of `configuration.yaml`:

```yaml
top.booster: auto
top.checkpoint_path: []
top.finetuning_type: lora
top.model_name: Llama-3.2-1B-Instruct
top.quantization_bit: none
top.quantization_method: bitsandbytes
top.rope_scaling: none
top.template: llama3
train.additional_target: ''
train.badam_mode: layer
train.badam_switch_interval: 50
train.badam_switch_mode: ascending
train.badam_update_ratio: 0.05
train.batch_size: 5
train.compute_type: fp16
train.create_new_adapter: false
train.cutoff_len: 2048
train.dataset:
- identity
- fintech
train.dataset_dir: data
train.ds_offload: false
train.ds_stage: none
train.extra_args: '{"optim": "adamw_torch"}'
train.freeze_extra_modules: ''
train.freeze_trainable_layers: 2
train.freeze_trainable_modules: all
train.galore_rank: 16
train.galore_scale: 0.25
train.galore_target: all
train.galore_update_interval: 200
train.gradient_accumulation_steps: 8
train.learning_rate: 5e-5
train.logging_steps: 5
train.lora_alpha: 16
train.lora_dropout: 0
train.lora_rank: 8
train.lora_target: ''
train.loraplus_lr_ratio: 0
train.lr_scheduler_type: cosine
train.mask_history: false
train.max_grad_norm: '1.0'
train.max_samples: '10000'
train.neat_packing: false
train.neftune_alpha: 0
train.num_train_epochs: '1000'
train.packing: false
train.ppo_score_norm: false
train.ppo_whiten_rewards: false
train.pref_beta: 0.1
train.pref_ftx: 0
train.pref_loss: sigmoid
train.report_to: false
train.resize_vocab: false
train.reward_model: null
train.save_steps: 100
train.shift_attn: false
train.swanlab_api_key: ''
train.swanlab_mode: cloud
train.swanlab_project: llamafactory
train.swanlab_run_name: ''
train.swanlab_workspace: ''
train.train_on_prompt: false
train.training_stage: Supervised Fine-Tuning
train.use_badam: false
train.use_dora: false
train.use_galore: false
train.use_llama_pro: false
train.use_pissa: false
train.use_rslora: false
train.use_swanlab: false
train.val_size: 0.02
train.warmup_steps: 0
```
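The command in 6.1 and this file do not agree on every value (for example, the batch size is 10 in the command and 5 here), so it can be worth checking a saved configuration against the command you actually run. The sketch below does this for a few settings. The `KEY_MAP` correspondence between WebUI keys and CLI flags is an assumption, not something the source confirms, and the parser handles only simple `key: value` lines rather than full YAML.

```python
def parse_flat_config(text):
    """Parse simple 'key: value' lines into a dict of strings; list items are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip("'\"")
    return config

# Assumed correspondence between WebUI keys and CLI flags (hypothetical mapping).
KEY_MAP = {
    "train.batch_size": "per_device_train_batch_size",
    "train.gradient_accumulation_steps": "gradient_accumulation_steps",
    "train.lora_rank": "lora_rank",
    "train.lora_alpha": "lora_alpha",
}

def find_mismatches(webui_cfg, cli_args):
    """Return {cli_flag: (webui_value, cli_value)} for numeric settings that differ."""
    mismatches = {}
    for webui_key, cli_key in KEY_MAP.items():
        if webui_key in webui_cfg and cli_key in cli_args:
            if float(webui_cfg[webui_key]) != float(cli_args[cli_key]):
                mismatches[cli_key] = (webui_cfg[webui_key], cli_args[cli_key])
    return mismatches

webui_text = """
train.batch_size: 5
train.gradient_accumulation_steps: 8
train.lora_alpha: 16
train.lora_rank: 8
"""
cli_args = {
    "per_device_train_batch_size": "10",
    "gradient_accumulation_steps": "8",
    "lora_rank": "8",
    "lora_alpha": "16",
}
mismatches = find_mismatches(parse_flat_config(webui_text), cli_args)
print(mismatches)
```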

### 7. Use vLLM as the inference framework to chat with the fine-tuned model

- Install the vLLM dependencies for LLaMA-Factory
```bash
pip install -e ".[vllm]"
```

- LLaMA-Factory WebUI settings

<img src="./assets/llama-factory-lora-chat.jpg" style="margin-left: 0px" width=1024px>


### 8. Export the LoRA adapter merged with the pre-trained model

- Configuration for exporting model
<img src="./assets/llama-factory-lora-export-1.jpg" style="margin-left: 0px" width=1024px>

- Server location for exporting model
<img src="./assets/llama-factory-lora-export-2.jpg" style="margin-left: 0px" width=1024px>

- Chat with the exported model
<img src="./assets/llama-factory-lora-export-3.jpg" style="margin-left: 0px" width=1024px>
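Conceptually, exporting folds the adapter into the base weights so the exported model needs no separate adapter at inference time. The toy sketch below illustrates that merge under the standard LoRA formulation, W' = W + (alpha / r) · B · A, using tiny hand-picked matrices. It is not LLaMA-Factory's export code, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Tiny example: 2x3 base weight with a rank-1 adapter
W = [[1, 0, 2], [0, 1, -1]]  # base weights
A = [[1, 2, 3]]              # rank x d_in
B = [[1], [-1]]              # d_out x rank
alpha, rank = 2, 1

merged = merge_lora(W, A, B, alpha, rank)

x = [[1], [1], [1]]
merged_out = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged_out = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]

print(merged_out, unmerged_out)
```

Both paths give the same output, which is why the merged model can be served like any ordinary checkpoint.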

### Future Work - Evaluate & Predict
- Validation tool: [OpenCompass](https://opencompass.github.io)
- Quantization