# Local fine-tuning example - Mistral 7b & Microsoft Phi-2 2.7b
This notebook is a companion to the following paper; please cite it accordingly:
- [de Kok (2024) - SSRN](https://papers.ssrn.com/abstract=4429658)


**Author:** [Ties de Kok](https://www.tiesdekok.com/)

**Important:** please read the setup instructions in the corresponding `readme.md` file.

## Imports

In [1]:
import os, sys, re, random, copy, json
from pathlib import Path

### Prepare the data

##### Download the training data

You can upload your own training data to the Runpod instance using the Jupyter Lab upload button, SFTP, or the Cloud Sync option in the dashboard.

In [2]:
!wget https://raw.githubusercontent.com/TiesdeKok/chatgpt_paper/main/code_examples/vast_ai/data/training_data.jsonl

--2024-01-27 15:50:41--  https://raw.githubusercontent.com/TiesdeKok/chatgpt_paper/main/code_examples/vast_ai/data/training_data.jsonl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11265 (11K) [text/plain]
Saving to: ‘training_data.jsonl’


2024-01-27 15:50:41 (63.9 MB/s) - ‘training_data.jsonl’ saved [11265/11265]



##### Load the training data

In [3]:
training_data = []
with open("training_data.jsonl", "r") as f:
    for line in f.readlines():
        training_data.append(json.loads(line))

In [4]:
training_data[0]

{'INSTRUCTION': 'Text:\nWe expect the implementation of our new data-driven pricing strategy <to> result in a 5% <increase> in gross margin over the next two quarters.\n####\n',
 'RESPONSE': '["to", "increase"] <|end|>'}
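
Before converting the records, it can help to confirm that every line parsed into the expected shape. The following is a minimal sketch, assuming each record should carry the `INSTRUCTION` and `RESPONSE` keys shown in the sample above. `load_jsonl` and `check_records` are hypothetical helper names, not part of this notebook or of LLaMA-Factory.

```python
import json

REQUIRED_KEYS = {"INSTRUCTION", "RESPONSE"}  # keys seen in the sample record

def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def check_records(records):
    """Return the indices of records that are missing a required key."""
    return [i for i, rec in enumerate(records) if not REQUIRED_KEYS.issubset(rec)]

# Made-up lines for illustration: one valid record, one blank line, one incomplete record.
sample_lines = [
    '{"INSTRUCTION": "Text:\\nA <b> c\\n####\\n", "RESPONSE": "[\\"b\\"] <|end|>"}',
    "",
    '{"INSTRUCTION": "Text:\\nno response here\\n####\\n"}',
]
records = load_jsonl(sample_lines)
bad = check_records(records)
print(len(records), bad)
```

In the notebook itself, `check_records(training_data)` should return an empty list if the file is well formed.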

##### Prepare the data for LLaMA-Factory
*Notes:* 
- Technically this isn't necessary here, as the jsonl is already in a loadable format. However, I am demonstrating it so that you can adapt it to your own data.
- I am removing the special tokens at the beginning and end, as `LLaMA-Factory` will add them automatically.

In [5]:
items = []
for item in training_data:
    items.append({
        "prompt" : item["INSTRUCTION"][6:].replace("\n####\n", ""), ## This way we remove the special tokens at the start and end, which llamafactory will add by itself
        "completion" : item["RESPONSE"].replace(" <|end|>", "") ## Same, remove EOS token
    })
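
The hard-coded `[6:]` slice works because the `"Text:\n"` prefix is six characters long. A slightly more defensive sketch strips the markers only when they are actually present at the start or end. `to_llamafactory_item` is a hypothetical helper name, and `str.removeprefix` / `str.removesuffix` require Python 3.9 or newer.

```python
PROMPT_PREFIX = "Text:\n"    # start marker in the training data
PROMPT_SUFFIX = "\n####\n"   # end-of-prompt marker in the training data
EOS_SUFFIX = " <|end|>"      # end-of-sequence marker in the training data

def to_llamafactory_item(record):
    """Convert one raw record into the prompt/completion format used above."""
    prompt = record["INSTRUCTION"].removeprefix(PROMPT_PREFIX).removesuffix(PROMPT_SUFFIX)
    completion = record["RESPONSE"].removesuffix(EOS_SUFFIX)
    return {"prompt": prompt, "completion": completion}

# Made-up record for illustration, in the same shape as the training data.
example = {
    "INSTRUCTION": "Text:\nWe expect costs <to> <increase>.\n####\n",
    "RESPONSE": '["to", "increase"] <|end|>',
}
converted = to_llamafactory_item(example)
print(converted)
```

Unlike `str.replace`, this leaves the text untouched if a marker happens to occur in the middle of a sentence.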

In [6]:
with open("data.json", "w") as f:
    json.dump(items, f)

##### Add our dataset to the datasets of LLaMA-Factory

LLaMA-Factory requires that the dataset is registered in its `dataset_info.json` file.  
The dataset itself can be located anywhere on your drive. 

In [7]:
with open("/workspace/LLaMA-Factory/data/dataset_info.json", "r") as f:
    dataset_info = json.loads(f.read())

dataset_info["my_data"] = {
    "file_name" : "/workspace/data.json",
    "columns" : {
        "prompt" : "prompt",
        "response" : "completion"
    }
}

with open("/workspace/LLaMA-Factory/data/dataset_info.json", "w") as f:
    json.dump(dataset_info, f)
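
Because this cell rewrites `dataset_info.json` in place, it is worth confirming that the new entry round-trips and that existing entries survive. The sketch below performs the same update against a temporary file so it can be run anywhere. `register_dataset` is a hypothetical helper, and the key layout simply mirrors the cell above.

```python
import json
import tempfile
from pathlib import Path

def register_dataset(info_path, name, data_file, prompt_col, response_col):
    """Add (or overwrite) one entry in a dataset_info.json-style file."""
    info_path = Path(info_path)
    info = json.loads(info_path.read_text()) if info_path.exists() else {}
    info[name] = {
        "file_name": str(data_file),
        "columns": {"prompt": prompt_col, "response": response_col},
    }
    info_path.write_text(json.dumps(info, indent=2))
    return info

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset_info.json"
    # Pretend the file already holds another dataset entry.
    path.write_text(json.dumps({"existing": {"file_name": "other.json"}}))
    register_dataset(path, "my_data", "/workspace/data.json", "prompt", "completion")
    reloaded = json.loads(path.read_text())

print(sorted(reloaded))
```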

-----
# Mistral-7B-Instruct
----

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

## For a full fine-tune

##### For DeepSpeed to work we need a config

This is an adapted version of the default config from the LLaMA-Factory GitHub page.

Note: offloading the optimizer to the CPU (`"offload_optimizer"`) is crucial here. Without it you will get a CUDA out-of-memory (OOM) error.  
The downside is that training is slower. 

For more details on the Deepspeed config setting, see e.g.:   
https://huggingface.co/docs/accelerate/v0.11.0/en/deepspeed
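
To see why the offload matters, a back-of-the-envelope estimate helps. The sketch below uses the per-parameter accounting from the ZeRO paper for mixed-precision Adam (2 bytes for fp16 weights, 2 bytes for fp16 gradients, 12 bytes for fp32 optimizer states) and assumes roughly 7.24 billion parameters for Mistral 7B. It ignores activations, buffers, and fragmentation, so treat the numbers as a rough lower bound rather than a measurement.

```python
def zero2_gb_per_gpu(n_params, n_gpus, offload_optimizer):
    """Rough per-GPU memory (in GB) for model states under ZeRO stage 2."""
    weights = 2 * n_params                 # fp16 weights, replicated on every GPU
    grads = 2 * n_params / n_gpus          # fp16 gradients, partitioned in stage 2
    optim = 12 * n_params / n_gpus         # fp32 weights + Adam moments, partitioned
    if offload_optimizer:
        optim = 0                          # optimizer states live in CPU RAM instead
    return (weights + grads + optim) / 1e9

N_PARAMS = 7.24e9  # assumption: approximate parameter count of Mistral 7B
N_GPUS = 5         # same value as --num_gpus in this notebook's training command

without_offload = zero2_gb_per_gpu(N_PARAMS, N_GPUS, offload_optimizer=False)
with_offload = zero2_gb_per_gpu(N_PARAMS, N_GPUS, offload_optimizer=True)
print(f"no offload: {without_offload:.1f} GB, with offload: {with_offload:.1f} GB")
```

Under these assumptions the optimizer states account for about half of the per-GPU footprint, which is the part that the CPU offload removes.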

In [12]:
import os, json ## Saves having to scroll back up after kernel restart. :)

ds_config = {
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": True,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": True,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": False,
    "reduce_bucket_size": 5e8,
    "overlap_comm": False,
    "contiguous_gradients": True,
    "offload_optimizer": {
        "device": "cpu",
    }
  }
}

with open("/workspace/ds_config_full.json", "w") as f:
    json.dump(ds_config, f)

### Our command

Copy the below into the terminal and keep your fingers crossed!

```bash
cd /workspace/LLaMA-Factory
deepspeed --num_gpus 5 --master_port=9901 src/train_bash.py \
    --deepspeed /workspace/ds_config_full.json \
    --stage sft \
    --do_train \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --dataset "my_data" \
    --template default \
    --finetuning_type full \
    --lora_target q_proj,v_proj \
    --output_dir "/workspace/ft_full_1" \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 3 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 30.0 \
    --plot_loss \
    --fp16
```
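
The batch-size flags interact with the number of GPUs. As a quick sanity check, the effective batch size per optimizer step is the product of the three values. The example count below is a made-up placeholder, not the size of the actual training file, and the step count is only an approximation of what the trainer will report.

```python
import math

def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps):
    """Number of examples that contribute to one optimizer step."""
    return num_gpus * per_device_batch * grad_accum_steps

def steps_per_epoch(n_examples, batch):
    """Approximate optimizer steps needed to see every example once."""
    return math.ceil(n_examples / batch)

# Values taken from the command above.
batch = effective_batch_size(num_gpus=5, per_device_batch=2, grad_accum_steps=4)
n_examples = 100  # hypothetical dataset size, for illustration only
print(batch, steps_per_epoch(n_examples, batch))
```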

## For a partial fine-tune (i.e., LoRA)

##### For DeepSpeed to work we need a config; this is the default from the LLaMA-Factory GitHub page

In [2]:
import os, json ## Duplicative, but saves having to scroll back up after kernel restart. :)

ds_config = {
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": True,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": True,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": False,
    "reduce_bucket_size": 5e8,
    "overlap_comm": False,
    "contiguous_gradients": True,
  }
}

with open("/workspace/ds_config_lora.json", "w") as f:
    json.dump(ds_config, f)

### Our command

Copy the below into the terminal and keep your fingers crossed!

```bash
cd /workspace/LLaMA-Factory
deepspeed --num_gpus 5 --master_port=9901 src/train_bash.py \
    --deepspeed /workspace/ds_config_lora.json \
    --stage sft \
    --do_train \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --dataset "my_data" \
    --template default \
    --finetuning_type lora  \
    --lora_target q_proj,v_proj \
    --output_dir "/workspace/ft_lora_1" \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 3 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 100.0 \
    --plot_loss \
    --fp16
```
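
To get a feel for how small the LoRA update is, the number of trainable parameters can be estimated by hand. Each adapted linear layer with input size `d_in` and output size `d_out` gains `r * (d_in + d_out)` parameters. The sketch assumes the published Mistral 7B dimensions (hidden size 4096, 32 layers, 8 key-value heads of dimension 128) and a LoRA rank of 8. The rank is an assumption, since the command above does not set one; adjust `RANK` if your setup differs.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

HIDDEN = 4096          # assumption: Mistral 7B hidden size
KV_DIM = 8 * 128       # assumption: 8 key-value heads x head dimension 128
N_LAYERS = 32          # assumption: number of transformer layers
RANK = 8               # assumption: LoRA rank (not set in the command above)

# --lora_target q_proj,v_proj adapts two projections per layer.
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
total = per_layer * N_LAYERS
print(f"{total:,} trainable parameters")
```

Under these assumptions only a few million parameters are trained, a tiny fraction of the roughly 7 billion in the base model, which is why the LoRA run fits a larger per-device batch size and needs no optimizer offload.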

-----
# microsoft/phi-2 (2.7b)
----
https://huggingface.co/microsoft/phi-2

## For a full fine-tune

##### For DeepSpeed to work we need a config

This is an adapted version of the default config from the LLaMA-Factory GitHub page.

In [2]:
import os, json ## Duplicative, but saves having to scroll back up after kernel restart. :)

ds_config = {
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": True,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": True,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": False,
    "reduce_bucket_size": 5e8,
    "overlap_comm": False,
    "contiguous_gradients": True,
  }
}

with open("/workspace/ds_config_full_phi2.json", "w") as f:
    json.dump(ds_config, f)
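
The three DeepSpeed configs in this notebook differ only in the ZeRO stage and in whether the optimizer is offloaded to the CPU. A small builder makes that explicit and avoids copy-paste drift. `make_ds_config` is a hypothetical helper, and all values are copied from the config cells in this notebook.

```python
import json

def make_ds_config(stage, offload_optimizer=False):
    """Build a DeepSpeed config dict with the settings used in this notebook."""
    zero = {
        "stage": stage,
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "reduce_scatter": False,
        "reduce_bucket_size": 5e8,
        "overlap_comm": False,
        "contiguous_gradients": True,
    }
    if offload_optimizer:
        zero["offload_optimizer"] = {"device": "cpu"}
    return {
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "zero_allow_untested_optimizer": True,
        "fp16": {
            "enabled": "auto",
            "loss_scale": 0,
            "initial_scale_power": 16,
            "loss_scale_window": 1000,
            "hysteresis": 2,
            "min_loss_scale": 1,
        },
        "zero_optimization": zero,
    }

# One entry per config file written in this notebook.
configs = {
    "ds_config_full.json": make_ds_config(stage=2, offload_optimizer=True),
    "ds_config_lora.json": make_ds_config(stage=1),
    "ds_config_full_phi2.json": make_ds_config(stage=2),
}
for name, cfg in configs.items():
    print(name, json.dumps(cfg["zero_optimization"]))
```

Each dict can then be written with `json.dump` exactly as in the cells above.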

### Our command

Copy the below into the terminal and keep your fingers crossed!

```bash
cd /workspace/LLaMA-Factory
deepspeed --num_gpus 5 --master_port=9901 src/train_bash.py \
    --deepspeed /workspace/ds_config_full_phi2.json \
    --stage sft \
    --do_train \
    --model_name_or_path microsoft/phi-2 \
    --dataset "my_data" \
    --template default \
    --finetuning_type full \
    --lora_target q_proj,v_proj \
    --output_dir "/workspace/ft_full_phi2_1" \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 40.0 \
    --plot_loss \
    --fp16
```

----
# Use the resulting model
---

**WARNING:** After running the below you need to restart the kernel before starting your next fine-tune!

An easy way to download your model to your computer is to open the model folder in the file browser on the left and right-click --> "Download Current Folder as Archive".

In [3]:
import torch, json
from pathlib import Path ## Needed again after the kernel restart
from transformers import AutoModelForCausalLM, AutoTokenizer
torch.set_default_device("cuda")

In [5]:
#model = "ft_full_1"
#model = "ft_lora_1"
model = "ft_full_phi2_1"

model_path = Path(f"/workspace/{model}")
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype = "auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [15]:
text = "In the year 2021 our <production> <numbers> are looking a little down, by about 3%." 
inputs = tokenizer(
    f"""Human: {text}\nAssistant:""",
    return_tensors = "pt",
    return_attention_mask = False,
)

outputs = model.generate(
    **inputs,
    max_length = 50,
    eos_token_id = tokenizer.eos_token_id,
    pad_token_id = tokenizer.eos_token_id
)

text = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
print(text)

result = json.loads(text.split("\nAssistant:")[-1])
result

Human: In the year 2021 our <production> <numbers> are looking a little down, by about 3%.
Assistant:["production", "numbers"]


['production', 'numbers']
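
`json.loads` raises an exception whenever the model produces anything other than clean JSON after `Assistant:`, for example when generation is cut off by `max_length`. A more forgiving sketch, with `parse_assistant_reply` as a hypothetical helper name, returns `None` instead so that a batch of predictions does not stop on the first malformed output.

```python
import json

def parse_assistant_reply(decoded, marker="\nAssistant:"):
    """Return the JSON list after the last marker, or None if it cannot be parsed."""
    reply = decoded.split(marker)[-1].strip()
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, list) else None

# Made-up decoded outputs: one complete, one truncated mid-list.
good = 'Human: our <production> <numbers> are down.\nAssistant:["production", "numbers"]'
bad = 'Human: our <production> <numbers> are down.\nAssistant:["production", "num'
print(parse_assistant_reply(good), parse_assistant_reply(bad))
```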