# Fine-tuning Microsoft Phi-3 with Unsloth for Mental Health Chatbot Development

```Phi-3``` is a large language model (LLM) from Microsoft. To get the most out of it for a specific task, fine-tuning on custom data is usually necessary.  
We will use ```Unsloth```, a library that speeds up fine-tuning and lowers its memory use, to fine-tune Phi-3 on our own dataset.

## Unsloth Advantages
- Faster Training 
- Lower Memory Footprint 
- Simplified Workflow

## 1. Install Necessary Libraries

In [1]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install trl peft accelerate bitsandbytes
!pip install xformers


## 2. Import Necessary Libraries

In [2]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.




## 3. Load the Phi-3 Model

We use the ```FastLanguageModel``` class from Unsloth to load the pre-trained Phi-3 model.

In [3]:
# set parameters

max_seq_length = 2048 # Choose any!  It supports auto RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
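
To see why ```load_in_4bit``` matters on a 15 GB GPU, here is a rough back-of-the-envelope estimate of the weight memory at different precisions. The parameter count of about 3.8 billion for Phi-3-mini is an assumption that does not appear in this notebook, and the real footprint will be somewhat higher because some layers stay in higher precision and quantization stores extra constants.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: approximate parameter count of Phi-3-mini (not stated in this notebook).
PHI3_MINI_PARAMS = 3.8e9

fp16_gb = weight_memory_gb(PHI3_MINI_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(PHI3_MINI_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.2f} GiB")
print(f" 4-bit weights: ~{int4_gb:.2f} GiB")
```

The 4-bit estimate is a quarter of the 16-bit one, which leaves far more room for activations, gradients, and optimizer state.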

In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3-mini-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_secret_token", # if using gated models like meta-llama/Llama-2-7b-hf
)


==((====))==  Unsloth: Fast Mistral patching release 2024.6
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## 4. Set Up Fine-Tuning Parameters

In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # Supports rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
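
The training banner later in this notebook reports 29,884,416 trainable parameters. The sketch below shows where that number can come from: each LoRA adapter on a linear layer of shape (in, out) adds ```r * (in + out)``` parameters. The model dimensions used here (hidden size 3072, MLP size 8192, 32 layers, separate q/k/v and gate/up projections) are assumptions about the Phi-3-mini checkpoint as loaded by Unsloth; they are not printed anywhere in this notebook.

```python
# Assumed Phi-3-mini dimensions (not stated in this notebook).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32
RANK = 16  # matches r = 16 in get_peft_model

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to a frozen weight."""
    return rank * (in_features + out_features)

# (in_features, out_features) for every module listed in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 29,884,416
```

Doubling ```r``` doubles this count, so the rank is the main knob for trading adapter capacity against memory.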


## 5. Prepare Training Data

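Dataset: ```qgyd2021/e_commerce_customer_service``` (```faq``` configuration), loaded from the Hugging Face Hub.  
It contains customer-service questions and answers from an e-commerce site.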

In [7]:
from datasets import load_dataset

dataset = load_dataset("qgyd2021/e_commerce_customer_service", "faq")

Downloading data:   0%|          | 0.00/50.3k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/65 [00:00<?, ? examples/s]

In [8]:
dataset

DatasetDict({
    train: Dataset({
        features: ['url', 'question', 'answer', 'label'],
        num_rows: 65
    })
})

In [16]:
dataset['train'][:2]

{'url': ['https://www.lightinthebox.com/knowledge-base/?page_key=how-to-order&prm=1.34.146.0',
  'https://www.lightinthebox.com/knowledge-base/?page_key=check-order-PC&prm=1.34.145.0'],
 'question': ['How to order', 'How do i check my status？'],
 'answer': ["It is easy to purchase a product you like on Lightinthebox. Please follow the steps below to make a purchase. Enjoy your shopping!\n\nStep 1:\nLog into your Lightinthebox account:\nSign in/ Register with your email or third-party platform.\n\nStep 2:\nBrowse the products and add product(s) you like to your shopping cart:\n1. Choose items that you like and select a color, size, and the quantity you would like to purchase.\n\nTip: If you're not so sure about sizing, you can always refer to our Size guide.\n2. Review the item and your selected color, size, and quantity then click ADD TO CART.\n\nStep 3:\nReview your cart:\n1. After adding all the desired items to your cart, click on the cart icon or a similar symbol to review your ord

In [30]:
def concatenate_question_answer(example):
    # Flatten newlines so each example becomes a single line of text
    question = example["question"].replace("\n", " ")
    answer = example["answer"].replace("\n", " ")
    # Merge both fields into the single "text" column that SFTTrainer reads
    return {"text": "<Customer>: " + question + " <Agent>: " + answer}

# 'dataset' is the DatasetDict loaded above; map() applies the function to every row
data = dataset.map(concatenate_question_answer)

# Print the first 5 examples to verify the transformation
data['train']['text'][:5]

["<Customer>: How to order <Agent>: It is easy to purchase a product you like on Lightinthebox. Please follow the steps below to make a purchase. Enjoy your shopping!  Step 1: Log into your Lightinthebox account: Sign in/ Register with your email or third-party platform.  Step 2: Browse the products and add product(s) you like to your shopping cart: 1. Choose items that you like and select a color, size, and the quantity you would like to purchase.  Tip: If you're not so sure about sizing, you can always refer to our Size guide. 2. Review the item and your selected color, size, and quantity then click ADD TO CART.  Step 3: Review your cart: 1. After adding all the desired items to your cart, click on the cart icon or a similar symbol to review your order. Check the quantities, sizes, colors, and prices of the products. You can also remove any items or update quantities in the cart. 2. When you're ready to checkout, click CHECKOUT.  Step 4: Fill in your shipping information and save: 1.

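### Alternative: Formatting Prompts into a Single Text Column for Fine-Tuning

Instead of mapping row by row, the dataset can be mapped in batches with a user-defined ```formatting_prompts_func``` that takes a batch of rows and returns a ```text``` column:

```python
dataset = dataset.map(formatting_prompts_func, batched = True,)
```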
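
The notebook does not define ```formatting_prompts_func```, so the following is only a sketch of what such a batched function could look like. It reuses the ```<Customer>:``` / ```<Agent>:``` layout from the previous cell and appends an end-of-sequence marker so the model learns where an answer stops. The hard-coded ```EOS_TOKEN``` value is an assumption for illustration; in the notebook it would be ```tokenizer.eos_token```.

```python
# Assumption: placeholder value; in the notebook use EOS_TOKEN = tokenizer.eos_token
EOS_TOKEN = "<|endoftext|>"

def formatting_prompts_func(examples):
    """Batched formatter: receives lists of questions/answers, returns a list of texts."""
    texts = []
    for question, answer in zip(examples["question"], examples["answer"]):
        q = question.replace("\n", " ")
        a = answer.replace("\n", " ")
        texts.append(f"<Customer>: {q} <Agent>: {a}{EOS_TOKEN}")
    return {"text": texts}

# With batched=True, datasets passes a dict of columns, which we mimic here.
batch = {
    "question": ["How to order"],
    "answer": ["Log in,\nthen add items to your cart."],
}
print(formatting_prompts_func(batch)["text"][0])
```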

## 6. Fine-Tuning with Unsloth

In [36]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = data['train'],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        ddp_find_unused_parameters=False  # Important for multi-GPU training
    ),
)

max_steps is given, it will override any value given in num_train_epochs


## 7. Show Current Memory Stats

In [37]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
3.188 GB of memory reserved.
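
The same measurement is useful again after training, to see how much memory the run itself needed. The helper below is a small sketch of that bookkeeping; ```memory_report``` is a hypothetical name, and it works on plain byte counts so the arithmetic can be checked without a GPU.

```python
def memory_report(reserved_bytes: float, total_bytes: float, baseline_gb: float = 0.0) -> dict:
    """Convert raw byte counts into GB figures like the ones printed above."""
    gib = 1024 ** 3
    reserved_gb = round(reserved_bytes / gib, 3)
    total_gb = round(total_bytes / gib, 3)
    return {
        "reserved_gb": reserved_gb,
        "used_since_baseline_gb": round(reserved_gb - baseline_gb, 3),
        "percent_of_total": round(100 * reserved_gb / total_gb, 3),
    }

# Example using the figures from the output above (3.188 GB of 14.748 GB).
report = memory_report(3.188 * 1024**3, 14.748 * 1024**3)
print(report)

# In the notebook, after trainer.train(), this could be called as:
# memory_report(torch.cuda.max_memory_reserved(),
#               torch.cuda.get_device_properties(0).total_memory,
#               baseline_gb=start_gpu_memory)
```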


## 8. Training the Model 

In [38]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 65 | Num Epochs = 8
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 29,884,416


| Step | Training Loss |
|------|---------------|
| 1 | 2.8942 |
| 2 | 2.3551 |
| 3 | 2.6412 |
| 4 | 2.295 |
| 5 | 2.6708 |
| 6 | 2.8838 |
| 7 | 3.1123 |
| 8 | 2.7819 |
| 9 | 2.8192 |
| 10 | 2.7935 |


## 9. Training Metrics

In [39]:
trainer_stats.metrics

{'train_runtime': 261.794,
 'train_samples_per_second': 1.834,
 'train_steps_per_second': 0.229,
 'total_flos': 5293269845176320.0,
 'train_loss': 2.720837994416555,
 'epoch': 7.2727272727272725}
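
These numbers follow from the training arguments. The arithmetic below is a sketch that reproduces the banner values (total batch size 8, 8 epochs) and the reported metrics from the 65 examples, batch size 2, 4 accumulation steps, and 60 steps. It mirrors how the figures relate to each other; the exact bookkeeping inside the trainer may differ between library versions.

```python
import math

num_examples = 65        # rows in the train split
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
max_steps = 60           # optimizer updates requested
train_runtime = 261.794  # seconds, from trainer_stats.metrics

effective_batch = per_device_batch * grad_accum                  # examples per update
batches_per_epoch = math.ceil(num_examples / per_device_batch)   # dataloader length
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)      # full updates per pass
num_epochs = math.ceil(max_steps / updates_per_epoch)            # passes needed for 60 steps

# Fraction of the data seen when training stops at max_steps
epoch_at_end = max_steps * grad_accum / batches_per_epoch
samples_per_second = max_steps * effective_batch / train_runtime
steps_per_second = max_steps / train_runtime

print(effective_batch, num_epochs)
print(round(epoch_at_end, 4), round(samples_per_second, 3), round(steps_per_second, 3))
```

With only 65 examples, 60 steps already means more than seven passes over the data, which is worth keeping in mind when judging the loss values.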

## 10. Inference with Fine-tuned Model 

In [44]:
# prompt_template = "How to order"
prompt_template = "How do i check my status？"

In [45]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer([prompt_template], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 200, use_cache = True)
print(tokenizer.batch_decode(outputs)[0])

<s> How do i check my status？

(... long run of blank lines generated by the model omitted ...)

 <Customer>:How do i check my order status? <Agent>:You can check the status of your order by logging into your account and clicking on “My Orders” in the “My Account” section.
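
The prompt above is a bare question, while the training text used a ```<Customer>: ... <Agent>: ...``` layout, which may be why the model pads with blank lines before producing the tags itself. A sketch of one way to align inference with the training format follows; ```build_prompt``` and ```extract_answer``` are hypothetical helpers, not part of Unsloth or this notebook.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the same layout used for the training text."""
    return f"<Customer>: {question.strip()} <Agent>:"

def extract_answer(decoded: str) -> str:
    """Return only the agent's reply from the decoded generation."""
    # Keep what follows the first agent tag, if there is one
    answer = decoded.split("<Agent>:", 1)[1] if "<Agent>:" in decoded else decoded
    # Cut off any further turn the model may have started
    answer = answer.split("<Customer>:", 1)[0]
    # Remove special-token text left by batch_decode
    for token in ("<s>", "</s>", "<|endoftext|>"):
        answer = answer.replace(token, "")
    return answer.strip()

# In the notebook this could be used as:
# inputs = tokenizer([build_prompt("How do i check my status?")], return_tensors="pt").to("cuda")
# outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)
# print(extract_answer(tokenizer.batch_decode(outputs)[0]))

sample = "<s> <Customer>:How do i check my order status? <Agent>:You can check it under My Orders.</s>"
print(build_prompt("How to order "))
print(extract_answer(sample))
```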

## 11. Save the Fine-Tuned Model

In [46]:
model.save_pretrained("customer_service_lora_model")
tokenizer.save_pretrained("customer_service_lora_model")

('customer_service_lora_model/tokenizer_config.json',
 'customer_service_lora_model/special_tokens_map.json',
 'customer_service_lora_model/tokenizer.model',
 'customer_service_lora_model/added_tokens.json',
 'customer_service_lora_model/tokenizer.json')
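
The saved directory holds only the LoRA adapter and tokenizer files, not a full copy of the base model. A sketch of loading it back in a later session is shown below, assuming the same Unsloth version and a CUDA GPU; ```load_finetuned``` is a hypothetical wrapper, and the import sits inside the function because Unsloth needs a GPU to import.

```python
def load_finetuned(adapter_dir: str = "customer_service_lora_model",
                   max_seq_length: int = 2048):
    """Reload the saved LoRA adapter (and its base model) for inference."""
    # Imported lazily: unsloth requires a CUDA-capable environment.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,        # directory written by save_pretrained
        max_seq_length=max_seq_length,
        dtype=None,                    # auto-detect, as in the loading cell
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # switch to inference mode
    return model, tokenizer

# Usage in a GPU session:
# model, tokenizer = load_finetuned()
```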

## 12. Push to Hugging Face

In [58]:
# Push to Hugging Face
model_name = "customer_service_lora_model"  # Name of your model on the Hugging Face Model Hub
user_name = "yourname"  # Replace with your Hugging Face username
repo_id = f"{user_name}/{model_name}"  # Full repository ID: "<username>/<model name>"

# Use the `push_to_hub` method to upload the LoRA adapter and tokenizer
# (requires being logged in, e.g. via `huggingface-cli login`)
model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)