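# Step 1: Set Up Your Google Colab Environment
First, open a new Colab notebook, switch it to a GPU runtime (Runtime > Change runtime type), and install the necessary packages. The outputs below come from a Tesla T4.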

In [3]:
# Install the necessary libraries
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install trl peft accelerate bitsandbytes
!pip install xformers


Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-58dcqild/unsloth_b15e779259484b729cf9968b06d4781a
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-58dcqild/unsloth_b15e779259484b729cf9968b06d4781a
  Resolved https://github.com/unslothai/unsloth.git to commit 79a2112ca4a775ce0b3cb75f5074136cb54ea6df
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [6]:
!pip install triton


Collecting triton
  Downloading triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
Downloading triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 6.1 MB/s eta 0:00:00
Installing collected packages: triton
Successfully installed triton-3.0.0


# Step 2: Import the Required Libraries
After installing, import the necessary libraries.

In [7]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


# Step 3: Load the Phi-3 Model
Now, load the pre-trained Phi-3 model using Unsloth. This example uses 4-bit quantization to reduce memory usage.

In [8]:
max_seq_length = 2048  # You can modify this as needed
dtype = None  # Auto-detect dtype or set to torch.float16 if needed
load_in_4bit = True  # 4-bit quantization to save memory

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-mini-4k-instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)


==((====))==  Unsloth 2024.9.post4: Fast Mistral patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
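
To see why 4-bit loading matters on a 14.7 GB T4, here is a rough estimate of the memory taken by the weights alone. The parameter count of about 3.8 billion is an assumption taken from the public Phi-3-mini model card, not from this notebook. The real footprint is somewhat higher, since some layers stay in higher precision and quantization adds bookkeeping data.

```python
# Back-of-the-envelope weight memory at different precisions.
# ASSUMPTION: ~3.8e9 parameters for Phi-3-mini (public model card figure,
# not stated anywhere in this notebook).
NUM_PARAMS = 3.8e9


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (1024**3 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3


for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under this assumption the 4-bit weights need roughly a quarter of the float16 footprint, which is consistent in magnitude with the 2.283 GB reported as reserved in Step 7.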



# Step 4: Set Up Fine-Tuning Parameters
To fine-tune the model, define the LoRA (Low-Rank Adaptation) parameters. They control which layers receive small trainable adapter matrices and how large those matrices are; the original model weights stay frozen.

In [9]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank; higher values add trainable parameters and memory but can improve quality
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,  # LoRA scaling factor; adapter updates are scaled by lora_alpha / r
    lora_dropout=0,  # 0 is the setting Unsloth optimizes for
    bias="none",  # Do not train bias terms; "none" is the setting Unsloth optimizes for
    use_gradient_checkpointing="unsloth",  # Trades extra compute for lower memory; helps with long contexts
    random_state=3407,  # Seed for reproducible adapter initialization
)


Unsloth 2024.9.post4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
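
As a sanity check on what r=16 costs, the number of trainable LoRA parameters can be worked out by hand. Each adapted projection with input size d_in and output size d_out gets two small matrices holding r * (d_in + d_out) values in total. The layer dimensions below are assumptions taken from the public Phi-3-mini configuration (hidden size 3072, MLP size 8192, 32 layers, key/value projections the same size as the query projection); they are not stated in this notebook.

```python
# ASSUMPTION: Phi-3-mini dimensions from the public model config,
# not from this notebook.
HIDDEN, MLP, LAYERS, RANK = 3072, 8192, 32, 16

# (input_dim, output_dim) of every adapted projection in one decoder layer
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen weight."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in projections.values())
total = per_layer * LAYERS
print(f"{total:,}")  # 29,884,416
```

Under these assumptions the total matches the "Number of trainable parameters = 29,884,416" line that the trainer prints in Step 8. Doubling r would double this count.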


# Step 5: Load and Prepare Your Custom Data
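Load the training data. This example uses heliosbrahma/mental_health_chatbot_dataset from the Hugging Face Hub, which has a single train split of 172 examples; you can replace it with your own dataset. The trainer in Step 6 reads the column named text.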



In [10]:
data = load_dataset("heliosbrahma/mental_health_chatbot_dataset")


Generating train split:   0%|          | 0/172 [00:00<?, ? examples/s]
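
If you bring your own data, each example needs to end up as one string in a text column, because Step 6 passes dataset_text_field="text". A minimal sketch of that conversion follows. The <HUMAN>/<ASSISTANT> tag layout and the helper name to_text_record are assumptions made for illustration; inspect data["train"][0] to see what the dataset you load actually contains.

```python
# ASSUMPTION: a "<HUMAN>: ... / <ASSISTANT>: ..." layout. Adjust the tags to
# whatever format you want the model to learn.
def to_text_record(question: str, answer: str, eos_token: str = "") -> dict:
    """Turn one question/answer pair into a record with a single 'text' field.

    Passing eos_token (for example tokenizer.eos_token) marks where the
    answer ends, so the model can learn to stop there.
    """
    text = f"<HUMAN>: {question.strip()}\n<ASSISTANT>: {answer.strip()}{eos_token}"
    return {"text": text}


pairs = [
    ("What is a panic attack?", "A panic attack is a sudden episode of intense fear."),
    ("How can I calm down quickly?", "Slow breathing and grounding exercises can help."),
]
records = [to_text_record(q, a) for q, a in pairs]
print(records[0]["text"])

# To train on these records, wrap them with datasets.Dataset.from_list(records)
# and pass the result as train_dataset in Step 6.
```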

# Step 6: Fine-Tune the Model
Next, configure the trainer that will run the fine-tuning. You'll define training arguments such as the learning rate, batch size, and number of steps. The values below are kept small so that the run fits in the memory and session limits of a free Colab GPU; training itself starts in Step 8.



In [11]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data['train'],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,  # Adjust based on Colab's performance
    packing=False,  # Disabled here; packing=True can speed up training on short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Small batch size to avoid OOM errors
        gradient_accumulation_steps=4,  # Accumulates gradients to simulate a larger batch size
        warmup_steps=5,  # Warmup steps for learning rate
        max_steps=60,  # Short training to avoid timeouts in Colab
        learning_rate=2e-4,  # Learning rate
        fp16=not torch.cuda.is_bf16_supported(),  # Mixed precision for faster training
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",  # Memory-efficient Adam optimizer
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)


Map (num_proc=2):   0%|          | 0/172 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
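
The combination of warmup_steps=5, lr_scheduler_type="linear", and max_steps=60 determines the learning rate at every step. The sketch below reproduces the usual linear schedule with warmup: the rate climbs from 0 to learning_rate over the warmup steps, then falls linearly back to 0 at the last step. It is an illustration of the schedule's shape under that assumption, not the trainer's own code, and linear_schedule_lr is a hypothetical helper name.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-4,
                       warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate after `step` optimizer updates.

    Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_schedule_lr(step):.2e}")
```

With only 60 steps, the rate is already back to roughly half its peak by step 30, which is worth keeping in mind if you raise max_steps.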


# Step 7: Monitor GPU Usage
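Before training starts, record how much GPU memory is already reserved by the loaded model. Comparing this baseline with the peak after training shows how much memory the training run itself needs. Note that the values are labeled GB but, because of the division by 1024, are strictly GiB.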



In [12]:
# Display GPU memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")


GPU = Tesla T4. Max memory = 14.748 GB.
2.283 GB of memory reserved.
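
The baseline above becomes useful once it is compared with the peak after training. The sketch below shows that comparison with plain numbers so it runs anywhere. The helper names are hypothetical, and the peak value of 4.0 is a made-up placeholder, not a measurement from this run; after trainer.train() you would read the real value with bytes_to_gib(torch.cuda.max_memory_reserved()).

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB, rounded like the cell above."""
    return round(num_bytes / 1024**3, 3)


def training_memory_report(start_gib: float, peak_gib: float, total_gib: float) -> dict:
    """Split peak reserved memory into the pre-training baseline and the training share."""
    used_by_training = round(peak_gib - start_gib, 3)
    return {
        "peak_gib": peak_gib,
        "training_gib": used_by_training,
        "peak_pct": round(100 * peak_gib / total_gib, 1),
        "training_pct": round(100 * used_by_training / total_gib, 1),
    }


# start_gib and total_gib come from the output above.
# peak_gib=4.0 is a PLACEHOLDER, not a measured value.
report = training_memory_report(start_gib=2.283, peak_gib=4.0, total_gib=14.748)
print(report)
```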


# Step 8: Start Training
Start the training process using the trainer.train() method.

In [13]:
trainer_stats = trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 172 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 29,884,416


Step,Training Loss
1,1.189
2,0.9741
3,1.3177
4,1.2641
5,1.2559
6,0.9073
7,1.1012
8,1.3368
9,1.3372
10,0.7624
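
The numbers in the training banner follow from the settings in Step 6. The short calculation below reproduces them; it is plain arithmetic on values shown in this notebook, and the trainer's internal bookkeeping may differ in detail.

```python
import math

num_examples = 172      # size of the train split
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_gpus = 1
max_steps = 60

# Samples contributing to one optimizer update
effective_batch = per_device_batch * grad_accum * num_gpus
# Samples processed over the whole run
samples_seen = effective_batch * max_steps
# How many passes over the dataset that corresponds to
epochs = samples_seen / num_examples

print(f"Total batch size = {effective_batch}")         # 8
print(f"Samples seen     = {samples_seen}")            # 480
print(f"Epochs covered   = {epochs:.2f}")              # 2.79
print(f"Epochs started   = {math.ceil(epochs)}")       # 3
```

So the 60 steps do not complete three full passes; the run stops partway through the third, which is why the metrics in Step 9 report an epoch value of about 2.79.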


# Step 9: Evaluate the Model
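After training, inspect the metrics returned by trainer.train(). They describe the training run itself (runtime, throughput, and average training loss), not performance on unseen data; to measure that, hold out an evaluation split. You can use these numbers to adjust the hyperparameters as needed.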

In [14]:
print(trainer_stats.metrics)


{'train_runtime': 310.5112, 'train_samples_per_second': 1.546, 'train_steps_per_second': 0.193, 'total_flos': 3466330523873280.0, 'train_loss': 0.9898937493562698, 'epoch': 2.7906976744186047}
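
The throughput figures can be cross-checked against the runtime and used to budget longer runs. The sketch below copies the relevant values from the output above; the 60 steps and batch size of 8 come from Step 6, and the projection assumes throughput stays constant, which is only an approximation.

```python
# Values copied from the metrics dictionary printed above
metrics = {
    "train_runtime": 310.5112,          # seconds
    "train_samples_per_second": 1.546,
    "train_steps_per_second": 0.193,
}
total_steps = 60
effective_batch = 8

steps_per_second = total_steps / metrics["train_runtime"]
samples_per_second = total_steps * effective_batch / metrics["train_runtime"]
seconds_per_step = metrics["train_runtime"] / total_steps


def estimate_runtime_minutes(steps: int) -> float:
    """Projected wall-clock time for `steps` updates at the measured pace."""
    return steps * seconds_per_step / 60


print(f"{steps_per_second:.3f} steps/s, {samples_per_second:.3f} samples/s")
print(f"{seconds_per_step:.2f} s per step")
print(f"200 steps would take about {estimate_runtime_minutes(200):.1f} minutes")
```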


# Step 10: Save the Fine-Tuned Model
Once the model is fine-tuned, save it for future use. On a LoRA-wrapped model, save_pretrained writes only the adapter weights and their configuration, not a full copy of the base model. The tokenizer files are saved alongside them.

In [15]:
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")


('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.model',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')


# Step 11: Inference with the Fine-Tuned Model
You can now generate predictions from your fine-tuned model.

Before generating text, call FastLanguageModel.for_inference(model) to switch the model into Unsloth's inference mode:



In [19]:
# Prepare the model for inference
FastLanguageModel.for_inference(model)

# Now, proceed with generating text
prompt_template = "What is a panic attack?"
inputs = tokenizer([prompt_template], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)

# Decode and print the output
print(tokenizer.batch_decode(outputs)[0])


What is a panic attack?
A panic attack is a sudden episode of intense fear or anxiety that can cause physical and emotional symptoms. It can be very frightening and overwhelming, but it's important to remember that panic attacks are not life-threatening and can be managed with the right support and treatment.

What are the symptoms of a panic attack?
The symptoms of a panic attack can vary from person to person, but some common signs include:

- Rapid heart rate
- Sweating
- Trembling or shaking
- Shortness of breath
- Feeling of choking
- Chest pain or discomfort
- Nausea or abdominal pain
- Dizziness or lightheadedness
- Feeling of unreality or detachment
- Fear of losing control or going crazy
- Fear of dying

How common are panic
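
As the output above shows, the decoded text repeats the prompt and keeps going with follow-up questions until max_new_tokens runs out. A small post-processing step can isolate the first answer. The sketch below works on plain strings; extract_completion is a hypothetical helper, and using a blank line as the stop marker is an assumption that happens to fit this particular output, not a general rule.

```python
from typing import Optional


def extract_completion(decoded: str, prompt: str, stop: Optional[str] = None) -> str:
    """Return the generated text without the echoed prompt.

    If `stop` is given, everything from its first occurrence onward is dropped.
    """
    idx = decoded.find(prompt)
    # Skip past the prompt when it is present; otherwise keep the text as is
    text = decoded[idx + len(prompt):] if idx != -1 else decoded
    text = text.strip()
    if stop:
        text = text.partition(stop)[0]
    return text.strip()


decoded = (
    "What is a panic attack?\n"
    "A panic attack is a sudden episode of intense fear or anxiety.\n"
    "\n"
    "What are the symptoms of a panic attack?\n"
    "The symptoms of a panic attack can vary"
)
print(extract_completion(decoded, "What is a panic attack?", stop="\n\n"))
```

In the notebook you would pass tokenizer.batch_decode(outputs)[0] as the first argument. Passing skip_special_tokens=True to batch_decode additionally removes special tokens from the decoded string.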


In [21]:
prompt_template = "What causes mental health problems?"
inputs = tokenizer([prompt_template], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)
print(tokenizer.batch_decode(outputs)[0])

What causes mental health problems?
Mental health problems can be caused by a combination of factors, including:
- Genetics: Some mental health problems have a genetic component, meaning they can run in families.
- Biology: Imbalances in brain chemicals, hormones, or other biological factors can contribute to mental health problems.
- Life experiences: Traumatic events, such as abuse, neglect, or witnessing violence, can lead to mental health problems.
- Environment: Stressful or challenging life circumstances, such as financial difficulties or relationship problems, can contribute to mental health problems.
- Substance use: Using drugs or alcohol can lead to mental health problems or exacerbate existing ones.
It’s important to remember that mental health problems are not a sign of weakness or a personal failing. They are medical conditions that can be treated with professional help.
How are mental health problems treated?
Mental health problems


In [20]:
prompt_template = "What activities help you feel better? Ask what activities help someone feel calm or happy."
inputs = tokenizer([prompt_template], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)
print(tokenizer.batch_decode(outputs)[0])

What activities help you feel better? Ask what activities help someone feel calm or happy.

# Answer
Activities that help someone feel better can vary widely from person to person, as individual preferences and needs play a significant role in what brings relaxation and happiness. Here are some common activities that many people find helpful:

1. **Exercise**: Physical activity can release endorphins, which are chemicals in the brain that act as natural painkillers and mood elevators.

2. **Meditation and Mindfulness**: These practices can help reduce stress and anxiety by focusing on the present moment and calming the mind.

3. **Reading**: Getting lost in a good book can be a great way to escape from daily stressors and immerse oneself in another world.

4. **Listening to Music**: Music has a powerful effect on emotions and can be a great way to uplift one's mood.

5. **Sp
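
To use the adapter saved in Step 10 in a later session, it has to be loaded again before generating text. The sketch below follows the same from_pretrained call used in Step 3, pointing model_name at the saved directory instead of the Hub name; that this resolves the base model from the adapter configuration is an assumption based on Unsloth's example notebooks, and load_finetuned is a hypothetical helper name.

```python
def load_finetuned(adapter_dir: str = "lora_model",
                   max_seq_length: int = 2048,
                   load_in_4bit: bool = True):
    """Reload the saved LoRA adapter and prepare it for generation.

    ASSUMPTION: FastLanguageModel.from_pretrained accepts the local adapter
    directory as model_name, as in Unsloth's example notebooks.
    """
    # Imported lazily so that defining this helper does not require a GPU
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,
        max_seq_length=max_seq_length,
        dtype=None,  # auto-detect, as in Step 3
        load_in_4bit=load_in_4bit,
    )
    FastLanguageModel.for_inference(model)
    return model, tokenizer
```

In a fresh Colab session, after repeating the installs from Step 1, calling model, tokenizer = load_finetuned() would replace Steps 3 through 10.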
