<a href="https://colab.research.google.com/github/SriVinayA/SJSU-CMPE297-SpecialTopics/blob/main/assignment_3_F.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [3]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# Define model configuration
max_seq_length = 2048  # Can be adjusted as needed
dtype = None           # Auto-detect dtype (use Float16 for Tesla T4, V100; Bfloat16 for Ampere+)
load_in_4bit = True    # Use 4-bit quantization to save memory (set to False for full precision)

# Load the pre-trained Phi-3 Mini instruct model (4k context window)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-mini-4k-instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.1: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


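The banner above reports CUDA compute capability 7.5 and `Bfloat16 = FALSE`, which is why `dtype = None` ends up as float16 on this Tesla T4. As a rough sketch of the rule of thumb behind that (not Unsloth's actual detection code, which may run additional checks): native bfloat16 needs compute capability 8.0 (Ampere) or newer. The helper name `pick_dtype` is hypothetical.

```python
def pick_dtype(compute_capability: tuple[int, int]) -> str:
    """Return the half-precision dtype name suited to a CUDA compute capability.

    Hypothetical helper for illustration; real detection code may check more.
    """
    major, _minor = compute_capability
    # Ampere GPUs start at compute capability 8.0 and support bfloat16 natively.
    return "bfloat16" if major >= 8 else "float16"


print(pick_dtype((7, 5)))  # Tesla T4, as in the banner above
print(pick_dtype((8, 0)))  # an Ampere-class GPU
```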

In [4]:
# Configure the model for fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (suggested values: 8, 16, 32, 64, 128)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
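    lora_alpha=16,      # LoRA scaling factor (adapter updates are scaled by lora_alpha / r)
    lora_dropout=0,     # LoRA dropout; 0 is the value Unsloth is optimized for
    bias="none",        # Which bias terms to train; "none" is the value Unsloth is optimized for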
    use_gradient_checkpointing="unsloth",  # Reduce memory usage for long sequences
    random_state=3407   # Set a random state for reproducibility
)


Unsloth 2024.12.1 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
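To see how many weights these adapters add, note that a rank-`r` LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs contributes `r * (d_in + d_out)` trainable weights. The sketch below applies that to the seven target modules in each of the 32 patched layers. The hidden size (3072) and MLP size (8192) are assumptions taken from Phi-3 Mini's published configuration, not from this notebook; with them the total agrees with the 29,884,416 trainable parameters that the trainer reports when training starts.

```python
# Assumed Phi-3 Mini dimensions (from the published model config, not this notebook).
HIDDEN = 3072        # hidden_size
INTERMEDIATE = 8192  # intermediate_size of the MLP
NUM_LAYERS = 32      # matches "patched 32 layers" above
RANK = 16            # r passed to get_peft_model

# (d_in, d_out) of each adapted projection inside one decoder layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 29,884,416
```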


In [10]:
from datasets import load_dataset, Dataset
from sklearn.model_selection import train_test_split

# Step 1: Load the mental health chatbot dataset
dataset = load_dataset("heliosbrahma/mental_health_chatbot_dataset")

# Step 2: Tokenize the dataset, padding or truncating every example to max_seq_length tokens
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=max_seq_length)

# Apply tokenization
tokenized_datasets = dataset["train"].map(tokenize_function, batched=True)

# Step 3: Convert to a pandas DataFrame for train-test split
data_df = tokenized_datasets.to_pandas()

# Step 4: Split into train and validation sets
train_data, val_data = train_test_split(data_df, test_size=0.1, random_state=42)

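# Step 5: Convert back to Hugging Face Dataset objects
# preserve_index=False keeps the shuffled pandas index from being stored as an extra column
train_dataset = Dataset.from_pandas(train_data, preserve_index=False)
eval_dataset = Dataset.from_pandas(val_data, preserve_index=False)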
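The split sizes follow from simple rounding. As a sketch, assuming scikit-learn rounds a fractional `test_size` up and gives the remainder to training, a dataset of 172 rows splits into 154 training and 18 validation examples. The figure of 172 is inferred rather than printed anywhere in this notebook: under that rounding rule it is the only size that yields the 154 training examples the trainer reports. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper: the test share is rounded up, the rest is training data.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(172, 0.1)
print(n_train, n_test)  # 154 18
```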


In [13]:
from trl import SFTTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=2,  # Adjust based on available GPU memory
    gradient_accumulation_steps=4,  # Accumulates gradients to simulate a larger batch size
    warmup_steps=5,                 # Warm-up steps for learning rate scheduler
    max_steps=60,                   # Total optimizer steps (overrides num_train_epochs)
    learning_rate=2e-4,             # Peak learning rate
    fp16=not torch.cuda.is_bf16_supported(),  # Fall back to FP16 when BF16 is not supported
    bf16=torch.cuda.is_bf16_supported(),      # Use BF16 on Ampere or newer GPUs
    logging_steps=1,                # Log the training loss at every step
    optim="adamw_8bit",             # AdamW with 8-bit optimizer states
    weight_decay=0.01,              # Regularization parameter
    lr_scheduler_type="linear",     # Learning rate scheduler type
    seed=3407,                      # Set seed for reproducibility
    output_dir="outputs",           # Directory to save model checkpoints
    report_to="none"                # Disable all logging integrations (e.g. wandb)
)


# Initialize the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",  # Name of the text field in the dataset
    max_seq_length=max_seq_length,  # Maximum sequence length (2048, as set when loading the model)
    dataset_num_proc=2,         # Number of processes for data preprocessing
    packing=False,              # Do not pack several short examples into one sequence
    args=training_args          # Pass the training arguments
)


max_steps is given, it will override any value given in num_train_epochs
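With `warmup_steps=5`, `max_steps=60`, and `lr_scheduler_type="linear"`, the learning rate ramps up linearly over the first 5 steps and then decays linearly to zero at step 60. A minimal sketch of that shape in plain Python follows; the function name `linear_lr` is hypothetical, and the trainer builds its own scheduler internally rather than calling anything like this.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 5, max_steps: int = 60) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: grow from 0 to base_lr over warmup_steps steps.
        return base_lr * step / warmup_steps
    # Decay: shrink from base_lr at the end of warmup to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)


for step in (0, 1, 5, 30, 60):
    print(step, f"{linear_lr(step):.2e}")
```

The peak value is reached exactly when warmup ends, so only step 5 trains at the full `2e-4`.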


In [14]:
# Start the fine-tuning process
trainer_stats = trainer.train()

# Print the training metrics for review
print("Training complete. Metrics:")
print(trainer_stats.metrics)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 154 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 29,884,416


Step,Training Loss
1,1.2898
2,1.1681
3,1.3847
4,1.1162
5,1.115
6,1.0447
7,0.8157
8,0.8613
9,0.766
10,1.0082


Training complete. Metrics:
{'train_runtime': 1406.6601, 'train_samples_per_second': 0.341, 'train_steps_per_second': 0.043, 'total_flos': 2.21329294884864e+16, 'train_loss': 0.9662925720214843, 'epoch': 3.116883116883117}
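The reported numbers are consistent with each other, which a few lines of arithmetic can confirm. The sketch below uses only values printed above (154 examples, batch size 2, 4 accumulation steps, 60 steps, and the runtime in seconds); the variable names are chosen here for illustration.

```python
import math

num_examples = 154
per_device_batch = 2
grad_accum = 4
max_steps = 60
train_runtime = 1406.6601  # seconds

effective_batch = per_device_batch * grad_accum  # examples per optimizer step
samples_seen = effective_batch * max_steps       # examples processed in total
epochs = samples_seen / num_examples             # fractional passes over the data

print(effective_batch)                           # 8
print(samples_seen)                              # 480
print(round(epochs, 4))                          # 3.1169
print(math.ceil(epochs))                         # 4
print(round(samples_seen / train_runtime, 3))    # 0.341
print(round(max_steps / train_runtime, 3))       # 0.043
```

So 60 steps at a total batch size of 8 cover the 154 training examples a little more than three times, which is why the banner rounds up to 4 epochs while the final metrics show `epoch` at about 3.12.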


In [15]:
# Save the fine-tuned model
model.save_pretrained("fine_tuned_phi3")
tokenizer.save_pretrained("fine_tuned_phi3")

('fine_tuned_phi3/tokenizer_config.json',
 'fine_tuned_phi3/special_tokens_map.json',
 'fine_tuned_phi3/tokenizer.model',
 'fine_tuned_phi3/added_tokens.json',
 'fine_tuned_phi3/tokenizer.json')

In [18]:
# Load the fine-tuned model
from unsloth import FastLanguageModel

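model_path = "fine_tuned_phi3"  # Directory written by save_pretrained above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=max_seq_length,  # Reuse the settings from the original load
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

# Switch the model to Unsloth's faster inference mode
FastLanguageModel.for_inference(model)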

# Example inference
prompt = "What is a panic attack?"
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate a response
outputs = model.generate(
    **inputs, max_new_tokens=200, do_sample=True, temperature=0.7
)

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Response:", response)


Response: What is a panic attack?
A panic attack is a sudden onset of intense anxiety that peaks within a few minutes. Panic attacks are not dangerous, but they can be extremely frightening. You may feel like you are having a heart attack or dying.
Are panic attacks real?
Yes. Panic Attacks are real psychological phenomena that can cause intense fear, overwhelming physical symptoms, and disruption of daily functioning. They are not hallucinations or delusions, which are associated with mental illness. Panic attacks are sudden and unexpect
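The decoded response above begins by repeating the prompt, because `generate` returns the prompt tokens followed by the newly generated ones, and it stops mid-word once `max_new_tokens=200` is reached. A minimal sketch of keeping only the continuation is shown below on plain Python lists so that it runs without a GPU; `strip_prompt` is a hypothetical helper and the token ids are made up for illustration. With the real tensors, the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]` before calling `tokenizer.decode`.

```python
def strip_prompt(output_ids: list[int], prompt_ids: list[int]) -> list[int]:
    """Return only the tokens generated after the prompt.

    Hypothetical helper: assumes the output starts with the exact prompt tokens,
    as is the case for decoder-only generation.
    """
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt tokens")
    return output_ids[len(prompt_ids):]


prompt_ids = [1, 1724, 338]                 # made-up ids standing in for the prompt
output_ids = prompt_ids + [319, 7243, 293]  # prompt followed by made-up new tokens
print(strip_prompt(output_ids, prompt_ids))  # [319, 7243, 293]
```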