Check the GPU's CUDA compute capability and install dependencies that are compatible with it to prevent version conflicts. GPUs with compute capability 8.0 or higher (Ampere and newer) additionally get `flash-attn`.

In [None]:
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

Next, load a language model quantized to 4 bits for memory efficiency.




```
Mean Sequence Length: 173.20
Max Sequence Length: 344
90th Percentile: 202.0
95th Percentile: 213.0
Suggested max_seq_length: 512
```
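The notebook does not show how these statistics were produced. As a minimal sketch, assuming the suggestion is simply the longest sequence rounded up to the next power of two (which is consistent with 344 becoming 512), the summary could be computed as follows. The `lengths` list is hypothetical; in practice it would hold the token count of each formatted training sample.

```python
import math
from statistics import mean


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize_lengths(lengths):
    """Summarize sequence lengths and round the maximum up to a power of two."""
    longest = max(lengths)
    return {
        "mean": mean(lengths),
        "max": longest,
        "p90": percentile(lengths, 90),
        "p95": percentile(lengths, 95),
        # Assumption: the suggestion is the next power of two >= the maximum.
        "suggested": 1 << (longest - 1).bit_length(),
    }


# Hypothetical token counts, not the real dataset.
lengths = [120, 150, 173, 202, 213, 344]
stats = summarize_lengths(lengths)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Choosing a `max_seq_length` below the longest sample saves memory but truncates the samples that exceed it.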



In [None]:
!pip install unsloth
from unsloth import FastLanguageModel
import torch

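max_seq_length = 256  # tokens per sample; longer samples are truncated
dtype = None  # None lets Unsloth pick float16 or bfloat16 based on the GPU
load_in_4bit = True  # load weights with 4-bit quantization to reduce memory use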

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/llama-3-8b-bnb-4bit",
    model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    #model_name = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


    PyTorch 2.6.0+cu124 with CUDA 1204 (you have 2.5.1+cu124)
    Python  3.11.11 (you have 3.11.11)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.9: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!




---



In [None]:
from datasets import load_dataset, Dataset
from huggingface_hub import login

# Login to access private datasets
login(token="ENTER YOUR HUGGINGFACE TOKEN HERE")

Next, we integrate LoRA adapters into our model, which allows us to efficiently update just a fraction of the model's parameters, enhancing training speed and reducing computational load.

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # rank of the low-rank matrices used in LoRA
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # attention and MLP projections
    lora_alpha = 16,  # scaling numerator; the LoRA update is scaled by lora_alpha / r
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 9091,
)

Unsloth 2025.3.9 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
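To see why LoRA trains only a fraction of the parameters, recall that it freezes a weight matrix `W` of shape `(d_out, d_in)` and learns an update `(lora_alpha / r) * B @ A`, where `A` is `(r, d_in)` and `B` is `(d_out, r)`. The sketch below counts parameters for a single projection. The 4096 x 4096 shape is a hypothetical example and is not read from the model above, so the result is not meant to reproduce the trainable-parameter count that Unsloth prints.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the frozen weight matrix the adapter is attached to."""
    return d_in * d_out


# Hypothetical projection size; r and lora_alpha match the cell above.
d_in, d_out = 4096, 4096
r, lora_alpha = 16, 16

adapter_params = lora_param_count(d_in, d_out, r)
frozen_params = full_param_count(d_in, d_out)
scaling = lora_alpha / r  # factor applied to B @ A before adding it to W

print(f"adapter parameters: {adapter_params:,}")
print(f"frozen parameters:  {frozen_params:,}")
print(f"trainable fraction: {adapter_params / frozen_params:.4%}")
print(f"update scaling:     {scaling}")
```

With `lora_alpha` equal to `r`, the scaling factor is 1, so the low-rank update is added to the frozen weights unscaled.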


# Data Prep
Load the dataset, then define a prompt template that arranges each example into instruction, emotion label, input, and response sections. Applying it to the dataset produces the training text, with an EOS token appended so the model learns where a response ends.

In [None]:
# Load the dataset
dataset = load_dataset("F219091/LabelledPatientTranscripts")
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['Instruction', 'Input', 'Response', 'Emotion_Label'],
        num_rows: 100000
    })
})


### Alpaca Prompt
The template guides how the text should be formatted. It ensures that the model receives a consistent structure (with instruction, input, and response sections) for training or inference.

In [None]:
EOS_TOKEN = tokenizer.eos_token  # use the model's own EOS token; "<|endoftext|>" is not a Llama 3 special token

alpaca_prompt = """Use this input to analyze underlying patterns and guide towards constructive emotional expression and resolution.

### Instruction:
{instruction}

### Identified Emotional Issue:
{emotion}

### Input:
{input}

### Response:
{response}{EOS_TOKEN}"""

In [None]:
def alpaca_formatting(examples):
    """Build the training text for a batch of examples.

    The section layout follows alpaca_prompt, without its one-line preamble.
    """
    instructions = examples["Instruction"]
    inputs = examples["Input"]
    outputs = examples["Response"]
    emotions = examples["Emotion_Label"]  # emotional context for each example

    texts = []
    for instruction, input_text, output, emotion in zip(instructions, inputs, outputs, emotions):
        # EOS_TOKEN marks the end of the response so generation learns to stop
        text = (f"### Instruction:\n{instruction}\n\n"
                f"### Identified Emotional Issue:\n{emotion}\n\n"
                f"### Input:\n{input_text}\n\n"
                f"### Response:\n{output}{EOS_TOKEN}")
        texts.append(text)

    return {"text": texts}

# Apply the formatting to the dataset, adding a "text" column
dataset['train'] = dataset['train'].map(alpaca_formatting, batched=True, batch_size=10000)
print("Sample formatted data:", dataset['train'][0])

Sample formatted data: {'Instruction': "I've been feeling so sad and overwhelmed lately. Work has become such a massive source of stress for me.", 'Input': "You are a psychology bot designed to provide empathetic and thoughtful responses to users about their mental health and emotional well-being. Your responses should be polite, encouraging, and sensitive to the user's feelings. When answering questions, consider the emotional tone and provide supportive guidance based on the user's input. Always offer positive reinforcement and suggest healthy coping strategies when appropriate. Your goal is to create a safe and non-judgmental space for users to express their emotions and seek support.", 'Response': "Hey there, I'm here to listen and support you. It sounds like work has been really challenging lately. Can you tell me more about what's been going on?", 'Emotion_Label': 'Frustration & Irritability', 'text': "### Instruction:\nI've been feeling so sad and overwhelmed lately. Work has be
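Because the model is trained on this section layout, a prompt in the same layout can also be used at inference time by leaving the response empty. The following is a minimal sketch of that idea; `PROMPT_TEMPLATE`, `build_prompt`, the sample values, and the `"<eos>"` stand-in are hypothetical and not part of the notebook.

```python
# Same section layout as the training text built by alpaca_formatting.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Identified Emotional Issue:\n{emotion}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:\n{response}"
)


def build_prompt(instruction, emotion, input_text, response="", eos_token=""):
    """Return training text (response + EOS) or an inference prompt (both empty)."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        emotion=emotion,
        input_text=input_text,
        response=response,
    ) + eos_token


# Training: the response is filled in and followed by the EOS token.
train_text = build_prompt(
    "I feel overwhelmed at work.",
    "Frustration & Irritability",
    "You are a supportive assistant.",
    response="That sounds hard. What has been going on?",
    eos_token="<eos>",  # stand-in; the notebook would use the tokenizer's EOS token
)

# Inference: the prompt stops right after the response header.
inference_prompt = build_prompt(
    "I feel overwhelmed at work.",
    "Frustration & Irritability",
    "You are a supportive assistant.",
)
print(inference_prompt)
```

Ending the inference prompt at `### Response:` lets the model continue from the same position it saw during training.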

# Training the Model
*   **Training Steps:** We set `max_steps=75` to keep the run short.
*   **Epochs:** `num_train_epochs=3` is also set, but `max_steps` takes precedence, so training stops after 75 steps, well short of one full epoch.
*   **Batch Size & Accumulation:** With `per_device_train_batch_size=2` and `gradient_accumulation_steps=4`, each optimizer step uses an effective batch of 8 samples while only 2 are held in memory at a time.
*   **Learning Rate:** Set to `1e-5`, controlling how much the model adjusts at each step.
*   **Optimizer:** `adamw_8bit`, an AdamW variant that keeps its optimizer state in 8 bits to save memory.
*   **Weight Decay:** Set to `0.01` as regularization against overfitting.
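The arithmetic behind these settings can be checked directly. The sketch below uses the values from the `TrainingArguments` cell and the example count reported in the training banner of this notebook (99,000 training examples on one GPU).

```python
import math

# Values taken from the trainer configuration and training banner.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 75
num_train_examples = 99_000

# Samples contributing to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Samples seen before max_steps stops training.
samples_seen = effective_batch_size * max_steps

# Optimizer steps that one full pass over the data would need.
steps_per_epoch = math.ceil(num_train_examples / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen in {max_steps} steps: {samples_seen}")
print(f"steps for one epoch: {steps_per_epoch}")
print(f"fraction of one epoch covered: {samples_seen / num_train_examples:.2%}")
```

Under these settings the run sees 600 of the 99,000 training examples, so the epoch setting never comes into play.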

In [None]:
%%capture
!pip install datasets

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, TrainerCallback
import numpy as np

# Hold out 1% of the examples for testing with the built-in `datasets` splitter
dataset = dataset["train"].train_test_split(test_size=0.01, seed=9091)
train_dataset = dataset["train"]
test_dataset = dataset["test"]

In [None]:
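import torch

class EarlyStoppingCallback(TrainerCallback):
    """Stops training if validation loss does not improve for 'patience' evaluations.

    Only takes effect when it is passed to the trainer via `callbacks` and
    evaluation is enabled, because it reacts to evaluation events.
    """

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics and "eval_loss" in metrics:
            val_loss = metrics["eval_loss"]

            if val_loss < self.best_loss:
                self.best_loss = val_loss
                self.counter = 0  # Reset patience counter
            else:
                self.counter += 1

            if self.counter >= self.patience:
                print(f"🔥 Early stopping triggered! No improvement for {self.patience} evaluations.")
                control.should_training_stop = True

# Define metric function for evaluation
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    logits = torch.tensor(np.array(logits))  # (batch, seq_len, vocab)
    labels = torch.tensor(np.array(labels))  # (batch, seq_len)

    # Causal LM: logits at position t predict the token at position t + 1,
    # so shift by one and flatten before computing cross-entropy.
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1)).float()
    shift_labels = labels[:, 1:].reshape(-1).long()

    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks ignored positions
    loss = loss_fn(shift_logits, shift_labels)

    return {"eval_loss": loss.item()}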
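The patience rule in the callback can be traced without a trainer. This sketch applies the same counter logic to a list of evaluation losses and reports the evaluation at which training would stop. The function name and the loss values are hypothetical, chosen only to illustrate the rule.

```python
def first_stop_index(eval_losses, patience=3):
    """Return the index of the evaluation that triggers early stopping, or None."""
    best_loss = float("inf")
    counter = 0
    for index, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss = loss
            counter = 0  # improvement resets the patience counter
        else:
            counter += 1
        if counter >= patience:
            return index
    return None


# Hypothetical validation losses: improvement stops after the third evaluation.
losses = [0.56, 0.50, 0.48, 0.49, 0.50, 0.51, 0.47]
stop_at = first_stop_index(losses, patience=3)
print(f"training would stop at evaluation index {stop_at}")
```

Note that the later improvement to 0.47 is never reached, which is the trade-off of a small patience value.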

In [None]:
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=8,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Small per-step batch to fit in GPU memory
        gradient_accumulation_steps=4,  # Accumulate 4 batches: effective batch size of 8
        warmup_steps=5,
        max_steps=75,  # Stop after 75 optimizer steps; takes precedence over num_train_epochs
        num_train_epochs=3,  # Has no effect while max_steps is set
        learning_rate=1e-5,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=2,  # Log loss every 2 steps
        save_strategy="steps",
        save_steps=10,  # Save a checkpoint every 10 steps
        optim="adamw_8bit",  # Memory-efficient optimizer
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=9091,
        output_dir="outputs",
    ),
)

In [None]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 99,000 | Num Epochs = 1 | Total steps = 75
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 48,627,712/1,889,840,128 (2.57% trained)

Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 2    | 0.5640        |
| 4    | 0.5748        |
| 6    | 0.5547        |
| 8    | 0.5018        |
| 10   | 0.4782        |
| 12   | 0.4068        |
| 14   | 0.3435        |
| 16   | 0.2954        |
| 18   | 0.2429        |
| 20   | 0.2138        |


TrainOutput(global_step=75, training_loss=0.1642856320242087, metrics={'train_runtime': 337.3906, 'train_samples_per_second': 1.778, 'train_steps_per_second': 0.222, 'total_flos': 2394659644354560.0, 'train_loss': 0.1642856320242087})
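The run above uses `lr_scheduler_type="cosine"` with `warmup_steps=5`, `learning_rate=1e-5`, and 75 total steps. As a rough sketch of what such a schedule looks like, the function below implements linear warmup followed by a half-cosine decay to zero. It is written from the general formula rather than taken from the library, so exact per-step values in the actual run may differ.

```python
import math


def cosine_lr_with_warmup(step, base_lr=1e-5, warmup_steps=5, total_steps=75):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, end of warmup, midpoint of decay, and final step.
for step in (0, 5, 40, 75):
    print(f"step {step:>2}: lr = {cosine_lr_with_warmup(step):.2e}")
```

The learning rate peaks at step 5 and falls to half its peak at step 40, the midpoint of the decay phase.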

In [None]:
repo_name = "F219091/Express2Heal"  # Your repository name

# Save the LoRA adapter and tokenizer locally
model.save_pretrained(repo_name)
tokenizer.save_pretrained(repo_name)

# Push both to the Hugging Face Hub
model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

Saved model to https://huggingface.co/F219091/Express2Heal


No files have been modified since last commit. Skipping to prevent empty commit.


<a name="Inference"></a>
### Inference

In [None]:
FastLanguageModel.for_inference(model)

messages = [
    {"role": "Patient", "content": """I am feeling numb and dont feel bad or good . its like everything emotionally in life is just white or black?"""}
]

# Apply the chat template and tokenize the conversation
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

# Manually set attention mask to avoid warning
attention_mask = inputs.ne(tokenizer.pad_token_id).to(inputs.device)

# Generate response
outputs = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    max_new_tokens=128
)

# Decode output, ensuring it looks clean
response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

# Add formatting (optional)
formatted_response = f"### Express2Heal Chatbot Response:\n\n{response}"

print(formatted_response)

### Express2Heal Chatbot Response:

system

Cutting Knowledge Date: December 2023
Today Date: 12 Mar 2025

Patient

I am feeling numb and dont feel bad or good. its like everything emotionally in life is just white or black?assistant

It seems like you're experiencing a sense of emotional numbness, and that's okay. It's normal to feel this way, especially during difficult times. If you're feeling overwhelmed or struggling to cope, consider reaching out to a trusted friend, family member, or mental health professional for support.
