# ***Sentiment Analysis and Supportive Response Generation in Mental Health***

This mental health sentiment analysis and support model combines two models: one detects users' mental health states from text, and the other generates supportive responses based on the detected state.

This model uses the **XLM-RoBERTa** architecture for multilingual text classification, identifying seven mental health categories: "Normal," "Depression," "Suicidal," "Anxiety," "Bipolar," "Stress," and "Personality Disorder."

Once an emotion is detected, the system passes it to **FLAN-T5 (Large)** to generate empathetic, contextually appropriate supportive messages. Built with **PyTorch** and **Hugging Face Transformers**, it is suited to applications such as mental health detection on social media, automated mental health support platforms, and psychological counseling assistance.

Reminder: Before running this notebook, make sure the following libraries are installed:

*   Transformers: For loading the two models, XLM-RoBERTa and FLAN-T5.
*   Datasets: For handling and processing large datasets easily.
*   Torch (PyTorch): For model training, evaluation, and GPU acceleration.

In [None]:
pip install transformers datasets torch


# 1. Data Preparation and Preprocessing


## 1.1 Importing Kaggle Data Sources
The dataset comes from Kaggle and contains user-generated text labeled with mental health categories (Normal, Depression, Suicidal, Anxiety, Bipolar, Stress, Personality disorder). It serves as the foundation for training the XLM-RoBERTa classification model to detect users' mental health states from text.

In [None]:
import os
import kagglehub
import pandas as pd

# Download the dataset (or reuse the cached copy); returns the local directory path
dataset_path = kagglehub.dataset_download('suchintikasarkar/sentiment-analysis-for-mental-health')

print('Data source import complete.')

# Read the CSV from the downloaded directory instead of a hard-coded /kaggle/input path
df = pd.read_csv(os.path.join(dataset_path, "Combined Data.csv"))

# Drop rows with missing text
df = df.dropna(subset=['statement'])
df.info()
df['status'].value_counts()

Data source import complete.
<class 'pandas.core.frame.DataFrame'>
Index: 52681 entries, 0 to 53042
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  52681 non-null  int64 
 1   statement   52681 non-null  object
 2   status      52681 non-null  object
dtypes: int64(1), object(2)
memory usage: 1.6+ MB


| status | count |
|---|---|
| Normal | 16343 |
| Depression | 15404 |
| Suicidal | 10652 |
| Anxiety | 3841 |
| Bipolar | 2777 |
| Stress | 2587 |
| Personality disorder | 1077 |


## 1.2 Handling Class Imbalance with Weights Calculation

Class weights are calculated to handle data imbalance, ensuring the model pays more attention to minority classes.
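scikit-learn's `'balanced'` heuristic assigns each class the weight `n_samples / (n_classes * count_c)`. As a sanity check, the plain-Python sketch below applies that formula by hand to the label counts reported in Section 1.1. The numbers are what the formula gives for the full dataset; the training split used in Section 5.2 is stratified, so its weights should be nearly identical.

```python
# Label counts from df['status'].value_counts() in Section 1.1
counts = {
    "Normal": 16343,
    "Depression": 15404,
    "Suicidal": 10652,
    "Anxiety": 3841,
    "Bipolar": 2777,
    "Stress": 2587,
    "Personality disorder": 1077,
}

n_samples = sum(counts.values())  # rows remaining after dropping missing text
n_classes = len(counts)           # 7 categories

# scikit-learn's 'balanced' rule: n_samples / (n_classes * count_c)
weights = {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Print from smallest to largest weight: rarer classes get larger weights
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:<22} {w:.3f}")
```

The ratio between two weights is simply the inverse ratio of their counts. For example, each "Personality disorder" example counts about 15 times as much in the loss as a "Normal" example (16343 / 1077).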

In [None]:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
import torch
from torch import nn

# Compute 'balanced' weights from the label distribution of the full dataset
labels = df['status']
classes = np.unique(labels)
class_weights = compute_class_weight('balanced', classes=classes, y=labels)

# Convert to dictionary format (for scikit-learn or TensorFlow)
class_weights_dict = dict(zip(classes, class_weights))
print(class_weights_dict)


# 2. Data Preprocessing and Splitting

## 2.1 Data Cleaning and Label Mapping

This section prepares the raw text data for training by cleaning and labeling.

Each status string is mapped to an integer ID using a dictionary (label_map). The seven categories are Normal, Depression, Suicidal, Anxiety, Bipolar, Stress, and Personality disorder.


The data is then split into three sets:
*   Training Set (60%): For model learning.
*   Validation Set (20%): For model evaluation during training.
*   Test Set (20%): For final model performance evaluation.
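Because label_map is one-to-one, it can be inverted to turn predicted IDs back into names (the prediction code later in the notebook hard-codes the same inverse mapping in get_label_name). The plain-Python sketch below builds that inverse and also flags status strings that label_map does not cover. With pandas, such a string would silently become NaN after `.map()`. The helper `find_unmapped` is hypothetical, introduced only for illustration.

```python
label_map = {
    'Normal': 0,
    'Depression': 1,
    'Suicidal': 2,
    'Anxiety': 3,
    'Bipolar': 4,
    'Stress': 5,
    'Personality disorder': 6,
}

# Inverse mapping: integer ID -> category name
id2label = {idx: name for name, idx in label_map.items()}

def find_unmapped(statuses, mapping):
    """Hypothetical helper: return the status strings that the mapping does not cover."""
    return sorted(set(statuses) - set(mapping))

print(id2label[2])                                               # Suicidal
print(find_unmapped(['Normal', 'Stress', 'stress'], label_map))  # ['stress']
```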

In [None]:
from sklearn.model_selection import train_test_split

# Map labels to integers
label_map = {
    'Normal': 0,
    'Depression': 1,
    'Suicidal': 2,
    'Anxiety': 3,
    'Bipolar': 4,
    'Stress': 5,
    'Personality disorder': 6
}
df['label'] = df['status'].map(label_map)

# 60% train, 20% val, 20% test: hold out 20% for test, then 25% of the remaining 80% (= 20% of the total) for validation
train_val_df, test_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df['label'])
train_df, val_df = train_test_split(train_val_df, test_size=0.25, random_state=42, stratify=train_val_df['label'])
print(f"Train: {train_df.shape}, Val: {val_df.shape}, Test: {test_df.shape}")

Train: (31608, 4), Val: (10536, 4), Test: (10537, 4)
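The shapes printed above follow from how `train_test_split` rounds. With a float `test_size`, scikit-learn takes `ceil(test_size * n_samples)` rows for the test part and gives the rest to the train part. The plain-Python sketch below reproduces that arithmetic. It ignores stratification, which changes which rows are picked but not how many.

```python
import math

def split_sizes(n_samples, test_size):
    """Mirror scikit-learn's rounding for a float test_size: the test part is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_total = 52681                                   # rows after dropna
n_train_val, n_test = split_sizes(n_total, 0.2)   # first split: 80% / 20%
n_train, n_val = split_sizes(n_train_val, 0.25)   # 25% of 80% = 20% of the total

print(n_train, n_val, n_test)                     # 31608 10536 10537
```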


## 2.2 Text Tokenization using XLM-RoBERTa
This section uses Hugging Face's AutoTokenizer to convert text into token IDs efficiently. The tokenizer belongs to **XLM-RoBERTa**, **a multilingual pre-trained model** trained on text in 100 languages.

In [None]:
from transformers import AutoTokenizer

# Load the XLM-RoBERTa tokenizer
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

# Tokenize the text
def tokenize_data(df):
    text_data = df['statement'].astype(str).tolist()
    return tokenizer(
        text_data,
        padding=True,       # Pads every sequence to the length of the longest one in the list (at most 128 here), so they can be stacked into one tensor.
        truncation=True,    # Truncates sequences longer than max_length (128 tokens) to bound memory use.
        max_length=128,     # Caps each tokenized sequence at 128 tokens, balancing performance and memory usage.
        return_tensors='pt' # Returns PyTorch tensors, ready for PyTorch training.
    )

train_encodings = tokenize_data(train_df)
val_encodings = tokenize_data(val_df)
test_encodings = tokenize_data(test_df)
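To make `padding=True`, `truncation=True`, and `max_length=128` concrete, here is a toy, pure-Python version of what the tokenizer does to a list of already-tokenized sequences. It is a simplification: the real tokenizer also inserts the special tokens `<s>` and `</s>` and keeps them when truncating. The helper `pad_and_truncate` is hypothetical, and the pad ID of 1 is used because it is XLM-RoBERTa's `<pad>` ID.

```python
def pad_and_truncate(sequences, max_length, pad_id=1):
    """Hypothetical helper: truncate to max_length, then pad to the longest remaining sequence."""
    truncated = [seq[:max_length] for seq in sequences]
    target = max(len(seq) for seq in truncated)  # padding=True pads to the longest in the batch
    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 marks padding the model should ignore
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_and_truncate([[5, 6, 7], [8, 9, 10, 11, 12, 13]], max_length=4)
print(batch["input_ids"])       # [[5, 6, 7, 1], [8, 9, 10, 11]]
print(batch["attention_mask"])  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

Because tokenize_data tokenizes a whole split in one call, every row in that split is padded to the same length, which is 128 whenever any text in the split reaches the cap.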


# 3. Custom Dataset Class and DataLoader

## 3.1 Building a Custom PyTorch Dataset Class
This section defines a custom PyTorch dataset class, SentimentDataset, which efficiently handles loading and processing of tokenized text data and their corresponding labels for model training.


Key Components:
*   `__init__`: Initializes the dataset with the tokenized text data (encodings) and the corresponding labels.
*   `__getitem__`: Returns the sample at a given index as a dictionary of PyTorch tensors (token IDs, attention mask, and label).
*   `__len__`: Returns the total number of samples in the dataset.

In [None]:
import torch

class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
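        # The encodings are already tensors (return_tensors='pt'), so index them directly;
        # re-wrapping them with torch.tensor() triggers a copy-construct warning.
        item = {key: val[idx] for key, val in self.encodings.items()}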
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

#datasets
train_dataset = SentimentDataset(train_encodings, train_df['label'].tolist())
val_dataset = SentimentDataset(val_encodings, val_df['label'].tolist())
test_dataset  = SentimentDataset(test_encodings,  test_df['label'].tolist())
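Stripped of PyTorch, SentimentDataset performs a column-to-row transposition: the tokenizer returns one sequence per field (`input_ids`, `attention_mask`), and `__getitem__` gathers the idx-th entry of each field plus the label into one dictionary. A minimal pure-Python sketch of that pattern follows; the class name `ToyDataset` and the sample values are made up for illustration.

```python
class ToyDataset:
    """Hypothetical stand-in for SentimentDataset, using plain lists instead of tensors."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict: field name -> list of per-sample values
        self.labels = labels        # list of integer class IDs

    def __getitem__(self, idx):
        # Pick the idx-th value of every field, then attach the label
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

    def __len__(self):
        return len(self.labels)

encodings = {
    "input_ids": [[0, 35, 2], [0, 87, 2]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
ds = ToyDataset(encodings, labels=[1, 5])
print(len(ds))  # 2
print(ds[1])    # {'input_ids': [0, 87, 2], 'attention_mask': [1, 1, 1], 'labels': 5}
```

During training, a data collator stacks these per-sample dictionaries into batched tensors.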

# 4. Model Initialization and Configuration

## 4.1 Configuring XLM-RoBERTa Model

This section defines a custom configuration for the XLM-RoBERTa model using AutoConfig from Hugging Face, which customizes the model's architecture and adapts it to the task.

The configuration controls the model's structure (e.g., number of labels, number of hidden layers), its regularization (e.g., hidden and attention dropout rates), and its task adaptation (here, sentiment analysis for mental health).

By adjusting these settings, the model balances complexity against computational cost: fewer layers make it faster, and higher dropout helps reduce overfitting.



In [None]:
from transformers import AutoModelForSequenceClassification, AutoConfig

# Create custom configuration
config = AutoConfig.from_pretrained('xlm-roberta-base')
config.num_labels = len(label_map)     # Sets the number of output labels to the number of mental health categories (7).
config.hidden_dropout_prob = 0.3      # Increased dropout (default is usually 0.1)
config.attention_probs_dropout_prob = 0.3
config.num_hidden_layers = 8        # Keeps only the first 8 of the 12 pretrained layers, making the model faster.
config.layer_norm_eps = 1e-7        # Smaller than the xlm-roberta-base default (1e-5); a smaller epsilon gives slightly less, not more, protection against division by near-zero variance.
config.output_attentions = True      # Returns attention weights for analysis (costs extra memory on every forward pass).


# Initialize the model with custom config
model = AutoModelForSequenceClassification.from_pretrained(
    'xlm-roberta-base',
    config=config
)


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
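Setting `num_hidden_layers = 8` removes a third of the encoder layers but a much smaller share of the parameters, because most of XLM-RoBERTa's weights sit in its 250,002-token embedding matrix. The back-of-the-envelope count below assumes the standard xlm-roberta-base dimensions (hidden size 768, feed-forward size 3072, 514 position embeddings, one token type) and counts only the weights and biases of the dense and LayerNorm modules. It is an estimate, not output from this notebook.

```python
hidden, ffn, vocab, positions = 768, 3072, 250002, 514  # xlm-roberta-base dimensions (assumed)

def dense(n_in, n_out):
    """Parameters of a linear layer: weight matrix plus bias."""
    return n_in * n_out + n_out

def layer_norm(n):
    """Parameters of a LayerNorm: scale and shift vectors."""
    return 2 * n

# One transformer layer: Q, K, V and output projections, the feed-forward block, two LayerNorms
per_layer = 4 * dense(hidden, hidden) + dense(hidden, ffn) + dense(ffn, hidden) + 2 * layer_norm(hidden)

# Word, position and token-type embeddings plus the embedding LayerNorm
embeddings = (vocab + positions + 1) * hidden + layer_norm(hidden)

# Classification head (classifier.dense + classifier.out_proj to 7 labels)
head = dense(hidden, hidden) + dense(hidden, 7)

for n_layers in (12, 8):
    total = embeddings + n_layers * per_layer + head
    print(f"{n_layers} layers: {total / 1e6:.1f}M parameters")

full = embeddings + 12 * per_layer + head
print(f"removed: {4 * per_layer / 1e6:.1f}M ({4 * per_layer / full:.0%} of the full model)")
```

Under these assumptions the count drops from about 278.0M to 249.7M parameters, roughly 10%. Compute per token, by contrast, is dominated by the transformer layers (the embedding step is just a lookup), so the 8-layer encoder does about two thirds of the original encoder work per forward pass.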


# 5. Training Setup and Custom Trainer

## 5.1 Setting Up Training Arguments
This section defines the training arguments using the TrainingArguments class from Hugging Face.

Training arguments customize and control the training process without modifying the model code, which keeps training runs consistent and makes settings easy to adjust.



> *What is the difference between 4.1 and 5.1?*

> *Step 4.1 - Model Initialization: This step defines the model's architecture, structure, and internal behavior. It is where **the model's design is decided**, such as the number of labels, hidden layers, dropout rates, and attention mechanisms. It determines what the model is.*

> *Step 5.1 - Training Setup: This step configures how the model learns from data. It **defines the training process**, including the learning rate, batch size, number of epochs, and optimization strategies. It determines how the model is trained.*







In [None]:
from transformers import TrainingArguments
from sklearn.utils.class_weight import compute_class_weight
import numpy as np


#training arguments
training_args = TrainingArguments(
    output_dir='./results',           # Output directory
    eval_strategy="epoch",            # Evaluates the model at the end of each epoch.
    learning_rate=2e-5,             # Sets the learning rate for the optimizer.
    per_device_train_batch_size=16,       # Batch size for training
    per_device_eval_batch_size=4,        # Batch size for evaluation
    num_train_epochs=6,             # Number of training epochs
    weight_decay=0.01,              # Applies weight decay to prevent overfitting.
    save_strategy="epoch",            # Save model every epoch
    logging_dir='./logs',            # Directory for logs
    logging_steps=10,              # Log every 10 steps
    report_to="none",              # Disable wandb and other logging integrations
    disable_tqdm=False,             # Ensure tqdm progress bars are enabled (default)
    fp16=True,                  # Enables mixed-precision training (faster on compatible GPUs).
)

## 5.2 Creating a Custom Weighted Trainer
To handle class imbalance, this section defines a custom trainer
(*WeightedTrainer*) that applies class weights during training:


*   ***Class weights*** are first calculated with compute_class_weight on the training labels, giving minority classes larger weights.


*   The **custom trainer** then applies these class weights through a weighted loss function in the compute_loss method, so that errors on minority classes cost more and the model is not biased towards majority classes.

> *Connection with Section 1.2: Section 1.2 identifies the imbalance in the dataset and calculates the necessary adjustments. Section 5.2 recomputes them on the training split and uses them to make the model pay more attention to minority classes.*

In [None]:
from transformers import Trainer
import torch
from torch import nn

# Compute class weights
class_weights = compute_class_weight(
    'balanced',
    classes=np.unique(train_df['label']),
    y=train_df['label']
)
class_weights = torch.tensor(class_weights, dtype=torch.float32)

# Custom Trainer with weighted loss
class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Get the device of the model
        if isinstance(model, torch.nn.DataParallel):
            device = model.module.device  # Access the underlying model's device
        else:
            device = model.device  # Single-GPU case

        # Move inputs to the correct device
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Move class weights to the correct device
        class_weights_device = class_weights.to(device)

        # Get labels and outputs
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')

        # Compute loss with class weights
        loss_fct = nn.CrossEntropyLoss(weight=class_weights_device)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))

        return (loss, outputs) if return_outputs else loss

# Initialize the Trainer
trainer = WeightedTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
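With `weight=...`, PyTorch's `nn.CrossEntropyLoss` (default `reduction='mean'`) computes a weighted average: each sample's negative log-likelihood is multiplied by the weight of its true class, and the sum is divided by the sum of those weights rather than by the batch size. The plain-Python sketch below reproduces that arithmetic on a tiny made-up batch (the logits, labels, and weights are illustrative only) to show how a large weight on a rare class makes its errors dominate the loss.

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean cross-entropy, using the same reduction as nn.CrossEntropyLoss(weight=...)."""
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, labels):
        # -log softmax(row)[y], computed stably by subtracting the max logit
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        nll = log_sum_exp - row[y]
        total += class_weights[y] * nll
        weight_sum += class_weights[y]
    return total / weight_sum

toy_logits = [[2.0, 0.0], [2.0, 0.0]]  # the model favors class 0 for both samples
toy_labels = [0, 1]                    # the second sample really belongs to class 1 (the rare class)

print(round(weighted_cross_entropy(toy_logits, toy_labels, [1.0, 1.0]), 4))  # 1.1269 (plain mean)
print(round(weighted_cross_entropy(toy_logits, toy_labels, [1.0, 5.0]), 4))  # 1.7936 (rare-class error dominates)
```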

# 6. Model Training and Evaluation

## 6.1 Training the Model

This section initiates the model training using the custom WeightedTrainer defined earlier.

The trainer.train() method automatically manages the training loop, including:


*   Loading batches of data from the training set.

*   Calculating loss using the weighted loss function (giving more importance to minority classes).


*   Performing backpropagation to update model weights.

*   Evaluating the model on the validation set after each epoch.

GPU acceleration (together with fp16 mixed precision) significantly reduces training time. In this run the model was trained on a GPU and took about 48 minutes (train_runtime ≈ 2861 s in the TrainOutput below).

In [None]:
trainer.train()



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.9939 | 0.875446 |
| 2 | 0.8052 | 0.799897 |
| 3 | 0.7449 | 0.816963 |
| 4 | 0.4406 | 0.678739 |
| 5 | 0.6643 | 0.654975 |
| 6 | 0.4409 | 0.657049 |




TrainOutput(global_step=11856, training_loss=0.7279585348932367, metrics={'train_runtime': 2861.276, 'train_samples_per_second': 66.281, 'train_steps_per_second': 4.144, 'total_flos': 8345796678586368.0, 'train_loss': 0.7279585348932367, 'epoch': 6.0})
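The global_step and throughput values in the TrainOutput above follow directly from the training arguments. With 31,608 training samples and `per_device_train_batch_size=16` on a single GPU (an assumption that matches the reported numbers), each epoch has `ceil(31608 / 16)` batches. A quick check in plain Python:

```python
import math

n_train = 31608
batch_size = 16   # per_device_train_batch_size (single GPU assumed)
epochs = 6        # num_train_epochs

steps_per_epoch = math.ceil(n_train / batch_size)  # the last partial batch still counts as a step
total_steps = steps_per_epoch * epochs
samples_per_second = n_train * epochs / 2861.276   # train_runtime from the output above

print(steps_per_epoch, total_steps)  # 1976 11856
print(round(samples_per_second, 1))  # 66.3
```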

In [None]:
results = trainer.evaluate()
print(results)



{'eval_loss': 0.6570487022399902, 'eval_runtime': 32.0822, 'eval_samples_per_second': 328.406, 'eval_steps_per_second': 82.102, 'epoch': 6.0}


## 6.2 Evaluating the Sentiment Analysis For Mental Health Model
This section evaluates the trained model using the test dataset, providing a clear view of the model's performance.

Key Evaluation Metrics:


*   Classification Report: Displays precision, recall, F1-score, and support for each class.


*   Confusion Matrix: Visualizes the correct and incorrect predictions for each class, making it easy to identify model strengths and weaknesses.

In [None]:
import torch
from torch.utils.data import DataLoader
from transformers import default_data_collator
from sklearn.metrics import classification_report, confusion_matrix

# Clear CUDA cache (optional)
torch.cuda.empty_cache()

# Setup DataLoader for testing: used to efficiently load test data in batches, ensuring fast evaluation.
test_loader = DataLoader(test_dataset, batch_size=4, collate_fn=default_data_collator, shuffle=False)

# Evaluate model
y_true, y_pred = [], []
model.eval()

for batch in test_loader:
    batch = {k: v.to(model.device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)

    preds = torch.argmax(outputs.logits, dim=-1)
    y_true.extend(batch['labels'].cpu().numpy())
    y_pred.extend(preds.cpu().numpy())

# Classification report
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=list(label_map.keys())))

print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred))




Classification Report:
                      precision    recall  f1-score   support

              Normal       0.98      0.85      0.91      3269
          Depression       0.80      0.67      0.73      3081
            Suicidal       0.65      0.79      0.72      2131
             Anxiety       0.79      0.86      0.82       768
             Bipolar       0.80      0.85      0.82       556
              Stress       0.50      0.85      0.63       517
Personality disorder       0.59      0.69      0.64       215

            accuracy                           0.78     10537
           macro avg       0.73      0.80      0.75     10537
        weighted avg       0.81      0.78      0.79     10537


Confusion Matrix:
[[2775   73   82   60    9  240   30]
 [  23 2057  798   33   60   79   31]
 [  24  386 1684   12    5   16    4]
 [   4    6    2  662   22   58   14]
 [   1   17    0   23  474   23   18]
 [   3   10    6   38   11  442    7]
 [   4   10    3   10   14   25  149]]
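Every number in the classification report can be recovered from the confusion matrix, whose rows are true labels and columns are predictions in label_map order. Precision divides a diagonal entry by its column sum, and recall divides it by its row sum. The plain-Python sketch below recomputes these values from the printed matrix, which makes it easy to trace, for instance, where the weak Stress precision comes from.

```python
class_names = ["Normal", "Depression", "Suicidal", "Anxiety", "Bipolar", "Stress", "Personality disorder"]
cm = [
    [2775,   73,   82,   60,    9,  240,   30],
    [  23, 2057,  798,   33,   60,   79,   31],
    [  24,  386, 1684,   12,    5,   16,    4],
    [   4,    6,    2,  662,   22,   58,   14],
    [   1,   17,    0,   23,  474,   23,   18],
    [   3,   10,    6,   38,   11,  442,    7],
    [   4,   10,    3,   10,   14,   25,  149],
]

def per_class_metrics(cm):
    """Return (precision, recall, f1) lists; rows of cm are true labels, columns are predictions."""
    k = len(cm)
    col_sums = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    precision, recall, f1 = [], [], []
    for i in range(k):
        tp = cm[i][i]
        p = tp / col_sums[i] if col_sums[i] else 0.0
        r = tp / sum(cm[i]) if sum(cm[i]) else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return precision, recall, f1

precision, recall, f1 = per_class_metrics(cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

for name, p, r, f in zip(class_names, precision, recall, f1):
    print(f"{name:<22} P={p:.2f} R={r:.2f} F1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reading down the Stress column shows that 240 of the 883 Stress predictions are actually Normal posts, the largest single source of Stress false positives.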


The classification report summarizes the performance of the first stage (the classifier). The model performs well overall, especially on the "Normal" class (F1-score: 0.91), but struggles with "Stress" and "Personality disorder", which have the lowest precision (0.50 and 0.59) and F1-scores (0.63 and 0.64). The confusion matrix shows that "Depression" and "Suicidal" are frequently confused with each other (798 Depression posts predicted as Suicidal and 386 Suicidal posts predicted as Depression), indicating overlap in their features. Although the model reaches a reasonable overall accuracy of 78%, it has difficulty distinguishing between certain mental health conditions.

## 6.3 Saving the Fine-Tuned Model
After training and evaluation, the **fine-tuned model and tokenizer are saved with the save_pretrained method from Hugging Face Transformers**, so they can be reloaded later without retraining, with consistent tokenization and model behavior.

In [None]:
model.save_pretrained('./fine-tuned-xlmr')
tokenizer.save_pretrained('./fine-tuned-xlmr')

('./fine-tuned-xlmr/tokenizer_config.json',
 './fine-tuned-xlmr/special_tokens_map.json',
 './fine-tuned-xlmr/sentencepiece.bpe.model',
 './fine-tuned-xlmr/added_tokens.json',
 './fine-tuned-xlmr/tokenizer.json')



> *After building a robust model capable of detecting users' mental health states from text, we further enhanced its functionality by **integrating a supportive response generation system**. This addition transforms the model from a passive sentiment detector to an active, empathetic assistant, capable of not only identifying users' emotional states but also providing comforting, context-aware responses.*

> *This enhancement ensures that the model is not only an analytical tool but also a practical support system, offering personalized encouragement based on the detected emotions.*




# 7. Generating Supportive Responses with FLAN-T5


## 7.1 Adding Supportive Response Generation
This section **uses a pre-trained text-to-text generation model, FLAN-T5 (Large)** to create supportive, empathetic messages based on the detected emotion.

Here’s how it works:


1.   Model Selection (FLAN-T5):
     We use pipeline("text2text-generation", model="google/flan-t5-large") to load FLAN-T5, an instruction-tuned T5 model suited to natural language generation tasks. The "large" version is chosen because it produces more coherent, context-aware text than the base version.

2.   Prompt Design: A pre-defined few-shot prompt template provides examples of how the model should respond.

     The generate_supportive_message function takes two inputs:
     * emotion: The detected emotional state (e.g., Depression, Suicidal).
     * user_text: The user's input text expressing their feelings.









In [None]:
from transformers import pipeline

# Use text2text-generation with FLAN-T5 (the large version performs better than the base version)
generator = pipeline("text2text-generation", model="google/flan-t5-large")

def generate_supportive_message(emotion, user_text):
    prompt = (
    "You are a supportive mental health assistant. Given what someone said and how they feel, write a short, kind message to comfort them.\n\n"
    "User: \"I feel hopeless.\" (emotion: Depression)\n"
    "Response: I'm really sorry you're feeling this way. You're not alone — there are people who care about you and want to help.\n\n"
    "User: \"I can't take it anymore.\" (emotion: Suicidal)\n"
    "Response: I'm truly sorry you're feeling overwhelmed. You're important, and it's okay to ask for help. You don't have to go through this alone.\n\n"
    f"User: \"{user_text}\" (emotion: {emotion})\n"
    "Response:"
    )
    result = generator(
    prompt,
    max_length=80,      # Caps the generated response length (in tokens) to keep it short.
    num_return_sequences=1,
    temperature=0.8,     # Controls randomness: lower values make responses more conservative.
    top_p=0.9,        # Nucleus sampling: sample only from the most likely tokens covering 90% of the probability mass.
    do_sample=True      # Samples instead of decoding greedily, so responses differ between calls.
    )
    return result[0]['generated_text'].strip()


Device set to use cuda:0
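The temperature and top_p arguments in generate_supportive_message reshape the next-token distribution before sampling. The plain-Python sketch below illustrates the standard rules on a made-up four-token distribution: temperature divides the logits before the softmax (values below 1 sharpen the distribution), and nucleus (top-p) filtering keeps the smallest set of most probable tokens whose cumulative probability reaches top_p. It illustrates the technique only; Hugging Face's implementation works on batched tensors and handles extra details such as a minimum number of kept tokens.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative probability
    reaches top_p, then renormalize; all other tokens get probability 0."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

toy_probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(toy_probs, top_p=0.9)
print([round(p, 3) for p in filtered])  # [0.526, 0.316, 0.158, 0.0]

toy_logits = [2.0, 1.0, 0.5, -1.0]
for t in (1.0, 0.8):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```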




---



# Making Predictions with the Model

In [None]:
import torch
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizerFast
import pandas as pd
import torch.nn.functional as F

def load_model_and_tokenizer(model_path):
    """Load the saved model and tokenizer"""
    model = XLMRobertaForSequenceClassification.from_pretrained(model_path)
    tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_path)
    return model, tokenizer

def predict_mental_health(text, model, tokenizer):
    """Make prediction for a single text input"""
    # Prepare the text
    inputs = tokenizer(
        text,
        padding=True,
        truncation=True,
        max_length=128,
        return_tensors="pt"
    )

    # Make prediction
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()

    return predicted_class, probabilities[0]

def get_label_name(label_id):
    """Convert numerical label to string"""
    label_map = {
        0: 'Normal',
        1: 'Depression',
        2: 'Suicidal',
        3: 'Anxiety',
        4: 'Bipolar',
        5: 'Stress',
        6: 'Personality disorder'
    }
    return label_map.get(label_id, 'Unknown')

# Load model and tokenizer
print("Loading model and tokenizer...")
model, tokenizer = load_model_and_tokenizer('./fine-tuned-xlmr')
print("Model and tokenizer loaded successfully!")

# Test sentences representing different mental health states
test_sentences = [
    {
        "text": "Today was a great day! I enjoyed spending time with my friends and family.",
        "expected": "Normal"
    },
    {
        "text": "I've been feeling down for weeks now. Nothing brings me joy anymore.",
        "expected": "Depression"
    },
    {
        "text": "I can't handle this pain anymore. Everyone would be better off without me.",
        "expected": "Suicidal"
    },
    {
        "text": "My heart is racing and I can't breathe. What if something terrible happens?",
        "expected": "Anxiety"
    },
    {
        "text": "Yesterday I felt on top of the world, but today I can't even get out of bed.",
        "expected": "Bipolar"
    },
    {
        "text": "Work is overwhelming me. I can't sleep and my mind won't stop racing.",
        "expected": "Stress"
    },
    {
        "text": "Nobody understands me. My emotions are intense and I push everyone away.",
        "expected": "Personality disorder"
    },
    {
        "text": "Just finished a yoga session and feeling very peaceful and centered.",
        "expected": "Normal"
    },
    {
        "text": "My thoughts are spiraling and I can't focus on anything. Everything feels threatening.",
        "expected": "Anxiety"
    },
    {
        "text": "I feel empty inside. Nothing matters anymore.",
        "expected": "Depression"
    }
]

# Make predictions
print("\nAnalyzing sentences...\n")
print("=" * 100)
print(f"{'Text':<60} {'Expected':<15} {'Predicted':<15} {'Confidence'}")
print("=" * 100)

for test_case in test_sentences:
    text = test_case["text"]
    expected = test_case["expected"]

    # Get prediction
    predicted_class, probabilities = predict_mental_health(text, model, tokenizer)
    predicted_label = get_label_name(predicted_class)
    confidence = probabilities[predicted_class].item() * 100

    # Print results with truncated text if too long
    truncated_text = text[:57] + "..." if len(text) > 60 else text.ljust(60)
    print(f"{truncated_text} {expected:<15} {predicted_label:<15} {confidence:.1f}%")

print("=" * 100)

# Print the full probability distribution for the first three examples
print("\nDetailed Analysis of High Confidence Predictions:")
print("-" * 50)

for test_case in test_sentences[:3]:  # Analyze first 3 cases
    text = test_case["text"]
    predicted_class, probabilities = predict_mental_health(text, model, tokenizer)

    print(f"\nText: {text}")
    print("Probability Distribution:")
    for i, prob in enumerate(probabilities):
        label = get_label_name(i)
        probability = prob.item() * 100
        print(f"{label:<20}: {probability:.1f}%")
    print("-" * 50)

Loading model and tokenizer...
Model and tokenizer loaded successfully!

Analyzing sentences...

Text                                                         Expected        Predicted       Confidence
Today was a great day! I enjoyed spending time with my fr... Normal          Normal          99.6%
I've been feeling down for weeks now. Nothing brings me j... Depression      Depression      52.9%
I can't handle this pain anymore. Everyone would be bette... Suicidal        Stress          96.1%
My heart is racing and I can't breathe. What if something... Anxiety         Anxiety         98.8%
Yesterday I felt on top of the world, but today I can't e... Bipolar         Stress          73.4%
Work is overwhelming me. I can't sleep and my mind won't ... Stress          Stress          96.7%
Nobody understands me. My emotions are intense and I push... Personality disorder Depression      70.0%
Just finished a yoga session and feeling very peaceful an... Normal          Anxiety         62.1%
My

In [None]:
# Test sentences representing different mental health states in multiple languages
test_sentences = [
    {
        "text": "今日は素晴らしい一日でした！友達や家族と楽しい時間を過ごしました。",  # Japanese
        "expected": "Normal"
    },
    {
        "text": "もう何週間も気分が落ち込んでいます。もう何も喜びを感じません。",  # Japanese
        "expected": "Depression"
    },
    {
        "text": "もうこの痛みに耐えられない。私がいなくなった方がみんなにとって良いのに。",  # Japanese
        "expected": "Suicidal"
    },
    {
        "text": "我的心跳加速，無法呼吸。如果發生可怕的事情怎麼辦？",  # Mandarin
        "expected": "Anxiety"
    },
    {
        "text": "昨天我覺得自己是世界上最幸福的人，但今天我無法起床。",  # Mandarin
        "expected": "Bipolar"
    },
    {
        "text": "工作讓我不堪負荷。我無法入睡，思緒也不停地飛轉。",  # Mandarin
        "expected": "Stress"
    },
    {
        "text": "沒人理解我。我的情緒很強烈，我會把所有人都推開。",  # Mandarin
        "expected": "Personality disorder"
    },
    {
        "text": "Acabo de terminar una sesión de yoga y me siento muy tranquilo y centrado.",  # Spanish
        "expected": "Normal"
    },
    {
        "text": "Mis pensamientos dan vueltas y no puedo concentrarme en nada. Todo parece amenazante.",  # Spanish
        "expected": "Anxiety"
    },
    {
        "text": "Me siento vacío por dentro. Ya nada importa.",  # Spanish
        "expected": "Depression"
    }
]

# Make predictions
print("\nAnalyzing sentences...\n")
print("=" * 100)
print(f"{'Text':<60} {'Expected':<15} {'Predicted':<15} {'Confidence'}")
print("=" * 100)

for test_case in test_sentences:
    text = test_case["text"]
    expected = test_case["expected"]

    # Get prediction
    predicted_class, probabilities = predict_mental_health(text, model, tokenizer)
    predicted_label = get_label_name(predicted_class)
    confidence = probabilities[predicted_class].item() * 100

    # Print results with truncated text if too long
    truncated_text = text[:57] + "..." if len(text) > 60 else text.ljust(60)
    print(f"{truncated_text} {expected:<15} {predicted_label:<15} {confidence:.1f}%")

print("=" * 100)

# Print the full probability distribution for the first three examples
print("\nDetailed Analysis of the First Three Examples:")
print("-" * 50)

for test_case in test_sentences[:3]:  # Analyze first 3 cases
    text = test_case["text"]
    predicted_class, probabilities = predict_mental_health(text, model, tokenizer)

    print(f"\nText: {text}")
    print("Probability Distribution:")
    for i, prob in enumerate(probabilities):
        label = get_label_name(i)
        probability = prob.item() * 100
        print(f"{label:<20}: {probability:.1f}%")
    print("-" * 50)


Analyzing sentences...

Text                                                         Expected        Predicted       Confidence
今日は素晴らしい一日でした！友達や家族と楽しい時間を過ごしました。                            Normal          Normal          99.7%
もう何週間も気分が落ち込んでいます。もう何も喜びを感じません。                              Depression      Anxiety         99.3%
もうこの痛みに耐えられない。私がいなくなった方がみんなにとって良いのに。                         Suicidal        Suicidal        88.8%
我的心跳加速，無法呼吸。如果發生可怕的事情怎麼辦？                                    Anxiety         Anxiety         99.2%
昨天我覺得自己是世界上最幸福的人，但今天我無法起床。                                   Bipolar         Suicidal        42.7%
工作讓我不堪負荷。我無法入睡，思緒也不停地飛轉。                                     Stress          Stress          47.5%
沒人理解我。我的情緒很強烈，我會把所有人都推開。                                     Personality disorder Stress          42.1%
Acabo de terminar una sesión de yoga y me siento muy tran... Normal          Normal          48.7%
Mis pensamientos dan vueltas y no puedo concentrarme en n... Anxiety      
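The table above lists each prediction but gives no summary. A minimal sketch for tallying accuracy per language follows. `accuracy_by_language` is a hypothetical helper, and the rows are copied from the fully visible lines of that output (the remaining Spanish rows are cut off, so Spanish has only one entry). With at most four sentences per language, these numbers are anecdotal rather than an evaluation.

```python
from collections import defaultdict


def accuracy_by_language(rows):
    """Return {language: (correct, total, accuracy)} for (language, expected, predicted) rows."""
    counts = defaultdict(lambda: [0, 0])  # language -> [correct, total]
    for language, expected, predicted in rows:
        counts[language][1] += 1
        # Compare case-insensitively so "Personality disorder" matches "Personality Disorder"
        if expected.lower() == predicted.lower():
            counts[language][0] += 1
    return {lang: (c, t, c / t) for lang, (c, t) in counts.items()}


# Rows copied from the fully visible lines of the multilingual output table
rows = [
    ("Japanese", "Normal", "Normal"),
    ("Japanese", "Depression", "Anxiety"),
    ("Japanese", "Suicidal", "Suicidal"),
    ("Mandarin", "Anxiety", "Anxiety"),
    ("Mandarin", "Bipolar", "Suicidal"),
    ("Mandarin", "Stress", "Stress"),
    ("Mandarin", "Personality disorder", "Stress"),
    ("Spanish", "Normal", "Normal"),
]

for lang, (correct, total, acc) in accuracy_by_language(rows).items():
    print(f"{lang:<10} {correct}/{total}  {acc:.0%}")
```

In the notebook itself, the rows could be collected inside the prediction loop by appending `(language, expected, predicted_label)` for each test case, assuming the language is stored alongside each sentence (at the moment it only appears in a comment).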

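The columns in that output drift because `str.ljust` and format specs such as `:<15` pad by character count, while CJK characters usually occupy two columns on screen, and `"Personality disorder"` (20 characters) is longer than the 15-character label column. One possible fix, sketched below with the hypothetical helpers `display_width` and `pad_to_width`, pads by display width using `unicodedata.east_asian_width`. This is a heuristic; the final alignment still depends on the font used to render the output.

```python
import unicodedata


def display_width(text):
    """Approximate on-screen width: wide ('W') and fullwidth ('F') characters count as 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def pad_to_width(text, width):
    """Truncate with '...' if needed, then pad with spaces to exactly `width` display columns.

    Assumes width >= 3 so there is room for the ellipsis.
    """
    if display_width(text) > width:
        out, used = "", 0
        for ch in text:
            w = display_width(ch)
            if used + w > width - 3:  # leave room for "..."
                break
            out += ch
            used += w
        text = out + "..."
    return text + " " * (width - display_width(text))
```

For example, each row could be printed with `print(f"{pad_to_width(text, 60)} {pad_to_width(expected, 22)} {pad_to_width(predicted_label, 22)} {confidence:.1f}%")`, where the 22-column label width leaves room for `"Personality disorder"`.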
In [None]:
from transformers import pipeline

# Use the text2text-generation pipeline with FLAN-T5 (the large version performs better than the base version)
generator = pipeline("text2text-generation", model="google/flan-t5-large")

def generate_supportive_message(emotion, user_text):
    """Build a few-shot prompt from the detected emotion and the user's text, then sample a reply from FLAN-T5."""
    prompt = (
        "You are a supportive mental health assistant. Given what someone said and how they feel, write a short, kind message to comfort them.\n\n"
        "User: \"I feel hopeless.\" (emotion: Depression)\n"
        "Response: I'm really sorry you're feeling this way. You're not alone — there are people who care about you and want to help.\n\n"
        "User: \"I can't take it anymore.\" (emotion: Suicidal)\n"
        "Response: I'm truly sorry you're feeling overwhelmed. You're important, and it's okay to ask for help. You don't have to go through this alone.\n\n"
        f"User: \"{user_text}\" (emotion: {emotion})\n"
        "Response:"
    )
    result = generator(
        prompt,
        max_length=80,           # caps the length of the generated reply (in tokens)
        num_return_sequences=1,  # return a single reply
        temperature=0.8,         # values below 1 sharpen the sampling distribution
        top_p=0.9,               # nucleus sampling: sample only from the most probable 90% of mass
        do_sample=True,          # sampling, so replies vary between runs
    )
    return result[0]['generated_text'].strip()

Device set to use cuda:0
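`generate_supportive_message` hard-codes one instruction and two few-shot examples (Depression and Suicidal) in a single string. The sketch below shows one way to keep the examples as data, so that, for instance, an example for each of the seven categories could be added later. `build_prompt`, `INSTRUCTION` and `FEW_SHOT_EXAMPLES` are hypothetical names, and the example replies closely follow the ones in the original prompt. The layout is the same: instruction, blank line, `User: ... (emotion: ...)` / `Response: ...` pairs, and a trailing `Response:` for the model to complete.

```python
INSTRUCTION = (
    "You are a supportive mental health assistant. Given what someone said and how they feel, "
    "write a short, kind message to comfort them."
)

# (user text, emotion label, example reply) triples used as few-shot demonstrations
FEW_SHOT_EXAMPLES = [
    ("I feel hopeless.", "Depression",
     "I'm really sorry you're feeling this way. You're not alone. "
     "There are people who care about you and want to help."),
    ("I can't take it anymore.", "Suicidal",
     "I'm truly sorry you're feeling overwhelmed. You're important, and it's okay to ask for help. "
     "You don't have to go through this alone."),
]


def build_prompt(emotion, user_text, examples=FEW_SHOT_EXAMPLES, instruction=INSTRUCTION):
    """Assemble an instruction plus few-shot prompt that ends with 'Response:' for the model to complete."""
    parts = [instruction, ""]
    for text, label, reply in examples:
        parts.append(f'User: "{text}" (emotion: {label})')
        parts.append(f"Response: {reply}")
        parts.append("")  # blank line between examples
    parts.append(f'User: "{user_text}" (emotion: {emotion})')
    parts.append("Response:")
    return "\n".join(parts)


print(build_prompt("Stress", "Work is overwhelming me."))
```

The result could then be passed to the same pipeline, e.g. `generator(build_prompt(emotion, user_text), max_length=80, do_sample=True, temperature=0.8, top_p=0.9)`.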


## Test 1
Because `generate_supportive_message` samples its output (`do_sample=True`), the model does not return the same reply when given the same input twice.

In [None]:
# Test 1-1
user_input = "Work is overwhelming me. I can't sleep and my mind won't stop racing."
predicted_class, _ = predict_mental_health(user_input, model, tokenizer)
predicted_emotion = get_label_name(predicted_class)

response = generate_supportive_message(predicted_emotion, user_input)

print("User's sentiment: ", predicted_emotion)
print("Answer: ", response)

User's sentiment:  Stress
Answer:  I'm so sorry to hear that. You are not alone, and you have many others to talk to.


In [None]:
# Test 1-2
user_input = "Work is overwhelming me. I can't sleep and my mind won't stop racing."
predicted_class, _ = predict_mental_health(user_input, model, tokenizer)
predicted_emotion = get_label_name(predicted_class)

response = generate_supportive_message(predicted_emotion, user_input)

print("User's sentiment: ", predicted_emotion)
print("Answer: ", response)

User's sentiment:  Stress
Answer:  You don't have to be stressed out to work. It's normal to feel that way sometimes. It's normal to have to work hard to make ends meet.
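The two answers to the identical input in Test 1 differ because decoding samples from the model's token distribution. As a rough illustration of what `temperature=0.8` and `top_p=0.9` do at each step, here is a standard-library sketch of temperature scaling followed by nucleus (top-p) sampling over a single vector of logits. `sample_top_p` is a hypothetical helper, not Hugging Face's implementation, which applies these settings as logits processors at every decoding step.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=random):
    """Pick one token index using temperature scaling plus nucleus (top-p) filtering."""
    # Temperature: divide logits before the softmax (values < 1 make the distribution peakier)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus: keep the most probable tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample within the nucleus; random.choices renormalizes the weights itself
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


r = random.Random(0)
print([sample_top_p([1.0, 0.5, 0.2, 0.1], rng=r) for _ in range(10)])
```

With a fixed `random.Random(seed)`, the same logits always produce the same sequence of choices, which is the idea behind calling `transformers.set_seed(...)` before generating. Passing `do_sample=False` instead makes the pipeline decode deterministically (greedy, unless beam search is configured).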


## Test 2
The classifier accepts input in several languages, but in these tests FLAN-T5 still replied only in English.
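Because replies come back in English regardless of the input language, it may help to flag which inputs are not written in a Latin script, for example to log them or to route them through a separate translation step (not part of this notebook). The sketch below is a hypothetical `dominant_script` helper that guesses the script from Unicode character names. It is a heuristic, not a language identifier: Japanese text written only in kanji would be labelled "Chinese", and all Latin-script languages (English, Spanish, ...) share one label.

```python
import unicodedata


def dominant_script(text):
    """Rough script guess from Unicode character names (heuristic, not language identification)."""
    names = [unicodedata.name(ch, "") for ch in text]  # "" for characters without a name
    if any(n.startswith(("HIRAGANA", "KATAKANA")) for n in names):
        return "Japanese"  # kana only appear in Japanese text
    if any(n.startswith("CJK UNIFIED IDEOGRAPH") for n in names):
        return "Chinese"   # Han characters without any kana
    if any(n.startswith("LATIN") for n in names):
        return "Latin"
    return "Other"


for sample in ["今日は素晴らしい一日でした！", "我的心跳加速，無法呼吸。", "Me siento vacío por dentro."]:
    print(dominant_script(sample), "|", sample)
```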

In [None]:
# Test 2-1 English
user_input = "My heart is racing and I can't breathe. What if something terrible happens?"
predicted_class, _ = predict_mental_health(user_input, model, tokenizer)
predicted_emotion = get_label_name(predicted_class)

response = generate_supportive_message(predicted_emotion, user_input)

print("User's sentiment: ", predicted_emotion)
print("Answer: ", response)

User's sentiment:  Anxiety
Answer:  I understand that feeling. I know it's hard to imagine what might happen, but I know you're gonna do fine.


In [None]:
# Test 2-2 Mandarin
user_input = "我的心跳加速，無法呼吸。如果發生可怕的事情怎麼辦？"
predicted_class, _ = predict_mental_health(user_input, model, tokenizer)
predicted_emotion = get_label_name(predicted_class)

response = generate_supportive_message(predicted_emotion, user_input)

print("User's sentiment: ", predicted_emotion)
print("Answer: ", response)

User's sentiment:  Anxiety
Answer:  You don't have to be scared, you can just keep on moving.


In [None]:
# Test 2-3 Spanish
user_input = "Mi corazón late fuerte y no puedo respirar. ¿Y si pasa algo terrible?"
predicted_class, _ = predict_mental_health(user_input, model, tokenizer)
predicted_emotion = get_label_name(predicted_class)

response = generate_supportive_message(predicted_emotion, user_input)

print("User's sentiment: ", predicted_emotion)
print("Answer: ", response)

User's sentiment:  Anxiety
Answer:  I'm so sorry that you are having this anxiety attack. I hope you feel better soon.
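Each test cell above repeats the same two stages: classify the text with XLM-RoBERTa, then pass the predicted label and the original text to FLAN-T5. A minimal sketch of folding both stages into one function follows. `analyze_and_respond` is a hypothetical helper that takes each stage as a callable, which keeps it testable without loading either model.

```python
from typing import Callable, Tuple


def analyze_and_respond(
    user_input: str,
    classify: Callable[[str], str],
    respond: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Two-stage pipeline: detect the emotion label, then generate a reply conditioned on it."""
    emotion = classify(user_input)           # stage 1: text -> label
    reply = respond(emotion, user_input)     # stage 2: (label, text) -> supportive message
    return emotion, reply


# Stub stages stand in for the real models in this standalone example
emotion, reply = analyze_and_respond(
    "Work is overwhelming me.",
    classify=lambda text: "Stress",
    respond=lambda label, text: f"[{label}] I'm sorry you're dealing with this.",
)
print("User's sentiment: ", emotion)
print("Answer: ", reply)
```

In the notebook, it could be called as `analyze_and_respond(user_input, lambda t: get_label_name(predict_mental_health(t, model, tokenizer)[0]), generate_supportive_message)`, assuming `predict_mental_health` returns `(predicted_class, probabilities)` as it does in the cells above.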
