# BERT Model Fine-Tuning
Intent classification is the task of identifying the underlying intent behind a user’s query. In KusiBot, this is a critical function, as it allows the system to respond appropriately to signs of mental distress. The system identifies seven intents: “normal”, “depression”, “anxiety”, “suicidal”, “bipolar”, “stress” and “personality disorder”.

To achieve this, a pre-trained BERT model was fine-tuned for this domain. Fine-tuning was essential because the base pre-trained model has no classification layer trained to assign text to these mental health categories.


## Data Preprocessing
Fine-tuning required a curated dataset of text statements, each labelled with one of the aforementioned intents. The dataset is available at: [Mental_Health_Dataset](https://www.kaggle.com/code/rajtilak/mental-health-sentiment-analysis-nlp-ml#-Sentiment-Analysis-for-Mental-Health-)

The initial step was a thorough data cleaning and preprocessing phase to prepare the text for the model. This included:
- Removing any missing data (e.g. statements with no tagged status).
- Standardising the text by converting it to lowercase.
- Removing non-essential characters such as punctuation, numbers, and extra whitespace.

Notebooks used as references:
- [PreProcessing 1](https://www.kaggle.com/code/rajtilak/mental-health-sentiment-analysis-nlp-ml/notebook#-4.-Data-Preprocessing-)
- [PreProcessing 2](https://www.kaggle.com/code/muhammadfaizan65/sentiment-analysis-for-mental-health-nlp/notebook)
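
As a quick illustration of the cleaning steps listed above, the following minimal sketch applies the same kind of regular expressions to one made-up sentence. The function name `clean_text` and the sample string are assumptions for illustration only; the notebook's own implementation is the `clean_statement` method defined in the code cell that follows.

```python
import re
import string


def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase, strip links, punctuation and digits."""
    text = text.lower()                                               # lowercase
    text = re.sub(r"https?://\S+|www\.\S+", "", text)                 # remove links
    text = re.sub(r"[%s]" % re.escape(string.punctuation), "", text)  # remove punctuation
    text = re.sub(r"\w*\d\w*", "", text)                              # remove words containing digits
    return re.sub(r"\s+", " ", text).strip()                          # collapse extra whitespace


# Made-up sample sentence, not taken from the dataset
sample = "I can't sleep!!  Visit https://example.com at 3am..."
print(clean_text(sample))  # prints: i cant sleep visit at
```

Note that the link is removed before punctuation; otherwise the URL would lose its `://` and no longer match the link pattern.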


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
import plotly.express as px
import numpy as np
import re, string, nltk, random, torch
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from textblob import TextBlob

class PreprocessData:
  """Preprocesses the dataset for sentiment
  analysis on mental health statements."""

  def __init__(self, df):
    nltk.download('stopwords')
    nltk.download('punkt')
    nltk.download('punkt_tab')
    self.df = df
    self.stop_words = set(stopwords.words('english'))

  def show_info(self):
    """Shows useful information about the dataset."""

    print("###################################")
    print("Dataset:")
    print(self.df.info())
    print("###################################")
    print("Missing Values:")
    print(self.df.isnull().sum())
    print("###################################")
    print("Duplicated Values:")
    print(self.df.duplicated().sum())
    print("###################################")
    print("Class Distribution:")
    print(self.df['status'].value_counts())
    print("###################################")

    # Distribution of status labels in piechart
    fig = px.pie(self.df, names='status', title='Proportion of Each Status Category')
    fig.show()

  def clean_statement(self, statement):
    """Cleans statements by removing special characters, spaces, ..."""

    if isinstance(statement, str):
      statement = statement.lower()  # Lowercase statements
      statement = re.sub(r'\[.*?\]', '', statement)  # Remove statements in square brackets
      statement = re.sub(r'https?://\S+|www\.\S+', '', statement)  # Remove links
      statement = re.sub(r'<.*?>+', '', statement)  # Remove HTML tags
      statement = re.sub(r'[%s]' % re.escape(string.punctuation), '', statement)  # Remove punctuation
      statement = re.sub(r'\n', '', statement)  # Remove newlines
      statement = re.sub(r'\w*\d\w*', '', statement)  # Remove words containing numbers
      statement = re.sub(r'\s+', ' ', statement).strip() # Remove extra whitespace

      return statement
    return ""

  def remove_stopwords(self, statement):
    """Tokenizes statements and removes stopwords from them."""

    tokens = word_tokenize(statement)
    tokens = [word for word in tokens if word not in self.stop_words]
    return ' '.join(tokens)

  def augment(self, statement):
    """Augments text by translating it to French and back to English.
    This is called BackTranslation."""

    try:
      blob = TextBlob(statement)
      translated = blob.translate(to='fr').translate(to='en')
      return str(translated)
    except Exception:
      return statement

  def eda(self):
    """Performs Exploratory Data Analysis on the dataset."""

    # Distribution of statement lengths
    self.df['statement_length'] = self.df['cleaned_statement'].apply(lambda x: len(x.split()))
    fig = px.histogram(self.df, x='statement_length', color='status', title='Distribution of Statement Lengths')
    fig.show()

    # Top 20 most common words
    words = ' '.join(self.df['cleaned_statement']).split()
    freq = pd.Series(words).value_counts()[:20]
    fig = px.bar(freq, x=freq.index, y=freq.values, title='Top 20 Most Common Words')
    fig.show()

    # Sentiment Analysis
    self.df['sentiment'] = self.df['cleaned_statement'].apply(lambda x: TextBlob(x).sentiment.polarity)
    fig = px.histogram(self.df, x='sentiment', color='status', title='Distribution of Sentiments')
    fig.show()

  def run(self):
    """Runs the preprocessing pipeline."""

    self.show_info()

    # Eliminate rows with NaN statement values
    self.df.dropna(subset=['statement'], inplace=True)

    # Add a new column with cleaned statements
    self.df['cleaned_statement'] = self.df['statement'].apply(self.clean_statement)

    # Tokenization and Stopwords Removal
    self.df['cleaned_statement'] = self.df['cleaned_statement'].apply(self.remove_stopwords)

    # Augmenting text
    self.df['augmented_statement'] = self.df['statement'].apply(self.augment)
    augmented_df = self.df[['statement', 'status']].copy()
    augmented_df['statement'] = self.df['augmented_statement']
    self.df = pd.concat([self.df, augmented_df])

    # Reapply preprocessing on augmented data
    self.df['cleaned_statement'] = self.df['statement'].apply(self.clean_statement)
    self.df['cleaned_statement'] = self.df['cleaned_statement'].apply(self.remove_stopwords)

    # Eliminate rows whose cleaned statement is missing
    self.df.dropna(subset=['cleaned_statement'], inplace=True)

    # After preprocessing, EDA
    self.eda()

    # Build label <-> id mappings in order of first appearance (e.g. Anxiety: 0, ...)
    label_mapping = {label: idx for idx, label in enumerate(self.df['status'].unique())}
    reverse_label_mapping = {idx: label for label, idx in label_mapping.items()}

    # Add a numeric label column
    self.df['status_id'] = self.df['status'].map(label_mapping)

    # Splitting data into features and target
    return self.df['cleaned_statement'], self.df['status_id'], label_mapping, reverse_label_mapping


################################################################################
# 1. Load, Explore and Preprocess Data.
################################################################################

# Set random seed for reproducibility
RANDOM_SEED = 42
def set_seed(seed_value):
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)

set_seed(RANDOM_SEED)

# Loading Mental Health data from Drive
path = "/content/drive/MyDrive/MH_Datasets/mh_dataset_sentiment.csv"
df = pd.read_csv(path)

# Preprocessing Data
preprocessor = PreprocessData(df)
X, y, label_mapping, reverse_label_mapping = preprocessor.run()

print("LOG - Data preprocessing complete!")

print("#### Data for BERT training")
print(X)
print("#### Tags for BERT training")
print(y)
print("#### Label Mapping")
print(label_mapping)
print("#### Reverse Label Mapping")
print(reverse_label_mapping)


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


###################################
Dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53043 entries, 0 to 53042
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  53043 non-null  int64 
 1   statement   52681 non-null  object
 2   status      53043 non-null  object
dtypes: int64(1), object(2)
memory usage: 1.2+ MB
None
###################################
Missing Values:
Unnamed: 0      0
statement     362
status          0
dtype: int64
###################################
Duplicated Values:
0
###################################
Class Distribution:
status
Normal                  16351
Depression              15404
Suicidal                10653
Anxiety                  3888
Bipolar                  2877
Stress                   2669
Personality disorder     1201
Name: count, dtype: int64
###################################


LOG - Data preprocessing complete!
#### Data for BERT training
0                                                  oh gosh
1        trouble sleeping confused mind restless heart ...
2        wrong back dear forward doubt stay restless re...
3        ive shifted focus something else im still worried
4                      im restless restless month boy mean
                               ...                        
53038    nobody takes seriously ’ dealt depressionanxie...
53039    selfishness dont feel good like dont belong wo...
53040    way sleep better cant sleep nights meds didnt ...
53041    public speaking tips hi give presentation work...
53042    really bad door anxiety scared didnt lock door...
Name: cleaned_statement, Length: 105362, dtype: object
#### Tags for BERT training
0        0
1        0
2        0
3        0
4        0
        ..
53038    0
53039    0
53040    0
53041    0
53042    0
Name: status_id, Length: 105362, dtype: int64
#### Label Mapping
{'Anxiety': 0, 'Nor

## Data Processing for BERT Training
BERT models require input data in a specific structure. The pre-processed data therefore underwent a second transformation, which included:
- Tokenization: Using the BertTokenizer module (from transformers), the text was broken down into tokens that align with the model's vocabulary. For example, a word like "ChatGPT" might become [“Chat”, “##G”, “##PT”].
- Special Tokens: Special tokens like “[CLS]” and “[SEP]” were added to each sample.
- Padding and Truncation: Each sample was padded or truncated to a uniform length so that samples can be processed in batches. In this case, the maximum length of a sample was 128 tokens.

After this process, the dataset was split into three parts: 80% for training, 10% for validation and 10% for testing.
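
To make the three steps above concrete, here is a minimal, self-contained sketch of greedy sub-word splitting, special tokens and padding. It is not the Hugging Face implementation: the toy vocabulary, the function names `wordpiece` and `encode`, and the example sentence are all assumptions chosen for illustration, and the real tokenizer uses the full bert-base-uncased vocabulary.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a '##' prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab, max_length):
    """Add [CLS]/[SEP], truncate, pad to max_length and build the attention mask."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_length - 1] + ["[SEP]"]  # truncation keeps room for [SEP]
    attention_mask = [1] * len(tokens)             # 1 = real token
    pad = max_length - len(tokens)
    return tokens + ["[PAD]"] * pad, attention_mask + [0] * pad  # 0 = padding


# Hypothetical toy vocabulary
TOY_VOCAB = {"feel", "##ing", "rest", "##less", "so"}
tokens, mask = encode("feeling so restless", TOY_VOCAB, max_length=8)
print(tokens)  # ['[CLS]', 'feel', '##ing', 'so', 'rest', '##less', '[SEP]', '[PAD]']
print(mask)    # [1, 1, 1, 1, 1, 1, 1, 0]
```

The attention mask tells the model which positions hold real tokens, so padding does not influence the prediction.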

In [3]:
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from transformers import BertTokenizer
from sklearn.model_selection import train_test_split

BERT_MODEL = 'bert-base-uncased'
TEXT_MAX_LEN = 128
DATALOADER_BATCH_SIZE = 32

class BERTDataPreparation:
    """Prepares data for BERT model by tokenizing and creating dataloaders."""

    def __init__(self, X, y, label_mapping, reverse_label_mapping,
                 max_length=TEXT_MAX_LEN, batch_size=DATALOADER_BATCH_SIZE):
        self.X = X
        self.y = y
        self.label_mapping = label_mapping
        self.reverse_label_mapping = reverse_label_mapping
        self.max_length = max_length
        self.batch_size = batch_size
        self.tokenizer = BertTokenizer.from_pretrained(BERT_MODEL)

    def tokenize_data(self):
        """Tokenizes text data for BERT training."""
        input_ids = []
        attention_masks = []

        # Encode plus from HF: Tokenize + adds BERT special chars [CLS] and [SEP]
        # + padding or truncation at max_len
        for text in self.X:
            encoded_dict = self.tokenizer.encode_plus(
                text,
                add_special_tokens=True,
                max_length=self.max_length,
                padding='max_length',
                truncation=True,
                return_attention_mask=True,
                return_tensors='pt'
            )

            input_ids.append(encoded_dict['input_ids'])
            attention_masks.append(encoded_dict['attention_mask'])

        # Convert lists to tensors
        input_ids = torch.cat(input_ids, dim=0)
        attention_masks = torch.cat(attention_masks, dim=0)
        labels = torch.tensor(self.y.values)

        return input_ids, attention_masks, labels

    def create_dataloaders(self, input_ids, attention_masks, labels, val_ratio=0.1, test_ratio=0.1):
        """Creates train, validation, and test dataloaders for BERT."""

        # First split: separate test set
        train_inputs, test_inputs, train_masks, test_masks, train_labels, test_labels = train_test_split(
            input_ids, attention_masks, labels,
            test_size=test_ratio,
            random_state=RANDOM_SEED,
            stratify=labels
        )

        # Second split: separate validation set from training set
        train_inputs, val_inputs, train_masks, val_masks, train_labels, val_labels = train_test_split(
            train_inputs, train_masks, train_labels,
            test_size=val_ratio/(1-test_ratio),
            random_state=RANDOM_SEED,
            stratify=train_labels
        )

        # Create DataLoaders to finetune BERT
        train_data = TensorDataset(train_inputs, train_masks, train_labels)
        train_sampler = RandomSampler(train_data)
        train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=self.batch_size)

        val_data = TensorDataset(val_inputs, val_masks, val_labels)
        val_sampler = SequentialSampler(val_data)
        val_dataloader = DataLoader(val_data, sampler=val_sampler, batch_size=self.batch_size)

        test_data = TensorDataset(test_inputs, test_masks, test_labels)
        test_sampler = SequentialSampler(test_data)
        test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=self.batch_size)

        return train_dataloader, val_dataloader, test_dataloader

    def run(self):
        """Runs the BERT data preparation pipeline."""

        # Tokenize data
        input_ids, attention_masks, labels = self.tokenize_data()

        # Create dataloaders
        train_dataloader, val_dataloader, test_dataloader = self.create_dataloaders(
            input_ids, attention_masks, labels
        )

        # Print information about data splits
        print(f"\nData split information:")
        print(f"Training samples: {len(train_dataloader.dataset)}")
        print(f"Validation samples: {len(val_dataloader.dataset)}")
        print(f"Test samples: {len(test_dataloader.dataset)}")

        return train_dataloader, val_dataloader, test_dataloader

################################################################################
# 2. Prepare data for BERT training
################################################################################

bert_prep = BERTDataPreparation(X, y, label_mapping, reverse_label_mapping)
train_dataloader, val_dataloader, test_dataloader = bert_prep.run()

print("Data preparation complete. Ready for BERT model training!")



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.




Data split information:
Training samples: 84288
Validation samples: 10537
Test samples: 10537
Data preparation complete. Ready for BERT model training!
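
The split sizes printed above can be reproduced by hand. The sketch below mirrors the two-stage split in `create_dataloaders` and assumes that scikit-learn rounds a fractional held-out size up to the next integer; the helper name `two_stage_split_sizes` is hypothetical.

```python
import math


def two_stage_split_sizes(n_samples, val_ratio=0.1, test_ratio=0.1):
    """Return (train, val, test) sizes for a test split followed by a validation split."""
    # First split: hold out the test set
    n_test = math.ceil(test_ratio * n_samples)
    remaining = n_samples - n_test
    # Second split: val_ratio is rescaled because it now applies to the remainder only
    n_val = math.ceil(val_ratio / (1 - test_ratio) * remaining)
    return remaining - n_val, n_val, n_test


print(two_stage_split_sizes(105362))  # (84288, 10537, 10537)
```

Dividing `val_ratio` by `1 - test_ratio` is what keeps the validation set at roughly 10% of the full dataset rather than 10% of the remaining 90%.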


## BERT Model Training for Sentiment Analysis
Once the data was processed, the bert-base-uncased model was fine-tuned for five epochs. In each epoch, the model was trained on the training split and then evaluated on the validation split; the checkpoint was saved whenever its validation accuracy improved on the best value seen so far.

### Model Training Class
Training class for the BERT model. Hyperparameters that could be tuned to find the best model:
- Learning rate (2e-5)
- Number of epochs (5)
- Optimiser (AdamW)
- Scheduler (linear learning-rate decay, no warmup)
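
The training code below builds its scheduler with `get_linear_schedule_with_warmup`, which scales the learning rate linearly over the training steps. The following sketch shows the shape of that schedule as I understand it; the function `linear_lr` is a hypothetical re-implementation for illustration, not the library code. The sample count (84288) and batch size (32) are taken from the data preparation cell.

```python
import math


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


steps_per_epoch = math.ceil(84288 / 32)  # number of batches per epoch
total_steps = steps_per_epoch * 5        # five epochs

print(steps_per_epoch, total_steps)      # 2634 13170
print(linear_lr(0, total_steps))         # 2e-05 at the first step
print(linear_lr(total_steps, total_steps))  # 0.0 at the end of training
```

With zero warmup steps, the learning rate starts at its full value and is halved by the midpoint of training.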

In [None]:
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup
from sklearn.metrics import accuracy_score, classification_report
import os

# Model CONSTANTS
LEARNING_RATE = 2e-5
NUM_EPOCHS = 5

class BERTIntentClassifier:
  """BERT-based intent classification model training"""

  def __init__(self, num_labels, model_name=BERT_MODEL):

    # Try using GPU for training
    self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {self.device}")

    # Load pre-trained model
    self.model = BertForSequenceClassification.from_pretrained(model_name,
                                                      num_labels=num_labels,
                                                      output_attentions=False,
                                                      output_hidden_states=False)
    # Setting model to GPU | CPU
    self.model.to(self.device)

  def setup(self, len_train_dataloader):
    """
    Function in charge of setting up Optimizer and Scheduler for training
    """

    # Setting up the Optimizer (AdamW with the configured learning rate)
    self.optimizer = torch.optim.AdamW(self.model.parameters(), lr=LEARNING_RATE, eps=1e-8)

    # Setting the LR Scheduler (it linearly decays the learning rate after each
    # training step, which helps convergence; optional but recommended)
    total_steps = len_train_dataloader * NUM_EPOCHS # Total number of training steps
    self.scheduler = get_linear_schedule_with_warmup(self.optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)

  def train_epoch(self, data_loader):
    """
    Function in charge of training one epoch of the model
    Returns avg train loss for an epoch
    """

    self.model.train()
    total_loss = 0

    for batch in data_loader:
        # Clear gradients
        self.optimizer.zero_grad()

        # Get inputs
        input_ids = batch[0].to(self.device) # input-ids
        attention_mask = batch[1].to(self.device) # attention-mask
        labels = batch[2].to(self.device) # labels

        # Forward propagation
        predictions = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        loss = predictions.loss
        total_loss += loss.item()

        # Backward propagation
        loss.backward()

        # Clip gradients to avoid exploding gradients
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

        # Update weights
        self.optimizer.step()

        # Update learning rate
        self.scheduler.step()

    return total_loss / len(data_loader)

  def evaluate(self, data_loader, label_mapping):
    """
    Evaluates the model on the given dataloader.
    Returns the average loss, accuracy, classification report, predictions and true labels
    """

    self.model.eval()

    predictions = []
    actual_labels = []
    total_loss = 0

    with torch.no_grad(): # calculation of gradient not required on evaluation
        for batch in data_loader:

            # Get inputs
            input_ids = batch[0].to(self.device) # input-ids
            attention_mask = batch[1].to(self.device) # attention-mask
            labels = batch[2].to(self.device) # labels

            # Forward propagation
            output = self.model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            loss = output.loss
            total_loss += loss.item()

            # Get predictions
            _, preds = torch.max(output.logits, dim=1)

            # Move predictions and labels to CPU
            predictions.extend(preds.cpu().tolist())
            actual_labels.extend(labels.cpu().tolist())

    # Calculate accuracy
    accuracy = accuracy_score(actual_labels, predictions)

    # Generate classification report
    report = classification_report(
        actual_labels,
        predictions,
        target_names=list(label_mapping.keys()),
        output_dict=True
    )

    return total_loss / len(data_loader), accuracy, report, predictions, actual_labels

  def train(self, train_dataloader, val_dataloader, label_mapping):

    self.setup(len(train_dataloader)) # Optimizer & Scheduler

    best_val_accuracy = 0

    # Training loop
    for epoch in range(NUM_EPOCHS):
        print(f"\nEpoch {epoch+1}/{NUM_EPOCHS}")

        # Train
        train_loss = self.train_epoch(train_dataloader)
        print(f"Average Training loss: {train_loss:.4f}")

        # Validate
        val_loss, val_accuracy, val_report, _, _ = self.evaluate(val_dataloader,
                                                                 label_mapping)
        print(f"Validation loss: {val_loss:.4f}, accuracy: {val_accuracy:.4f}")

        # Print validation report for key metrics
        print("\nValidation Report:")
        for label, metrics in val_report.items():
            if label in ['macro avg', 'weighted avg']:
                # Adjacent f-strings avoid embedding indentation in the output
                print(f"{label}: Precision={metrics['precision']:.4f}, "
                      f"Recall={metrics['recall']:.4f}, F1={metrics['f1-score']:.4f}")

        # Save the best model
        if val_accuracy > best_val_accuracy:
            best_val_accuracy = val_accuracy
            print(f"New best model saved with validation accuracy: "
                  f"{best_val_accuracy:.4f}")
            self.save_model()

    print(f"\nTraining complete! Best validation accuracy: {best_val_accuracy:.4f}")
    return best_val_accuracy

  def save_model(self, model_dir=None):
    """Save the model for later use to the given directory."""

    try:
        from google.colab import drive  # Only importable inside Colab
        drive_mounted = True
    except ImportError:
        drive_mounted = False

    if drive_mounted:
        # Use the given directory, falling back to the default Drive folder
        drive_path = model_dir or "/content/drive/MyDrive/intent_classifier_model_v2"
        os.makedirs(drive_path, exist_ok=True)
        self.model.save_pretrained(drive_path)
        print(f"Model saved to Google Drive at {drive_path}")

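  def save_as_onnx(self, onnx_path, model_dir=None):
    """Export model to ONNX format for deployment."""

    # Default to the folder that holds the ONNX file
    if model_dir is None:
        model_dir = os.path.dirname(onnx_path)

    # Export in evaluation mode so that dropout is disabled
    self.model.eval()

    # Create dummy input for tracing (same sequence length as the training data)
    dummy_input_ids = torch.ones(1, TEXT_MAX_LEN, dtype=torch.long).to(self.device)
    dummy_attention_mask = torch.ones(1, TEXT_MAX_LEN, dtype=torch.long).to(self.device)
    dummy_input = (dummy_input_ids, dummy_attention_mask)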

    # Export to ONNX
    torch.onnx.export(
        self.model,
        dummy_input,
        onnx_path,
        export_params=True,
        opset_version=14,
        do_constant_folding=True,
        input_names=['input_ids', 'attention_mask'],
        output_names=['output'],
        dynamic_axes={
            'input_ids': {0: 'batch_size'},
            'attention_mask': {0: 'batch_size'},
            'output': {0: 'batch_size'},
        }
    )
    print(f"Model exported to ONNX at {onnx_path}")

    # Save tokenizer info along with model
    tokenizer_path = os.path.join(model_dir, "tokenizer_info.txt")
    with open(tokenizer_path, "w") as f:
        f.write("bert-base-uncased")
    print(f"Tokenizer info saved to {tokenizer_path}")

  def load_model(self, model_dir):
    """Load a saved model."""
    self.model = BertForSequenceClassification.from_pretrained(model_dir)
    self.model.to(self.device)
    print(f"Model loaded from {model_dir}")



### Model Training
The following code trains the model and saves the best checkpoint to the Drive folder.

In [None]:
# Creating BERT model for Sentiment Analysis classification
intent_classifier = BERTIntentClassifier(num_labels=len(label_mapping))

# Training the model
best_val_accuracy = intent_classifier.train(train_dataloader, val_dataloader, label_mapping)

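# Save the final-epoch model to Google Drive (this overwrites the best
# checkpoint saved during training, as both use the same folder)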
intent_classifier.save_model("/content/drive/MyDrive/intent_classifier_model_v2")

Using device: cuda


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Epoch 1/5
Average Training loss: 0.5872
Validation loss: 0.4195, accuracy: 0.8338

Validation Report:
macro avg: Precision=0.8348, Recall=0.8000, F1=0.8144
weighted avg: Precision=0.8354, Recall=0.8338, F1=0.8312
New best model saved with validation accuracy: 0.8338
Model saved to Google Drive at /content/drive/MyDrive/intent_classifier_model_v2

Epoch 2/5
Average Training loss: 0.3303
Validation loss: 0.3083, accuracy: 0.8836

Validation Report:
macro avg: Precision=0.8973, Recall=0.8721, F1=0.8830
weighted avg: Precision=0.8859, Recall=0.8836, F1=0.8839
New best model saved with validation accuracy: 0.8836
Model saved to Google Drive at /content/drive/MyDrive/intent_classifier_model_v2

Epoch 3/5
Average Training loss: 0.1899
Validation loss: 0.2449, accuracy: 0.9222

Validation Report:
macro avg: Precision=0.9135, Recall=0.9269, F1=0.9200
weighted avg: Precision=0.9224, Recall=0.9222, F1=0.9222
New best model saved with validation accuracy: 0.9222
Model saved to Google Drive at /co

UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 11 is not supported. Support for this operator was added in version 14, try exporting with this version.

In [None]:
!pip install onnx

# Load your saved model
loaded_classifier = BERTIntentClassifier(
    num_labels=len(label_mapping)
)
loaded_classifier.load_model("/content/drive/MyDrive/intent_classifier_model_v2")

# Export to ONNX with the fixed method
loaded_classifier.save_as_onnx(
    "/content/drive/MyDrive/intent_classifier_model_v2/model.onnx",  # onnx_path
    "/content/drive/MyDrive/intent_classifier_model_v2"              # model_dir
)

# Also save the label mapping for later use
import json
with open("/content/drive/MyDrive/intent_classifier_model_v2/label_mapping.json", "w") as f:
    json.dump(label_mapping, f)


Collecting onnx
  Downloading onnx-1.17.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Downloading onnx-1.17.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.0 MB)
Installing collected packages: onnx
Successfully installed onnx-1.17.0
Using device: cuda


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded from /content/drive/MyDrive/intent_classifier_model_v2
Model exported to ONNX at /content/drive/MyDrive/intent_classifier_model_v2/model.onnx
Tokenizer info saved to /content/drive/MyDrive/intent_classifier_model_v2/tokenizer_info.txt


## Test Evaluation of BEST Model

After training was complete, the model's performance was evaluated on the test split. This step is essential because it verifies whether the model generalises to data it has not seen during training.

**BEST MODEL STATS**

Epoch 5/5
Average Training loss: 0.0625
Validation loss: 0.2497, accuracy: 0.9473

Validation Report:
macro avg: Precision=0.9521, Recall=0.9464, F1=0.9492
weighted avg: Precision=0.9474, Recall=0.9473, F1=0.9473
New best model saved with validation accuracy: 0.9473

Training complete! Best validation accuracy: 0.9473

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report

# Load your saved model
loaded_classifier = BERTIntentClassifier(
    num_labels=len(label_mapping)
)
loaded_classifier.load_model("/content/drive/MyDrive/intent_classifier_model_v2")

# Evaluate on Test Data
test_loss, test_accuracy, test_report, test_predictions, test_actual = loaded_classifier.evaluate(test_dataloader, label_mapping)

print(f"\nTest Results:")
print(f"Loss: {test_loss:.4f}")
print(f"Accuracy: {test_accuracy:.4f}")

# Print detailed classification report
print("\nClassification Report:")
print(classification_report(test_actual, test_predictions, target_names=list(label_mapping.keys())))

# Generate confusion matrix
cm = confusion_matrix(test_actual, test_predictions)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=list(label_mapping.keys()),
            yticklabels=list(label_mapping.keys()))
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.tight_layout()
plt.savefig('/content/drive/MyDrive/intent_classifier_model_v2/confusion_matrix.png')
plt.close()

Using device: cuda


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded from /content/drive/MyDrive/intent_classifier_model_v2

Test Results:
Loss: 0.2397
Accuracy: 0.9498

Classification Report:
                      precision    recall  f1-score   support

             Anxiety       0.97      0.97      0.97       768
              Normal       0.98      0.99      0.99      3269
          Depression       0.93      0.93      0.93      3081
            Suicidal       0.91      0.92      0.91      2131
              Stress       0.95      0.94      0.94       517
             Bipolar       0.97      0.97      0.97       556
Personality disorder       0.95      0.93      0.94       215

            accuracy                           0.95     10537
           macro avg       0.95      0.95      0.95     10537
        weighted avg       0.95      0.95      0.95     10537
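
The macro and weighted averages in the report above can be recomputed from the per-class rows. The sketch below does this for the F1 column; because it starts from the rounded two-decimal values printed in the report, the results are approximate. The variable names are chosen for illustration.

```python
# (label, f1-score, support) copied from the classification report above
rows = [
    ("Anxiety", 0.97, 768),
    ("Normal", 0.99, 3269),
    ("Depression", 0.93, 3081),
    ("Suicidal", 0.91, 2131),
    ("Stress", 0.94, 517),
    ("Bipolar", 0.97, 556),
    ("Personality disorder", 0.94, 215),
]

total_support = sum(support for _, _, support in rows)

# Macro average: every class counts equally, regardless of its size
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each class contributes in proportion to its support
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(total_support)          # 10537
print(round(macro_f1, 2))     # 0.95
print(round(weighted_f1, 2))  # 0.95
```

The two averages agree here, which suggests that the smaller classes are classified about as well as the larger ones.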



## Inference with BERT Model

A manual check of how the fine-tuned BERT model behaves on new text.
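
The inference cell below turns the model's raw logits into per-class confidence scores with a softmax and then picks the most probable class. The following pure-Python sketch shows that step in isolation; the three labels and the logit values are made up for illustration and do not come from the trained model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["Anxiety", "Normal", "Depression"]  # hypothetical subset of the classes
logits = [0.5, 2.0, -1.0]                     # made-up model outputs

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

result = {
    "predicted_label": labels[best],
    "confidence": probs[best],
    "confidence_scores": dict(zip(labels, probs)),
}
print(result["predicted_label"])  # Normal
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether the maximum is taken over logits or over probabilities.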

In [None]:
#########################################
# 8. Inference Function for New Text
#########################################
def clean_statement(statement):
    """Cleans statements by removing special characters, spaces, ..."""

    if isinstance(statement, str):
      statement = statement.lower()  # Lowercase statements
      statement = re.sub(r'\[.*?\]', '', statement)  # Remove statements in square brackets
      statement = re.sub(r'https?://\S+|www\.\S+', '', statement)  # Remove links
      statement = re.sub(r'<.*?>+', '', statement)  # Remove HTML tags
      statement = re.sub(r'[%s]' % re.escape(string.punctuation), '', statement)  # Remove punctuation
      statement = re.sub(r'\n', '', statement)  # Remove newlines
      statement = re.sub(r'\w*\d\w*', '', statement)  # Remove words containing numbers
      statement = re.sub(r'\s+', ' ', statement).strip() # Remove extra whitespace

      return statement
    return ""


def predict_mental_health_status(text, model, tokenizer, device, label_mapping):
    """
    Predict the mental health status for a given text input.

    Args:
        text (str): Input text
        model: Trained BERT model
        tokenizer: BERT tokenizer
        device: Device to run inference on
        label_mapping: Mapping from label indices to label names

    Returns:
        dict: Predicted label and confidence scores
    """
    # Clean the text
    cleaned_text = clean_statement(text)

    # Tokenize
    encoding = tokenizer.encode_plus(
        cleaned_text,
        add_special_tokens=True,
        max_length=TEXT_MAX_LEN,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )

    # Prepare input tensors
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)

    # Make sure the model is on the same device as the inputs, then set evaluation mode
    model.to(device)
    model.eval()

    # Get prediction
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits

        # Convert logits to probabilities
        probabilities = torch.nn.functional.softmax(logits, dim=1)

        # Get the predicted class (use the label_mapping argument, not a global)
        _, predicted_class = torch.max(probabilities, dim=1)
        predicted_label = label_mapping[predicted_class.item()]

        # Get confidence scores for all classes
        confidence_scores = {
            label_mapping[i]: prob.item()
            for i, prob in enumerate(probabilities[0])
        }

    result = {
        'predicted_label': predicted_label,
        'confidence': confidence_scores[predicted_label],
        'confidence_scores': confidence_scores
    }

    return result

#########################################
# 9. Testing with Example Inputs
#########################################

# Test examples
test_examples = [
    "I've been feeling really down lately and can't seem to enjoy anything anymore.",
    "I'm worried all the time and can't stop thinking about what might go wrong.",
    "Sometimes I feel really energetic and other times I can barely get out of bed.",
    "I had a good day at work today, things are going well.",
    "I keep having these thoughts that I need to check if I locked the door multiple times."
]

# Make predictions
print("\nTesting with example inputs:")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

for text in test_examples:
    prediction = predict_mental_health_status(
        text,
        loaded_classifier.model,
        tokenizer,
        device,
        reverse_label_mapping
    )

    print(f"\nInput: {text}")
    print(f"Predicted label: {prediction['predicted_label']}")
    print(f"Confidence: {prediction['confidence']:.4f}")

    # Print top 3 confidence scores
    sorted_scores = sorted(
        prediction['confidence_scores'].items(),
        key=lambda x: x[1],
        reverse=True
    )[:3]

    print("Top 3 predictions:")
    for label, score in sorted_scores:
        print(f"  {label}: {score:.4f}")


Testing with example inputs:

Input: I've been feeling really down lately and can't seem to enjoy anything anymore.
Predicted label: Depression
Confidence: 0.6909
Top 3 predictions:
  Depression: 0.6909
  Normal: 0.2474
  Suicidal: 0.0544

Input: I'm worried all the time and can't stop thinking about what might go wrong.
Predicted label: Anxiety
Confidence: 0.9992
Top 3 predictions:
  Anxiety: 0.9992
  Normal: 0.0003
  Depression: 0.0002

Input: Sometimes I feel really energetic and other times I can barely get out of bed.
Predicted label: Normal
Confidence: 0.9990
Top 3 predictions:
  Normal: 0.9990
  Stress: 0.0004
  Depression: 0.0003

Input: I had a good day at work today, things are going well.
Predicted label: Normal
Confidence: 0.9998
Top 3 predictions:
  Normal: 0.9998
  Bipolar: 0.0000
  Suicidal: 0.0000

Input: I keep having these thoughts that I need to check if I locked the door multiple times.
Predicted label: Anxiety
Confidence: 0.9668
Top 3 predictions:
  Anxiety: 0.966