# Installing important libraries

In [None]:
!pip install transformers datasets torch scikit-learn -q

I started by installing the core libraries needed to work with transformer-based models, since I'll be using MentalBERT later on for sentiment analysis. The `transformers` library gives access to a wide range of pre-trained NLP models and tools for tokenization, fine-tuning, and text generation. It's the foundation that lets me load and use advanced models without having to train them from scratch.

The `datasets` library helps in handling large text datasets more efficiently. It makes it easy to load, split, and preprocess data, which is really useful when preparing text for BERT-based models. Meanwhile, I installed `torch` because it's the deep learning framework that powers these transformer models: it handles all the computations that happen behind the scenes during model training and prediction.

Lastly, I added `scikit-learn` (imported in code as `sklearn`) since it includes tools for model evaluation and preprocessing that I'll still use alongside the transformer model. The package has to be installed under the name `scikit-learn`, because the `sklearn` name on PyPI is a deprecated placeholder whose installation fails. Adding the `-q` flag runs the installation quietly, so it doesn't flood the notebook with too much text output. Overall, this setup ensures that everything I need for deep learning and NLP is ready before I start working with MentalBERT.


In [None]:
from huggingface_hub import login
login()



Before using **MentalBERT**, I needed to log in to the Hugging Face Hub since this model requires an API token for access. Hugging Face hosts a lot of pre-trained models, including MentalBERT, and it uses authentication to manage who can download and use them. To get this token, I created a free Hugging Face account, went to my profile settings, and generated an access token from the "Access Tokens" section.

Once I had the token, I imported the `login` function from `huggingface_hub` and ran `login()`. This command opens a prompt where I entered my token, which then authenticates my session with the Hugging Face Hub. After logging in, I could securely load the MentalBERT model and its tokenizer without any permission issues.
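Since an access token is a secret, it is safer to keep it out of the notebook itself. The sketch below is one possible way to do that, assuming the token has been stored in an environment variable; the variable name `HF_TOKEN` and both helper functions are illustrative choices, not part of the original notebook.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in an environment variable, or None if unset."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in with the token from the environment, else fall back to the prompt."""
    from huggingface_hub import login  # imported lazily, only when a login is requested

    token = read_hf_token(env_var)
    if token is None:
        login()             # interactive prompt, as in the cell above
    else:
        login(token=token)  # non-interactive login
```

Calling `login_from_env()` would then replace the plain `login()` call, and the token never has to appear in a cell or in its output.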


In [None]:
!pip install -U transformers




I used this command to make sure the Transformers library is updated to its latest version. The `-U` flag upgrades an older installation instead of leaving it as it is. This is useful because newer versions of Transformers often include performance improvements, bug fixes, and updated model compatibility.

Keeping Transformers up to date lets me use the latest functions and tokenizer features, and it reduces the chance of version conflicts with other dependencies like PyTorch and datasets.

In [None]:
# openpyxl is needed later by pandas' to_excel() when writing the .xlsx results file
!pip install torch transformers scikit-learn pandas openpyxl




In [None]:
import transformers
print(transformers.__version__)


4.57.1



# Explanation for Random Search and Grid Search

In [None]:
import pandas as pd
import torch
import random
import numpy as np
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    classification_report,
    accuracy_score,
    precision_recall_fscore_support
)


I started by importing all the main libraries needed for preparing, training, and evaluating the MentalBERT model. The `pandas` library is used for handling and exploring the dataset, especially for reading CSV files and managing text data efficiently. Then, I imported `torch` and `Dataset` from PyTorch since Transformers are built on top of it. The `Dataset` class helps convert my text data into a format that can be fed directly into the model during training.

Next, I brought in tools from scikit-learn to help with data splitting and evaluation. `train_test_split` divides the dataset into training and testing portions, while `accuracy_score` and `precision_recall_fscore_support` will later help measure how well the model performs on the test data. These metrics give a more complete view of performance, especially for a binary classification task like detecting suicidal vs. normal statements.

Finally, I imported everything from Transformers that's specific to using MentalBERT. `AutoTokenizer` automatically handles tokenization based on the model we choose, converting text into the numerical format BERT understands. `AutoModelForSequenceClassification` loads the actual pre-trained model designed for classification tasks. The `Trainer` and `TrainingArguments` handle the training process: they make it easier to define how the model trains, tracks progress, and saves checkpoints. Together, these imports set up everything I need to fine-tune MentalBERT efficiently.


In [None]:
# 1️⃣ Load dataset
df = pd.read_csv("Cleaned_Combined_Data.csv")

TEXT_COL = "statement"
LABEL_COL = "status"

I loaded my cleaned dataset using pandas with the `read_csv()` function. This command reads the file named ***"Cleaned_Combined_Data.csv"*** and stores it in a DataFrame called `df`, making it easy to view, filter, and process the data later. This dataset contains the text statements and their corresponding mental health labels that I'll use to train and test the MentalBERT model.

I created two variables, `TEXT_COL` and `LABEL_COL`, to clearly define which parts of the dataset I'll be working with. `TEXT_COL` is set to `"statement"`, which holds all the written texts or posts that will be analyzed by the model. `LABEL_COL` is set to `"status"`, which contains the actual categories or mental health conditions, in this case "Normal" and "Suicidal."

Doing this makes my code more organized and easier to maintain. Instead of repeating the column names throughout the code, I can just refer to these variables whenever I need to access the text or the label columns. It's a simple step, but it helps make the workflow cleaner and less prone to errors, especially when adjusting or reusing the code later on.


In [None]:
# Encode string labels to integer IDs
df[LABEL_COL] = df[LABEL_COL].astype('category')
df['label_id'] = df[LABEL_COL].cat.codes
label_mapping = dict(enumerate(df[LABEL_COL].cat.categories))
print("✅ Label mapping:", label_mapping)

I converted the string labels in my dataset into numeric values so that the model can understand them. First, I changed the `status` column into a categorical type using `astype('category')`, which helps pandas recognize it as a set of fixed categories instead of plain text. Then, I created a new column called `label_id` using `cat.codes`, which automatically assigns an integer to each category. Because pandas orders string categories alphabetically, "Normal" becomes 0 and "Suicidal" becomes 1.

I also created a `label_mapping` dictionary to keep track of which number corresponds to which label. This is helpful later when I interpret the model's predictions and need to translate the results back to readable text. Printing the label mapping confirms that everything was encoded correctly before moving forward with model training.
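To make the encoding concrete, here is a minimal pure-Python sketch of the same idea, assuming the labels are plain strings. It imitates how the categories end up sorted and numbered; it is not the pandas implementation, and the sample `statuses` list is made up for illustration.

```python
# Made-up sample of the "status" column
statuses = ["Suicidal", "Normal", "Normal", "Suicidal", "Normal"]

# Sort the unique labels, then number them in that order
categories = sorted(set(statuses))
code_of = {name: code for code, name in enumerate(categories)}

# Integer IDs for every row, and the reverse lookup used to read predictions
label_ids = [code_of[status] for status in statuses]
label_mapping = dict(enumerate(categories))

print(label_mapping)
print(label_ids)
```

The reverse lookup (`label_mapping`) is the piece that matters later, when a predicted integer has to be turned back into a readable label.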

In [None]:
# Split dataset
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df[TEXT_COL].tolist(),
    df['label_id'].tolist(),
    test_size=0.2,
    random_state=42
)

I split my dataset into training and validation sets using `train_test_split()`. Here, I took the text data from the `statement` column and the numeric labels from the `label_id` column, then separated them into 80% for training and 20% for validation. The `test_size=0.2` means that one-fifth of the data will be used later to check how well the model performs on unseen examples.

I also set `random_state=42` to make sure the split stays consistent every time I run the code. This helps with reproducibility, meaning I'll always get the same training and validation sets across runs. Doing this step ensures that the MentalBERT model will learn from one portion of the data while being tested fairly on another.
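The effect of a fixed seed can be illustrated with a small pure-Python sketch. This is not scikit-learn's algorithm (its shuffling and rounding differ); `seeded_split` is a hypothetical helper that only shows why the same seed gives the same split.

```python
import random


def seeded_split(items, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out a fraction for validation."""
    rng = random.Random(seed)          # private generator, so the seed is local
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_val = round(len(items) * test_size)
    val_idx = set(indices[:n_val])
    train = [x for i, x in enumerate(items) if i not in val_idx]
    val = [x for i, x in enumerate(items) if i in val_idx]
    return train, val


samples = list(range(10))              # stand-in for ten text samples
train_a, val_a = seeded_split(samples)
train_b, val_b = seeded_split(samples)
print(len(train_a), len(val_a))
print(val_a == val_b)                  # same seed, same held-out samples
```

Changing `seed` would produce a different but equally reproducible split.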

In [None]:
model_name = "mental/mental-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

I set `model_name` to `"mental/mental-bert-base-uncased"` since that's the exact name of the pre-trained **MentalBERT** model I'll be using from Hugging Face. This model is specifically designed for analyzing mental health–related text, which fits perfectly with my project's goal of identifying suicidal and normal statements.

After that, I loaded the tokenizer using `AutoTokenizer.from_pretrained(model_name)`. The tokenizer is what converts raw text into tokens, breaking the text into smaller pieces and turning them into numerical IDs that the model can understand. Using the same tokenizer that comes with the model ensures that the text is processed in the exact way MentalBERT was originally trained.

In [None]:
class SentimentDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = int(self.labels[idx])
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

I created a custom dataset class called `SentimentDataset`, which inherits from **PyTorch's** built-in `Dataset` class. This lets me organize my text and label data in a way that's compatible with the MentalBERT model during training. Inside the `__init__` method, I passed in the text data, the corresponding labels, the tokenizer, and a `max_len` parameter (which I set to 64). The `max_len` value limits the maximum number of tokens per input, making sure each text sample has a consistent length for the model to process efficiently.

The `__len__` method simply returns how many samples are in the dataset. PyTorch's data loader relies on it to know how many samples there are, and therefore how many batches make up one epoch. By returning `len(self.texts)`, I'm telling the data loader how many text samples it will be working with.

Next, the `__getitem__` method handles how to retrieve each individual sample from the dataset. For each index, it grabs the text and its corresponding label, converts the text into a string, and ensures the label is in integer format. Then, it uses the tokenizer to convert the text into numerical form that the model can understand. Here, I set parameters like `truncation=True` to shorten long texts, `padding="max_length"` to make all sequences the same length, and `return_tensors='pt'` to output the data as PyTorch tensors.

Finally, I returned a dictionary containing three key elements: `input_ids`, `attention_mask`, and `labels`. The `input_ids` represent the tokenized text, the `attention_mask` tells the model which positions hold real tokens versus padding, and `labels` are the actual target outputs (either Normal or Suicidal). This structure ensures that when the data is loaded in batches, the model receives everything it needs to train.
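What truncation and padding do to a single sample can be sketched in plain Python. This is a simplified illustration, not the tokenizer's implementation: it ignores special tokens, the token IDs are made up, and `pad_or_truncate` and `pad_id=0` are assumptions for the example.

```python
def pad_or_truncate(token_ids, max_len=64, pad_id=0):
    """Cut a sequence to max_len, then right-pad it and build the attention mask."""
    kept = token_ids[:max_len]                       # truncation
    n_pad = max_len - len(kept)
    input_ids = kept + [pad_id] * n_pad              # padding up to max_len
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([101, 2023, 2003, 102], max_len=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_len=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every sample ends up with exactly `max_len` positions, which is what allows samples to be stacked into one batch tensor.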


In [None]:
train_dataset = SentimentDataset(train_texts, train_labels, tokenizer)
val_dataset = SentimentDataset(val_texts, val_labels, tokenizer)

I created two dataset objects, `train_dataset` and `val_dataset`, using the custom `SentimentDataset` class I defined earlier. The `train_dataset` contains the training texts and labels that the model will learn from, while the `val_dataset` holds the validation data that will be used to test how well the model performs on unseen samples. Both datasets use the same tokenizer to make sure the text is processed in a consistent way.

By passing the text, labels, and tokenizer into the `SentimentDataset` class, each dataset handles tokenization, padding, and truncation for every text entry at the moment that entry is requested. This means the samples arrive ready for **MentalBERT** to use, and batches can be loaded during training without having to manually tokenize or format the text every time.

Keeping a separate validation set helps me detect overfitting, since the model is trained on one portion of the dataset and evaluated on another. It's a clean setup that keeps the workflow organized, ensuring that both training and evaluation use the same processing pipeline and consistent input format.


In [None]:
num_labels = len(label_mapping)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

I first defined `num_labels` by getting the length of `label_mapping`, which tells me how many unique categories my model needs to predict. Since this project only has two classes, Normal and Suicidal, the value of `num_labels` will be 2. Setting this variable ensures that the output layer of the model has the correct number of neurons to match the classification task.

Next, I loaded the MentalBERT model using `AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)`. This command downloads the pre-trained MentalBERT weights and automatically configures the model for sequence classification. By including `num_labels`, the final layer is adjusted to handle exactly two output classes.

Doing this lets me take advantage of MentalBERT's pre-trained knowledge on mental health–related text, while still customizing it for my specific task: detecting whether a statement is Normal or Suicidal. It's an efficient way to use a powerful model that already understands language patterns without having to train one from scratch.


In [None]:
# Freeze lower layers for faster fine-tuning
for name, param in model.named_parameters():
    if not name.startswith("classifier") and not name.startswith("bert.encoder.layer.11"):
        param.requires_grad = False

I decided to **freeze the lower layers of the MentalBERT** model to make fine-tuning faster and more focused. In transformer models like BERT, the lower layers capture general language patterns, while the higher layers and the classifier layer specialize in task-specific features. Since I'm fine-tuning on a relatively small dataset, I don't need to retrain the entire network.

The `for name, param in model.named_parameters():` loop goes through all the parameters of the model. By checking the parameter names, I can selectively decide which ones to update during training. Specifically, I keep the classifier layer and the last encoder layer (`bert.encoder.layer.11`) trainable, because these layers are the most important for learning the distinctions between Normal and Suicidal statements.

All other parameters have `requires_grad = False`, meaning they won't be updated during backpropagation. Freezing these layers reduces training time and computational load, while still allowing the model to adjust its final representations to my specific classification task. This is a practical way to fine-tune a large pre-trained model efficiently.
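The name-based rule can be checked without loading the model. The sketch below applies the same prefix test to a handful of parameter names; the names are illustrative examples in the style of a BERT classification model, not a dump of the real MentalBERT parameters, and `is_trainable` is a hypothetical helper.

```python
# Prefixes of the parameters that stay trainable
TRAINABLE_PREFIXES = ("classifier", "bert.encoder.layer.11")


def is_trainable(param_name):
    """Mirror the freezing rule: trainable only if the name has a listed prefix."""
    return param_name.startswith(TRAINABLE_PREFIXES)


example_names = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.10.output.dense.weight",
    "bert.encoder.layer.11.output.dense.weight",
    "bert.pooler.dense.weight",
    "classifier.weight",
    "classifier.bias",
]

trainable = [name for name in example_names if is_trainable(name)]
frozen = [name for name in example_names if not is_trainable(name)]
print(trainable)
print(frozen)
```

One thing this makes visible is that any parameter outside the two prefixes, such as a pooler layer in this example list, is frozen as well.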


In [None]:
# 5️⃣ Define metrics
def compute_metrics(p):
    preds = p.predictions.argmax(-1)
    acc = accuracy_score(p.label_ids, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}

The `compute_metrics` function is a custom evaluation function designed to assess the performance of the MentalBERT sentiment classifier during training and validation. It takes as input the predictions made by the model (`p.predictions`) and the corresponding true labels (`p.label_ids`). Inside the function, the model's predicted class for each input is obtained by selecting the index of the highest score (logit) using `argmax(-1)`. This gives a list of predicted labels that can be directly compared to the true labels. The function then calculates accuracy, which measures the proportion of correctly classified samples out of the total, providing an overall view of the model's correctness.

Next, the function uses the `precision_recall_fscore_support` method from scikit-learn to compute precision, recall, and F1-score with a "weighted" average. This averaging method computes each metric per class and then weights each class by its support, that is, the number of true samples of that class, so classes that appear more often (e.g., "Normal" vs. "Suicidal") contribute more to the final score. Precision measures how many predicted positives were actually correct, recall measures how many actual positives were correctly identified, and the F1-score balances both metrics as a harmonic mean. Finally, the function returns a dictionary containing all four metrics (accuracy, precision, recall, and F1), which allows the Hugging Face Trainer to automatically compute and log these scores during training and evaluation.
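To show what the "weighted" average does, here is a small from-scratch sketch of support-weighted precision, recall, and F1. It is meant as an illustration of the idea under simple assumptions, not as a replacement for scikit-learn; `weighted_prf` and the toy labels are made up for the example.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Per-class precision/recall/F1, averaged with weights equal to class support."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        p_cls = tp / n_pred if n_pred else 0.0
        r_cls = tp / n_true
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if (p_cls + r_cls) else 0.0
        weight = n_true / total        # frequent classes count for more
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return precision, recall, f1


y_true = [0, 0, 0, 1]   # toy labels: three "Normal", one "Suicidal"
y_pred = [0, 0, 1, 1]
precision, recall, f1 = weighted_prf(y_true, y_pred)
print(precision, recall, f1)
```

Because the weights follow class frequency, the weighted recall always equals plain accuracy, and a weak score on a rare class moves the weighted figures only slightly. Reporting per-class scores next to them gives a fuller picture.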


## Hyperparameter set on Random Search

In [None]:
param_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16, 32],
    "num_train_epochs": [2, 3, 4, 5],
    "weight_decay": [0.0, 0.01, 0.05]
}

n_trials = 5  # Number of random experiments
results = []

This part of the code defines the hyperparameter search space and sets up the configuration for conducting random search experiments.

The dictionary `param_space` specifies different values for four important hyperparameters that affect model training. The `learning_rate` controls how much the model's weights are updated during training: too high may cause instability, while too low may slow convergence. The `per_device_train_batch_size` defines how many samples are processed before updating the model weights, impacting memory usage and training speed. The `num_train_epochs` indicates how many full passes the model makes over the training dataset, and `weight_decay` helps regularize the model to prevent overfitting by penalizing large weights.

The variable `n_trials = 5` means that five random combinations from the parameter space will be tested during the random search. Each trial will train and evaluate the model with a random selection of hyperparameters. The list `results = []` initializes an empty container where the performance metrics (such as accuracy, precision, recall, and F1-score) of each trial will be stored for later comparison. This setup ensures the best-performing configuration can be identified after all experiments have been completed.
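The sampling step can be tried on its own, without training anything. The sketch below draws five configurations from the same value lists; the fixed seed and the names `search_space` and `sample_config` are assumptions added for the example so that the draws are repeatable.

```python
import random

search_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16, 32],
    "num_train_epochs": [2, 3, 4, 5],
    "weight_decay": [0.0, 0.01, 0.05],
}


def sample_config(space, rng):
    """Pick one value per hyperparameter, independently of the others."""
    return {name: rng.choice(values) for name, values in space.items()}


# Size of the full space, for comparison with the number of trials
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)

rng = random.Random(0)  # seeded only so the sketch is repeatable
trials = [sample_config(search_space, rng) for _ in range(5)]

print(n_combinations)
for trial in trials:
    print(trial)
```

Five trials cover only a small part of the 4 × 3 × 4 × 3 = 144 possible combinations, and since each draw is independent, the same combination can occasionally come up twice.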


## Hyperparameter set for Grid Search

In [None]:
import itertools

param_space = {
    "learning_rate": [3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16],
    "num_train_epochs": [3, 4],
    "weight_decay": [0.01]
}

# Generate all combinations
param_combinations = list(itertools.product(
    param_space["learning_rate"],
    param_space["per_device_train_batch_size"],
    param_space["num_train_epochs"],
    param_space["weight_decay"]
))

results = []

print(f"\n🔍 Total combinations to test: {len(param_combinations)}")

The `param_space` dictionary defines a smaller, focused range of hyperparameter values to reduce computational load compared to a larger search. Here, `learning_rate` has two options (3e-5 and 5e-5), `per_device_train_batch_size` has two options (8 and 16), `num_train_epochs` has two options (3 and 4), and `weight_decay` is fixed at 0.01. Using `itertools.product`, the code generates all possible combinations of these hyperparameters. Each combination represents a unique configuration to be tested during Grid Search.

The list `param_combinations` stores these combinations, while `results = []` initializes an empty list to collect performance metrics for each trial. The print statement displays the total number of combinations to be tested (in this case, 2 × 2 × 2 × 1 = 8), so the user knows how many training/evaluation runs will be executed for the Grid Search experiment.
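As a quick check of the count, the grid can be enumerated by itself. This sketch also turns each tuple into a dictionary keyed by hyperparameter name, which is a convenience added here and not something the cell above does; `grid_space` and `grid` are illustrative names.

```python
import itertools

grid_space = {
    "learning_rate": [3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16],
    "num_train_epochs": [3, 4],
    "weight_decay": [0.01],
}

# One dictionary per combination, keyed by hyperparameter name
names = list(grid_space)
grid = [
    dict(zip(names, combo))
    for combo in itertools.product(*grid_space.values())
]

print(len(grid))
print(grid[0])
```

Keyed dictionaries can be passed around in the same way as the `params` dictionary used in the random search loop.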

In [None]:
for i in range(n_trials):
    print(f"\n🚀 Running Random Search Trial {i+1}/{n_trials}")

    # Randomly select parameters
    params = {k: random.choice(v) for k, v in param_space.items()}
    print("🎯 Selected params:", params)

    training_args = TrainingArguments(
        output_dir=f"./results_trial_{i+1}",
        eval_strategy="epoch",
        save_strategy="no",  # do not write checkpoints during the search
        learning_rate=params["learning_rate"],
        per_device_train_batch_size=params["per_device_train_batch_size"],
        per_device_eval_batch_size=params["per_device_train_batch_size"],
        num_train_epochs=params["num_train_epochs"],
        weight_decay=params["weight_decay"],
        logging_dir="./logs",
        logging_steps=10,
        load_best_model_at_end=False,
        metric_for_best_model="f1"
    )

The for loop iterates `n_trials` times, where each iteration corresponds to a single random search trial. Inside the loop, the `params` dictionary is created by randomly selecting one value for each hyperparameter from `param_space` using `random.choice()`. Each trial therefore tests a randomly chosen combination of learning rate, batch size, number of epochs, and weight decay (because the draws are independent, the same combination can occasionally be picked twice). The selected hyperparameters are printed so you can track which configuration is being evaluated in each trial.

Next, `TrainingArguments` from the Hugging Face Transformers library is instantiated with the randomly selected parameters. Key arguments include `output_dir` to store the trial's results, `eval_strategy="epoch"` to evaluate the model at the end of each epoch, `save_strategy="no"` so that no checkpoints are written during the search, and the hyperparameters from `params`: `learning_rate`, `per_device_train_batch_size`, `num_train_epochs`, and `weight_decay`. Logging is enabled for monitoring progress, and `metric_for_best_model="f1"` indicates that the F1-score would be used to identify the best-performing model if `load_best_model_at_end` were set to True. This setup prepares each trial for training and evaluation under a randomly selected hyperparameter configuration.


In [None]:
    # (continues the loop body from the previous cell)
    # Reload the pre-trained weights so every trial starts from the same point
    # instead of continuing from the model the previous trial fine-tuned
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    for name, param in model.named_parameters():
        if not name.startswith("classifier") and not name.startswith("bert.encoder.layer.11"):
            param.requires_grad = False

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    trainer.train()
    metrics = trainer.evaluate()

    result_entry = {
        "trial": i + 1,
        **params,
        "accuracy": metrics.get("eval_accuracy", 0),
        "precision": metrics.get("eval_precision", 0),
        "recall": metrics.get("eval_recall", 0),
        "f1": metrics.get("eval_f1", 0)
    }

    results.append(result_entry)

# Save results to Excel
results_df = pd.DataFrame(results)
results_df.to_excel("RandomSearch_Results.xlsx", index=False)
print("\n✅ Random Search complete! Results saved to RandomSearch_Results.xlsx")

First, the pre-trained model is reloaded and its lower layers are frozen again, so that every trial starts from the same weights rather than from the model the previous trial already fine-tuned. A `Trainer` object from Hugging Face is then instantiated using this model, the trial-specific `training_args`, the `train_dataset` and `val_dataset`, the tokenizer, and the custom `compute_metrics` function. The `trainer.train()` method fine-tunes the model using the current hyperparameter configuration, and `trainer.evaluate()` computes evaluation metrics on the validation set.

Next, a dictionary `result_entry` is created to store the trial number, the hyperparameters used in this trial (via `**params`), and the evaluation metrics (accuracy, precision, recall, and F1-score) retrieved from the `metrics` dictionary. This entry is appended to the `results` list. After all trials are completed, the results are converted into a pandas DataFrame and saved to an Excel file named `RandomSearch_Results.xlsx`, making it easy to analyze and compare the performance of all trials. The final print statement confirms that the random search process has finished and the results are successfully saved.

In [None]:
best_trial = max(results, key=lambda x: x["f1"])
print("\n🏆 Best Trial Configuration:")
for k, v in best_trial.items():
    print(f"{k}: {v}")

print("\n✅ Successful")

The `max()` function iterates over the `results` list of dictionaries and uses the lambda function `lambda x: x["f1"]` as the key to determine which trial achieved the highest F1-score. Selecting on F1 favors a trial that balances precision and recall, which matters when the classes are imbalanced, as can happen with "Normal" vs. "Suicidal" labels.

After finding the best trial, a for loop prints all details of that trial, including the trial number, the hyperparameters used, and the evaluation metrics (accuracy, precision, recall, and F1-score). The final print statement confirms that this step finished. The per-trial results saved earlier in the Excel file can then be used to prepare a detailed IEEE report comparing all random search experiments.

In [None]:
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print("\n💬 Type a sentence to analyze (type 'quit' to exit)\n")

while True:
    text = input("Enter a sentence: ")
    if text.lower() == "quit":
        print("👋 Exiting.")
        break

    encoding = tokenizer(
        text,
        return_tensors='pt',
        truncation=True,
        padding='max_length',
        max_length=64
    ).to(device)

    with torch.no_grad():
        outputs = model(**encoding)
        preds = torch.softmax(outputs.logits, dim=-1)
        pred_label = torch.argmax(preds, dim=1).item()
        confidence = preds[0][pred_label].item()

    label_name = label_mapping[pred_label]
    print(f"🧠 Prediction: {label_name} ({confidence:.2%} confidence)\n")

First, `model.eval()` sets the model to evaluation mode, disabling training-specific behaviors like dropout. The device is determined to use GPU if available, otherwise CPU, and the model is moved to that device with `model.to(device)` for efficient computation. The program then prints instructions, indicating that typing "quit" will exit the loop.

Inside the `while True` loop, the code takes user input (`text`) and processes it using the tokenizer, which converts the text into token IDs suitable for the model. Padding and truncation bring the input to the same fixed sequence length of 64 tokens that was used during training. `torch.no_grad()` disables gradient calculation to save memory and speed up inference. The model outputs logits, which are converted to probabilities using `torch.softmax()`. The predicted label is obtained with `argmax`, and its corresponding confidence score is extracted. Finally, the predicted label name is retrieved from `label_mapping`, and the prediction with confidence is printed to the user. This loop continues until the user types `"quit"`.
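The step from logits to a label and a confidence score can be followed by hand. The sketch below does the same arithmetic in plain Python for one made-up pair of logits; it stands in for `torch.softmax` and `torch.argmax` and is not the code the notebook runs.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.3, 1.7]                               # made-up scores for classes 0 and 1
probs = softmax(logits)
pred_label = max(range(len(probs)), key=probs.__getitem__)   # index of the largest probability
confidence = probs[pred_label]

label_names = {0: "Normal", 1: "Suicidal"}        # same shape as label_mapping
print(f"Prediction: {label_names[pred_label]} ({confidence:.2%} confidence)")
```

The confidence here is just the softmax probability of the winning class, so it reflects how far apart the logits are rather than a calibrated measure of certainty.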


# Random Search full code ( With Result)

In [None]:
# ===============================
# LIGHTWEIGHT MENTALBERT SENTIMENT CLASSIFIER WITH RANDOM SEARCH
# ===============================

import pandas as pd
import torch
import random
import numpy as np
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    classification_report,
    accuracy_score,
    precision_recall_fscore_support
)

# 1️⃣ Load dataset
df = pd.read_csv("Cleaned_Combined_Data.csv")

TEXT_COL = "statement"
LABEL_COL = "status"

# Encode string labels to integers
df[LABEL_COL] = df[LABEL_COL].astype('category')
df['label_id'] = df[LABEL_COL].cat.codes
label_mapping = dict(enumerate(df[LABEL_COL].cat.categories))
print("✅ Label mapping:", label_mapping)

# Split dataset
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df[TEXT_COL].tolist(),
    df['label_id'].tolist(),
    test_size=0.2,
    random_state=42
)

# 2️⃣ Load tokenizer
model_name = "mental/mental-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 3️⃣ Dataset class
class SentimentDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = int(self.labels[idx])
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# Prepare datasets
train_dataset = SentimentDataset(train_texts, train_labels, tokenizer)
val_dataset = SentimentDataset(val_texts, val_labels, tokenizer)

# 4️⃣ Load model
num_labels = len(label_mapping)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

# ✅ Freeze lower layers for faster training
for name, param in model.named_parameters():
    if not name.startswith("classifier") and not name.startswith("bert.encoder.layer.11"):
        param.requires_grad = False

# 5️⃣ Define metrics
def compute_metrics(p):
    preds = p.predictions.argmax(-1)
    acc = accuracy_score(p.label_ids, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}

# ===============================
# 6️⃣ RANDOM SEARCH HYPERPARAMETER TUNING
# ===============================

param_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16, 32],
    "num_train_epochs": [2, 3, 4, 5],
    "weight_decay": [0.0, 0.01, 0.05]
}

n_trials = 5  # Number of random experiments
results = []

for i in range(n_trials):
    print(f"\n🚀 Running Random Search Trial {i+1}/{n_trials}")

    # Randomly select parameters
    params = {k: random.choice(v) for k, v in param_space.items()}
    print("🎯 Selected params:", params)

    training_args = TrainingArguments(
        output_dir=f"./results_trial_{i+1}",
        eval_strategy="epoch",
        save_strategy="no",  # do not write checkpoints during the search
        learning_rate=params["learning_rate"],
        per_device_train_batch_size=params["per_device_train_batch_size"],
        per_device_eval_batch_size=params["per_device_train_batch_size"],
        num_train_epochs=params["num_train_epochs"],
        weight_decay=params["weight_decay"],
        logging_dir="./logs",
        logging_steps=10,
        load_best_model_at_end=False,
        metric_for_best_model="f1"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    trainer.train()
    metrics = trainer.evaluate()

    result_entry = {
        "trial": i + 1,
        **params,
        "accuracy": metrics.get("eval_accuracy", 0),
        "precision": metrics.get("eval_precision", 0),
        "recall": metrics.get("eval_recall", 0),
        "f1": metrics.get("eval_f1", 0)
    }

    results.append(result_entry)

# Save results to Excel
results_df = pd.DataFrame(results)
results_df.to_excel("RandomSearch_Results.xlsx", index=False)
print("\n✅ Random Search complete! Results saved to RandomSearch_Results.xlsx")

# ===============================
# 7️⃣ BEST MODEL FINAL EVALUATION (Optional)
# ===============================

best_trial = max(results, key=lambda x: x["f1"])
print("\n🏆 Best Trial Configuration:")
for k, v in best_trial.items():
    print(f"{k}: {v}")

print("\n✅ You can now proceed to analyze the Excel file for your IEEE report.")

# ===============================
# 8️⃣ USER INPUT PREDICTION (Optional interactive testing)
# ===============================

model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print("\n💬 Type a sentence to analyze (type 'quit' to exit)\n")

while True:
    text = input("Enter a sentence: ")
    if text.lower() == "quit":
        print("👋 Exiting.")
        break

    encoding = tokenizer(
        text,
        return_tensors='pt',
        truncation=True,
        padding='max_length',
        max_length=64
    ).to(device)

    with torch.no_grad():
        outputs = model(**encoding)
        preds = torch.softmax(outputs.logits, dim=-1)
        pred_label = torch.argmax(preds, dim=1).item()
        confidence = preds[0][pred_label].item()

    label_name = label_mapping[pred_label]
    print(f"🧠 Prediction: {label_name} ({confidence:.2%} confidence)\n")

✅ Label mapping: {0: 'Normal', 1: 'Suicidal'}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at mental/mental-bert-base-uncased and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



🚀 Running Random Search Trial 1/5
🎯 Selected params: {'learning_rate': 2e-05, 'per_device_train_batch_size': 8, 'num_train_epochs': 2, 'weight_decay': 0.05}




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0961,0.101192,0.968513,0.968498,0.968513,0.968503
2,0.0576,0.099472,0.971106,0.97111,0.971106,0.971108



🚀 Running Random Search Trial 2/5
🎯 Selected params: {'learning_rate': 1e-05, 'per_device_train_batch_size': 8, 'num_train_epochs': 4, 'weight_decay': 0.0}




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0343,0.09825,0.974254,0.974271,0.974254,0.974261
2,0.0537,0.090445,0.974625,0.974613,0.974625,0.974609
3,0.1057,0.091591,0.975366,0.975358,0.975366,0.975361
4,0.0198,0.090951,0.975921,0.97591,0.975921,0.975911



🚀 Running Random Search Trial 3/5
🎯 Selected params: {'learning_rate': 1e-05, 'per_device_train_batch_size': 8, 'num_train_epochs': 4, 'weight_decay': 0.0}




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0029,0.095711,0.977774,0.977777,0.977774,0.977775
2,0.0187,0.088298,0.977588,0.977584,0.977588,0.977571
3,0.1086,0.091554,0.978515,0.978505,0.978515,0.978507
4,0.0074,0.091129,0.97907,0.979065,0.97907,0.979055



🚀 Running Random Search Trial 4/5
🎯 Selected params: {'learning_rate': 1e-05, 'per_device_train_batch_size': 8, 'num_train_epochs': 4, 'weight_decay': 0.0}




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0006,0.097665,0.97907,0.979066,0.97907,0.979068
2,0.0034,0.093723,0.979441,0.979493,0.979441,0.979408
3,0.0802,0.095538,0.980737,0.980739,0.980737,0.980721
4,0.0022,0.094892,0.980182,0.980188,0.980182,0.980163



🚀 Running Random Search Trial 5/5
🎯 Selected params: {'learning_rate': 1e-05, 'per_device_train_batch_size': 8, 'num_train_epochs': 4, 'weight_decay': 0.0}




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0004,0.105607,0.979996,0.979991,0.979996,0.979993
2,0.0021,0.104141,0.980182,0.980264,0.980182,0.980145
3,0.0328,0.106195,0.979996,0.980047,0.979996,0.979965
4,0.0007,0.104824,0.980737,0.980782,0.980737,0.980709



✅ Random Search complete! Results saved to RandomSearch_Results.xlsx

🏆 Best Trial Configuration:
trial: 5
learning_rate: 1e-05
per_device_train_batch_size: 8
num_train_epochs: 4
weight_decay: 0.0
accuracy: 0.9807371735506575
precision: 0.980782169937334
recall: 0.9807371735506575
f1: 0.9807090073863047

✅ You can now proceed to analyze the Excel file for your IEEE report.

💬 Type a sentence to analyze (type 'quit' to exit)

🧠 Prediction: Suicidal (98.90% confidence)

Enter a sentence: quit
👋 Exiting.
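
One thing worth checking in the random search cell: `model` is loaded once, before the loop, so the `Trainer` in each later trial appears to continue from the weights left by the previous trial. The epoch-1 training loss in the logs falls from 0.0961 in trial 1 to 0.0004 in trial 5, which is consistent with that. The grid search cell reloads the model inside its loop instead. The sketch below illustrates the difference with a toy one-parameter model in plain Python, not MentalBERT; `make_model` and `train` are hypothetical stand-ins for `from_pretrained` and `trainer.train()`.

```python
import random


def make_model(seed=0):
    """Hypothetical stand-in for from_pretrained(): returns fresh weights."""
    rng = random.Random(seed)
    return {"w": rng.uniform(2.0, 3.0)}


def train(model, learning_rate, steps):
    """Minimise the toy loss (w - 1)^2 in place; return the loss before step 1."""
    start_loss = (model["w"] - 1.0) ** 2
    for _ in range(steps):
        grad = 2.0 * (model["w"] - 1.0)
        model["w"] -= learning_rate * grad
    return start_loss


configs = [
    {"learning_rate": 0.10, "steps": 5},
    {"learning_rate": 0.05, "steps": 5},
]

# Shared model: trial 2 starts where trial 1 stopped.
shared = make_model()
shared_starts = [train(shared, **cfg) for cfg in configs]

# Fresh model per trial: every trial starts from the same initial weights.
fresh_starts = [train(make_model(), **cfg) for cfg in configs]

print("shared model, starting loss per trial:", shared_starts)
print("fresh model,  starting loss per trial:", fresh_starts)
```

In the notebook itself, the analogous change would be to move the `from_pretrained` call and the freezing loop inside `for i in range(n_trials)`, as the grid search cell does, so that the trials can be compared on equal footing.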

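In the log above, trials 2 to 5 all report the same parameter set. Independent `random.choice` draws do not guarantee distinct configurations, so some of the trial budget can be spent repeating a setting. A possible alternative, assuming distinct configurations are wanted, is to enumerate the space once and draw without replacement from a seeded generator. The helper name `sample_configs` and the seed value are illustrative choices, not part of the original notebook.

```python
import itertools
import random

# Same search space as the random search cell.
param_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16, 32],
    "num_train_epochs": [2, 3, 4, 5],
    "weight_decay": [0.0, 0.01, 0.05],
}


def sample_configs(space, n_trials, seed):
    """Draw n_trials distinct configurations from the full grid (hypothetical helper)."""
    keys = list(space)
    grid = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    return rng.sample(grid, n_trials)  # sampling without replacement


trial_configs = sample_configs(param_space, n_trials=5, seed=42)
for trial, cfg in enumerate(trial_configs, 1):
    print(trial, cfg)
```

Each dictionary can then be unpacked into `TrainingArguments` the same way `params` is used in the loop above.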

# Grid Search
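
Grid search tries every combination of the listed values, so the number of trials is the product of the list lengths. The short sketch below reproduces only the enumeration step of the cell that follows (no training), to show how `itertools.product` orders the combinations: the last parameter varies fastest. Building dictionaries with `zip` is a stylistic assumption here; the cell itself unpacks tuples.

```python
import itertools

# Same grid as the grid search cell.
param_space = {
    "learning_rate": [3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16],
    "num_train_epochs": [3, 4],
    "weight_decay": [0.01],
}

keys = list(param_space)
grid = [dict(zip(keys, values)) for values in itertools.product(*param_space.values())]

print(f"Total combinations: {len(grid)}")  # 2 * 2 * 2 * 1
for trial, cfg in enumerate(grid, 1):
    print(trial, cfg)
```

With this ordering, trial 6 corresponds to a learning rate of 5e-5, batch size 8, and 4 epochs.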

In [4]:
# ===============================
# LIGHTWEIGHT MENTALBERT SENTIMENT CLASSIFIER WITH GRID SEARCH
# ===============================

import pandas as pd
import torch
import itertools
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support
)

# 1️⃣ Load dataset
df = pd.read_csv("Cleaned_Combined_Data.csv")

TEXT_COL = "statement"
LABEL_COL = "status"

# Encode string labels to integers
df[LABEL_COL] = df[LABEL_COL].astype('category')
df['label_id'] = df[LABEL_COL].cat.codes
label_mapping = dict(enumerate(df[LABEL_COL].cat.categories))
print("✅ Label mapping:", label_mapping)

# Split dataset
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df[TEXT_COL].tolist(),
    df['label_id'].tolist(),
    test_size=0.2,
    random_state=42
)

# 2️⃣ Load tokenizer
model_name = "mental/mental-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 3️⃣ Dataset class
class SentimentDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = int(self.labels[idx])
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# Prepare datasets
train_dataset = SentimentDataset(train_texts, train_labels, tokenizer)
val_dataset = SentimentDataset(val_texts, val_labels, tokenizer)

# 4️⃣ Define metrics
def compute_metrics(p):
    preds = p.predictions.argmax(-1)
    acc = accuracy_score(p.label_ids, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}

# ===============================
# 5️⃣ GRID SEARCH HYPERPARAMETER TUNING
# ===============================

param_space = {
    "learning_rate": [3e-5, 5e-5],
    "per_device_train_batch_size": [8, 16],
    "num_train_epochs": [3, 4],
    "weight_decay": [0.01]
}

# Generate all combinations
param_combinations = list(itertools.product(
    param_space["learning_rate"],
    param_space["per_device_train_batch_size"],
    param_space["num_train_epochs"],
    param_space["weight_decay"]
))

results = []

print(f"\n🔍 Total combinations to test: {len(param_combinations)}")

for i, (lr, batch_size, epochs, wd) in enumerate(param_combinations, 1):
    print(f"\n🚀 Running Grid Search Trial {i}/{len(param_combinations)}")
    print(f"🎯 Params: LR={lr}, Batch={batch_size}, Epochs={epochs}, Weight Decay={wd}")

    # Reload model each iteration
    num_labels = len(label_mapping)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

    # Freeze lower layers for faster training
    for name, param in model.named_parameters():
        if not name.startswith("classifier") and not name.startswith("bert.encoder.layer.11"):
            param.requires_grad = False

    training_args = TrainingArguments(
        output_dir=f"./grid_results_trial_{i}",
        eval_strategy="epoch",  # called evaluation_strategy in older transformers releases
        save_strategy="no",  # do not write checkpoints during the search
        learning_rate=lr,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=epochs,
        weight_decay=wd,
        logging_dir="./logs",
        logging_steps=10,
        load_best_model_at_end=False,
        metric_for_best_model="f1"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    trainer.train()
    metrics = trainer.evaluate()

    result_entry = {
        "trial": i,
        "learning_rate": lr,
        "per_device_train_batch_size": batch_size,
        "num_train_epochs": epochs,
        "weight_decay": wd,
        "accuracy": metrics.get("eval_accuracy", 0),
        "precision": metrics.get("eval_precision", 0),
        "recall": metrics.get("eval_recall", 0),
        "f1": metrics.get("eval_f1", 0)
    }

    results.append(result_entry)

# Save results to Excel
results_df = pd.DataFrame(results)
results_df.to_excel("GridSearch_Results.xlsx", index=False)
print("\n✅ Grid Search complete! Results saved to GridSearch_Results.xlsx")

# ===============================
# 6️⃣ BEST MODEL SELECTION
# ===============================

best_trial = max(results, key=lambda x: x["f1"])
print("\n🏆 Best Grid Search Configuration:")
for k, v in best_trial.items():
    print(f"{k}: {v}")

print("\n✅ Complete")

✅ Label mapping: {0: 'Normal', 1: 'Suicidal'}

🔍 Total combinations to test: 8

🚀 Running Grid Search Trial 1/8
🎯 Params: LR=3e-05, Batch=8, Epochs=3, Weight Decay=0.01


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at mental/mental-bert-base-uncased and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0815,0.099013,0.971661,0.971664,0.971661,0.971662
2,0.0711,0.084789,0.975736,0.975734,0.975736,0.975713
3,0.0938,0.086624,0.976848,0.976842,0.976848,0.976829



🚀 Running Grid Search Trial 2/8
🎯 Params: LR=3e-05, Batch=8, Epochs=4, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0992,0.097917,0.973514,0.973503,0.973514,0.973507
2,0.0508,0.082761,0.975921,0.975918,0.975921,0.9759
3,0.1374,0.087715,0.977033,0.977041,0.977033,0.977036
4,0.0261,0.086383,0.978515,0.978507,0.978515,0.978502



🚀 Running Grid Search Trial 3/8
🎯 Params: LR=3e-05, Batch=16, Epochs=3, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1277,0.087276,0.97018,0.970166,0.97018,0.970151
2,0.0829,0.077469,0.97481,0.974798,0.97481,0.974795
3,0.0801,0.078211,0.975551,0.975539,0.975551,0.975536



🚀 Running Grid Search Trial 4/8
🎯 Params: LR=3e-05, Batch=16, Epochs=4, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1151,0.087445,0.971106,0.97109,0.971106,0.971084
2,0.0787,0.076366,0.974995,0.974983,0.974995,0.974983
3,0.0589,0.080569,0.974995,0.975084,0.974995,0.975018
4,0.1112,0.075545,0.977403,0.977397,0.977403,0.977386



🚀 Running Grid Search Trial 5/8
🎯 Params: LR=5e-05, Batch=8, Epochs=3, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.079,0.097286,0.973884,0.97389,0.973884,0.973887
2,0.0284,0.077349,0.977774,0.977773,0.977774,0.977753
3,0.0909,0.081685,0.979811,0.979805,0.979811,0.979798



🚀 Running Grid Search Trial 6/8
🎯 Params: LR=5e-05, Batch=8, Epochs=4, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.0709,0.096014,0.97444,0.974433,0.97444,0.974436
2,0.0414,0.075271,0.977959,0.978007,0.977959,0.977924
3,0.133,0.080154,0.980737,0.98073,0.980737,0.98073
4,0.0072,0.080827,0.981293,0.981291,0.981293,0.981279



🚀 Running Grid Search Trial 7/8
🎯 Params: LR=5e-05, Batch=16, Epochs=3, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1262,0.082254,0.972588,0.972587,0.972588,0.972556
2,0.0795,0.069258,0.977033,0.977023,0.977033,0.977019
3,0.0617,0.07003,0.979255,0.979252,0.979255,0.97924



🚀 Running Grid Search Trial 8/8
🎯 Params: LR=5e-05, Batch=16, Epochs=4, Weight Decay=0.01




Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1027,0.082515,0.972588,0.972572,0.972588,0.972573
2,0.0706,0.069187,0.977588,0.977579,0.977588,0.977576
3,0.0417,0.071709,0.978329,0.978384,0.978329,0.978344
4,0.0805,0.068425,0.980737,0.980746,0.980737,0.980718



✅ Grid Search complete! Results saved to GridSearch_Results.xlsx

🏆 Best Grid Search Configuration:
trial: 6
learning_rate: 5e-05
per_device_train_batch_size: 8
num_train_epochs: 4
weight_decay: 0.01
accuracy: 0.981292832005927
precision: 0.9812912739247692
recall: 0.981292832005927
f1: 0.9812794158918405

✅ Complete
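
In every results table above, the Recall column matches the Accuracy column digit for digit. That is expected with `average='weighted'` in `compute_metrics`: each class's recall is weighted by its share of the true labels, so the class sizes cancel and the sum reduces to correct predictions divided by total samples, which is accuracy. The sketch below checks this on a small made-up label set in plain Python; the helper functions are written here for illustration and are not the scikit-learn implementation.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to each class's support."""
    support = Counter(y_true)                                    # samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n = len(y_true)
    return sum((support[c] / n) * (correct[c] / support[c]) for c in support)


# Made-up labels: 6 samples of class 0 and 4 of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(f"accuracy={acc:.4f}  weighted recall={rec:.4f}")
```

Because of this identity, the weighted recall adds no information beyond accuracy. If per-class behaviour matters, for example how many `Suicidal` statements are missed, `average=None` or `average='macro'` in `precision_recall_fscore_support` would report it separately.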

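Both cells freeze every parameter whose name does not start with `classifier` or `bert.encoder.layer.11`. The load-time warning lists `bert.pooler.dense.weight` and `bert.pooler.dense.bias` as newly initialized, and under that prefix rule they would stay frozen at their random initial values. The sketch below applies the same rule to a hand-written list of parameter names in plain Python, so it runs without downloading the model. The pooler and classifier names come from the warning; the other names are assumed from the usual BERT naming scheme.

```python
# Prefixes that remain trainable under the notebook's freezing rule.
TRAINABLE_PREFIXES = ("classifier", "bert.encoder.layer.11")


def is_trainable(name):
    """Mirror of the notebook's condition: keep requires_grad only for these prefixes."""
    return name.startswith(TRAINABLE_PREFIXES)


# Illustrative subset of parameter names (assumed, except pooler and classifier).
param_names = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.11.output.dense.weight",
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "classifier.weight",
    "classifier.bias",
]

for name in param_names:
    status = "train " if is_trainable(name) else "frozen"
    print(f"{status}  {name}")

# Newly initialized pooler weights that the rule leaves frozen.
frozen_pooler = [n for n in param_names if n.startswith("bert.pooler") and not is_trainable(n)]
print("frozen pooler parameters:", frozen_pooler)
```

If the pooler should be trained as well, adding `"bert.pooler"` to the trainable prefixes would be one way to do it. Whether that changes the scores reported above has not been tested here.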