# Task
Summarize the performance of the BERT-based depression detection system using the calculated accuracy, precision, recall, and F1-score. Discuss potential improvements and next steps for deployment based on these results and the confusion matrix.

## Load Dataset

### Subtask:
Load the 'indian_depression_dataset_2025.csv' dataset into a Pandas DataFrame and inspect its structure, including columns and initial rows. Identify the text and label columns.


**Reasoning**:
First, I'll import the pandas library, then load the CSV file 'indian_depression_dataset_2025.csv' into a DataFrame named `df`, and finally display the first 5 rows to inspect its structure and identify the relevant columns.



In [19]:
import pandas as pd

df = pd.read_csv('indian_depression_dataset_2025.csv')
print(df.head())

FileNotFoundError: [Errno 2] No such file or directory: 'indian_depression_dataset_2025.csv'

**Reasoning**:
The previous code failed because the specified file 'indian_depression_dataset_2025.csv' was not found. I will correct the filename to the available file '/content/raw_depression_dataset_india.csv' to successfully load the dataset.
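Rather than guessing a filename after a `FileNotFoundError`, it can help to list the CSV files that actually exist before loading one. Below is a minimal sketch that uses only `pathlib` from the standard library. `find_csv_files` is a hypothetical helper, and the `/content` directory is an assumption based on the Colab-style path used in the fix.

```python
from pathlib import Path


def find_csv_files(directory: str = ".") -> list[str]:
    """Return the sorted paths of the .csv files directly inside `directory`."""
    # glob("*.csv") does not recurse; is_file() skips directories named like "x.csv".
    return sorted(str(p) for p in Path(directory).glob("*.csv") if p.is_file())
```

In a Colab session, `find_csv_files("/content")` lists the uploaded datasets, and `pd.read_csv` can then be pointed at one of the returned paths.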



In [20]:
import pandas as pd

df = pd.read_csv('/content/raw_depression_dataset_india.csv')
print(df.head())

                                          tweet_text  label        date
0  my anxiety is at an all time high. i feel like...      1  2025-12-11
1  nothing brings me joy anymore. not even the th...      1  2025-12-11
2  honestly why am i even trying anymore. nothing...      1  2025-12-02
3  The pollution in Delhi is so bad i can barely ...      0  2025-12-05
4  my anxiety is at an all time high. i feel like...      1  2025-12-02
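The output identifies `tweet_text` as the text column and `label` as the binary target, with an extra `date` column. Before tokenizing, it is worth checking class balance and duplicated tweets, because rows 0 and 4 begin with the same text. The sketch below uses only the standard library. `describe_labelled_texts` is a hypothetical helper, and the light normalisation (stripping and lower-casing) is an assumption about what should count as "the same tweet".

```python
from collections import Counter


def describe_labelled_texts(texts: list[str], labels: list[int]) -> dict:
    """Summarise a text-classification dataset: class counts and duplicated texts."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    # Normalise lightly so that case and surrounding-whitespace variants match.
    normalised = [t.strip().lower() for t in texts]
    text_counts = Counter(normalised)
    return {
        "n_rows": len(texts),
        "class_counts": dict(Counter(labels)),
        # Rows whose normalised text already appeared earlier in the data.
        "n_duplicate_rows": sum(c - 1 for c in text_counts.values() if c > 1),
        "n_unique_texts": len(text_counts),
    }
```

With the DataFrame above, this could be called as `describe_labelled_texts(df.tweet_text.tolist(), df.label.tolist())`. In pandas, `df['label'].value_counts()` and `df.duplicated(subset='tweet_text').sum()` give the un-normalised equivalents.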


## Prepare Data for BERT

### Subtask:
Preprocess the text data for BERT by tokenizing it, converting tokens to input IDs, and creating attention masks (single-sentence inputs do not need separate segment IDs, since BERT treats missing `token_type_ids` as all zeros). Split the dataset into training and validation sets to prepare for model training.


**Reasoning**:
To prepare the text data for BERT, I need to tokenize the 'tweet_text' column, convert the tokens and labels into PyTorch tensors, create a TensorDataset, split it into training and validation sets, and finally create DataLoaders for efficient batch processing during model training.
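The split and batching arithmetic can be checked by hand. The dataset-size printout in the output below is truncated, but the summary at the end of this notebook reports 400 samples. The hypothetical helper `split_and_batch_counts` reproduces `int(0.9 * len(dataset))` and the number of batches a `DataLoader` yields with its default `drop_last=False`.

```python
import math


def split_and_batch_counts(n_samples: int, train_frac: float, batch_size: int) -> dict:
    """Reproduce the notebook's train/validation split sizes and per-split batch counts."""
    train_size = int(train_frac * n_samples)  # same truncation as int(0.9 * len(dataset))
    val_size = n_samples - train_size         # the remainder goes to validation
    return {
        "train_size": train_size,
        "val_size": val_size,
        # DataLoader keeps the last partial batch by default (drop_last=False),
        # so the number of batches is the ceiling of size / batch_size.
        "train_batches": math.ceil(train_size / batch_size),
        "val_batches": math.ceil(val_size / batch_size),
    }
```

For 400 samples and `batch_size = 32` this gives a 360/40 split, 12 training batches per epoch (so the scheduler's `total_steps` is 12 × 4 = 48), and a validation pass of one batch of 32 followed by one of 8. Note that `random_split` runs before the seeds are set in the training cell, so the split is not reproducible across sessions; passing `generator=torch.Generator().manual_seed(42)` to `random_split` would fix that.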



In [21]:
import torch
from transformers import BertTokenizer
from torch.utils.data import TensorDataset, random_split, DataLoader, RandomSampler, SequentialSampler

# 1. Import and instantiate BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# 2. Define MAX_LEN
MAX_LEN = 128

# 3. Tokenize the 'tweet_text' column
input_ids = []
attention_masks = []

for text in df.tweet_text.values:
    encoded_dict = tokenizer.encode_plus(
                        text,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                        max_length = MAX_LEN,      # Pad & truncate all sentences.
                        padding = 'max_length',
                        return_attention_mask = True,   # Construct attention masks.
                        return_tensors = 'pt',     # Return pytorch tensors.
                        truncation=True
                   )

    # Add the encoded sentence to the list
    input_ids.append(encoded_dict['input_ids'])

    # Add the attention mask to the list
    attention_masks.append(encoded_dict['attention_mask'])

# 4. Convert the lists into tensors
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)

# 5. Convert the 'label' column to a PyTorch tensor
labels = torch.tensor(df.label.values)

# Print sentence 0, now as a list of IDs.
print('Original: ', df.tweet_text.values[0])
print('Token IDs:', input_ids[0])

# 6. Create a TensorDataset
dataset = TensorDataset(input_ids, attention_masks, labels)

# 7. Calculate train and validation set sizes
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size

print(f"Dataset size: {len(dataset)}")
print(f"Training size: {train_size}")
print(f"Validation size: {val_size}")

# 8. Use random_split to divide the dataset
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

# 9. Define batch_size
batch_size = 32 # A common BERT fine-tuning batch size

# 10. Create DataLoader instances
train_dataloader = DataLoader(
            train_dataset,  # The training samples.
            sampler = RandomSampler(train_dataset), # Select batches randomly
            batch_size = batch_size # Trains with this batch size.
        )

validation_dataloader = DataLoader(
            val_dataset,    # The validation samples.
            sampler = SequentialSampler(val_dataset), # Pulls batches sequentially.
            batch_size = batch_size # Evaluate with this batch size.
        )

print("Data preparation complete. Dataloaders for training and validation are created.")

Original:  my anxiety is at an all time high. i feel like i can't breathe right now.
Token IDs: tensor([  101,  2026, 10089,  2003,  2012,  2019,  2035,  2051,  2152,  1012,
         1045,  2514,  2066,  1045,  2064,  1005,  1056,  7200,  2157,  2085,
         1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,    
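The `Token IDs` output above shows the layout `encode_plus` produces for one tweet: 101 (`[CLS]`), the WordPiece IDs, 102 (`[SEP]`), then 0 (`[PAD]`) up to `MAX_LEN`. The plain-Python sketch below rebuilds that layout and the matching attention mask for illustration only. It assumes the text has already been split into WordPiece IDs (the real tokenizer does that step) and uses right-side truncation, the tokenizer's default for a single sequence; `build_bert_inputs` is a hypothetical name.

```python
# bert-base-uncased special-token IDs, as seen in the output above.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def build_bert_inputs(wordpiece_ids: list[int], max_len: int) -> tuple[list[int], list[int]]:
    """Add [CLS]/[SEP], truncate to max_len, pad with [PAD], and build the attention mask."""
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    # Truncate the content from the right so [CLS] + content + [SEP] fits in max_len.
    body = wordpiece_ids[: max_len - 2]
    ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks a real token the model should attend to; 0 marks padding.
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [PAD_ID] * (max_len - len(ids))
    return ids, mask
```

For example, the first three WordPiece IDs of the tweet above with `max_len=8` give `[101, 2026, 10089, 2003, 102, 0, 0, 0]` and the mask `[1, 1, 1, 1, 1, 0, 0, 0]`.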

## Load Pre-trained BERT Model

### Subtask:
Load a pre-trained BERT model suitable for sequence classification from a library like Hugging Face Transformers. Configure the model for a binary classification task (labels 0 and 1).


**Reasoning**:
To load a pre-trained BERT model, I will import `BertForSequenceClassification` from the `transformers` library, instantiate it for binary classification (`num_labels=2`), and then move the model to the available GPU if one exists, otherwise to the CPU.



In [22]:
from transformers import BertForSequenceClassification

# Check for GPU and set device
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f'There are {torch.cuda.device_count()} GPU(s) available.')
    print('Device name:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

# Instantiate the BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels--2 for binary classification.
                    # You can increase this for multi-class tasks.
    output_attentions = False, # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)

# Tell pytorch to run this model on the GPU or CPU
model.to(device)

print("BERT model loaded and configured for binary classification.")

No GPU available, using the CPU instead.


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERT model loaded and configured for binary classification.


**Reasoning**:
The previous code successfully loaded the BERT model, and now the next logical step is to set up the optimizer and learning rate scheduler required for fine-tuning the model. This will involve defining the AdamW optimizer and a linear scheduler with a warmup period.



In [23]:
## Define Optimizer and Learning Rate Scheduler

### Subtask:
Define the optimizer and a learning rate scheduler for fine-tuning the BERT model. This includes specifying the optimizer's parameters and setting up a scheduler for adjusting the learning rate over time.

#### Instructions:
1. Import the `AdamW` optimizer from the `transformers` library.
2. Import `get_linear_schedule_with_warmup` from `transformers.optimization`.
3. Instantiate the `AdamW` optimizer, specifying the model's parameters and a learning rate (e.g., 2e-5) and weight decay.
4. Calculate the total number of training steps, typically `(number of epochs * number of training batches)`.
5. Create the learning rate scheduler using `get_linear_schedule_with_warmup`, passing the optimizer, number of warmup steps, and total training steps.

SyntaxError: unterminated string literal (detected at line 4) (ipython-input-1036446461.py, line 4)

**Reasoning**:
The previous code block caused a `SyntaxError` because it contained markdown text instead of executable Python code. I need to replace it with the actual Python implementation for defining the optimizer and learning rate scheduler, following the instructions in the preceding text cells.



In [24]:
from transformers import AdamW, get_linear_schedule_with_warmup
import torch

# Instantiate the AdamW optimizer with model parameters and learning rate
optimizer = AdamW(model.parameters(),
                  lr = 2e-5,  # Default learning rate
                  eps = 1e-8  # Default epsilon
                 )

# Number of training epochs (authors recommend between 2 and 4)
epochs = 4 # This value is already present in the kernel state.

# Total number of training steps is [number of batches] x [number of epochs].
# Calculate total steps based on the train_dataloader size and epochs.
total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)

print("Optimizer and learning rate scheduler defined.")

ImportError: cannot import name 'AdamW' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)

**Reasoning**:
The previous error shows that `AdamW` can no longer be imported from `transformers`: recent releases no longer ship the library's own `AdamW`, which was deprecated in favour of PyTorch's `torch.optim.AdamW`. I will import `AdamW` from `torch.optim` instead, while still importing `get_linear_schedule_with_warmup` from `transformers`.



In [25]:
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup
import torch

# Instantiate the AdamW optimizer with model parameters and learning rate
optimizer = AdamW(model.parameters(),
                  lr = 2e-5,  # Common BERT fine-tuning learning rate (torch's default is 1e-3)
                  eps = 1e-8  # Matches torch.optim.AdamW's default epsilon
                 )            # weight_decay is left at torch.optim.AdamW's default of 0.01

# Number of training epochs (the BERT authors recommend 2 to 4)
epochs = 4

# Total number of training steps is [number of batches] x [number of epochs].
# Calculate total steps based on the train_dataloader size and epochs.
total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)

print("Optimizer and learning rate scheduler defined.")

Optimizer and learning rate scheduler defined.
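For reference, `get_linear_schedule_with_warmup` scales the base learning rate by a factor that rises linearly from 0 to 1 during warmup and then decays linearly to 0 at `num_training_steps`. The sketch below re-implements that factor in plain Python to make the schedule concrete; it mirrors the documented behaviour but is not the library's code, and `linear_schedule_factor` is a hypothetical name.

```python
def linear_schedule_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate after `step` scheduler steps."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 at num_training_steps, never below 0.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))
```

With `num_warmup_steps = 0` and, assuming 12 training batches per epoch, `total_steps = 48`, the learning rate starts at 2e-5 and drops by about 4.2e-7 after every batch, reaching 1e-5 halfway through training and 0 at the end.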


## Fine-tune BERT Model

### Subtask:
Set up a training loop to fine-tune the loaded BERT model on the prepared training dataset using the optimizer and scheduler defined above, and train the model for the specified number of epochs while monitoring validation performance.
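No separate loss function is required: when `labels` are passed, `BertForSequenceClassification` computes the cross-entropy between its logits and the labels (averaged over the batch) and returns it as `outputs.loss`. The sketch below shows the per-example computation in plain Python for illustration; `cross_entropy_from_logits` and `softmax` are hypothetical helpers, not the library's implementation.

```python
import math


def cross_entropy_from_logits(logits: list[float], label: int) -> float:
    """Negative log-probability of `label` under softmax(logits), computed stably."""
    # Subtract the largest logit before exponentiating (log-sum-exp trick) to avoid overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def softmax(logits: list[float]) -> list[float]:
    """Class probabilities from logits; argmax of these equals argmax of the logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For two classes with nearly equal logits the loss is about ln 2 ≈ 0.693, a useful reference point for the first-epoch average training loss of 0.48 reported below.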


**Reasoning**:
I need to set up the training loop for the BERT model: import `tqdm`, define an accuracy function, set random seeds for reproducibility, and then implement the per-epoch training and validation steps.



In [26]:
import numpy as np
import time
import datetime
from tqdm.notebook import tqdm

# Function to calculate accuracy for a batch
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Set the random seed for reproducibility
seed_val = 42
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# Store the average loss after each epoch so we can plot them.
loss_values = []

# For each epoch...
for epoch_i in range(0, epochs):

    print(f'======== Epoch {epoch_i + 1} / {epochs} ========\nTraining...')

    # Measure how long the training epoch takes.
    t0 = time.time()

    # Reset the total loss for this epoch.
    total_loss = 0

    # Put the model into training mode.
    model.train()

    # For each batch of training data...
    for step, batch in tqdm(enumerate(train_dataloader), total=len(train_dataloader)):

        # Progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Calculate elapsed time in minutes.
            elapsed = datetime.timedelta(seconds=int(round(time.time() - t0)))
            # Report progress.
            print(f'  Batch {step:>5,}  of  {len(train_dataloader):>5,}.    Elapsed: {elapsed}.')

        # Unpack this training batch from our dataloader.
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Always clear any previously calculated gradients before performing a backward pass.
        model.zero_grad()

        # Perform a forward pass (evaluate the model on this training batch).
        # The documentation for this `model` function is here:
        # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
        # It returns different things depending on the content of its arguments.
        # In this case, we just need the `loss` (first element) and `logits` (second element).
        outputs = model(b_input_ids,
                        token_type_ids=None,
                        attention_mask=b_input_mask,
                        labels=b_labels)

        loss = outputs.loss
        logits = outputs.logits

        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a `torch.Tensor` containing a single value;
        # the `.item()` function just returns the Python value from the tensor.
        total_loss += loss.item()

        # Perform a backward pass to calculate the gradients.
        loss.backward()

        # Clip the norm of the gradients to prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the learning rate and how to apply the gradients.
        optimizer.step()

        # Update the learning rate.
        scheduler.step()

    # Calculate the average loss over the training data.
    avg_train_loss = total_loss / len(train_dataloader)

    # Store the loss value for plotting the learning curve.
    loss_values.append(avg_train_loss)

    print(f'  Average training loss: {avg_train_loss:.2f}')
    print(f'  Training epoch took: {datetime.timedelta(seconds=int(round(time.time() - t0)))}')

    print('\nValidation...')

    t0 = time.time()

    # Put the model in evaluation mode--the dropout layers behave differently during evaluation.
    model.eval()

    # Tracking variables
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0

    # Evaluate data for one epoch
    for batch in validation_dataloader:

        # Move the batch to the selected device (GPU or CPU)
        batch = tuple(t.to(device) for t in batch)

        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch

        # Telling the model not to compute or store gradients, saving memory and speeding up validation
        with torch.no_grad():
            # Forward pass. Because labels are passed, the output contains
            # both the loss and the logits.
            outputs = model(b_input_ids,
                            token_type_ids=None,
                            attention_mask=b_input_mask,
                            labels=b_labels)

        # Get the loss and logits from the output
        loss = outputs.loss
        logits = outputs.logits

        # Accumulate the validation loss.
        eval_loss += loss.item()

        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Calculate the accuracy for this batch of test sentences, and add to the total.
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy

        # Track the number of batches
        nb_eval_steps += 1

    # Report the final accuracy for this validation run.
    print(f'  Validation Loss: {(eval_loss/nb_eval_steps):.2f}')
    print(f'  Accuracy: {(eval_accuracy/nb_eval_steps):.2f}')
    print(f'  Validation took: {datetime.timedelta(seconds=int(round(time.time() - t0)))}')

print('\nTraining complete!')

Training...


  Average training loss: 0.48
  Training epoch took: 0:09:09

Validation...
  Validation Loss: 0.31
  Accuracy: 1.00
  Validation took: 0:00:19
Training...


  Average training loss: 0.24
  Training epoch took: 0:08:55

Validation...
  Validation Loss: 0.15
  Accuracy: 1.00
  Validation took: 0:00:17
Training...


  Average training loss: 0.13
  Training epoch took: 0:08:45

Validation...
  Validation Loss: 0.08
  Accuracy: 1.00
  Validation took: 0:00:17
Training...


  Average training loss: 0.08
  Training epoch took: 0:08:42

Validation...
  Validation Loss: 0.06
  Accuracy: 1.00
  Validation took: 0:00:18

Training complete!
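The validation loop reports the mean of per-batch accuracies from `flat_accuracy`, which differs from overall accuracy when batches differ in size. With 40 validation examples and `batch_size = 32`, the loop sees a batch of 32 followed by a batch of 8, so each example in the small batch carries four times the weight. The sketch below contrasts the two definitions; the helper names and the per-batch counts in the usage note are hypothetical.

```python
def batch_mean_accuracy(batches: list[tuple[int, int]]) -> float:
    """Average of per-batch accuracies, as the training cell's validation loop reports it."""
    return sum(correct / size for correct, size in batches) / len(batches)


def pooled_accuracy(batches: list[tuple[int, int]]) -> float:
    """Correct predictions over all examples, as sklearn's accuracy_score computes it."""
    return sum(correct for correct, _ in batches) / sum(size for _, size in batches)
```

For example, `(correct, size)` pairs of `(30, 32)` and `(4, 8)` give 0.71875 when averaged per batch but 0.85 when pooled. In this run every validation prediction was correct, so both definitions read 1.00; the evaluation cell below uses the pooled definition.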


## Evaluate Model Performance

### Subtask:
Evaluate the fine-tuned BERT model on the validation set. Calculate and display key classification metrics such as accuracy, precision, recall, and F1-score to assess the model's performance. Also, generate a confusion matrix to visualize the true positives, true negatives, false positives, and false negatives.


**Reasoning**:
The previous steps completed the training of the BERT model. Now, to evaluate its performance as per the subtask, I need to collect predictions and true labels from the validation set and then calculate various classification metrics and generate a confusion matrix.



In [27]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Put the model in evaluation mode
model.eval()

# Tracking variables for predictions and true labels
val_preds_all = []
val_labels_all = []

# Predict
for batch in validation_dataloader:
    # Add batch to device
    batch = tuple(t.to(device) for t in batch)

    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels = batch

    # Telling the model not to compute or store gradients, saving memory and speeding up prediction
    with torch.no_grad():
        # Forward pass, calculate logit predictions.
        outputs = model(b_input_ids,
                        token_type_ids=None,
                        attention_mask=b_input_mask)

    logits = outputs.logits

    # Move logits and labels to CPU
    logits = logits.detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()

    # Store predictions and true labels
    val_preds_all.extend(np.argmax(logits, axis=1).flatten())
    val_labels_all.extend(label_ids.flatten())

# Convert to numpy arrays
val_preds_all = np.array(val_preds_all)
val_labels_all = np.array(val_labels_all)

# Calculate metrics
accuracy = accuracy_score(val_labels_all, val_preds_all)
precision = precision_score(val_labels_all, val_preds_all, average='binary', zero_division=0)
recall = recall_score(val_labels_all, val_preds_all, average='binary', zero_division=0)
f1 = f1_score(val_labels_all, val_preds_all, average='binary', zero_division=0)
cm = confusion_matrix(val_labels_all, val_preds_all)

print(f"\n----- Model Performance on Validation Set -----")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print(f"\nConfusion Matrix:\n{cm}")

# Interpreting the confusion matrix (assuming label 1 is 'depressed' and 0 is 'not depressed')
# cm[0,0]: True Negatives (Correctly predicted not depressed)
# cm[0,1]: False Positives (Incorrectly predicted depressed - Type I error)
# cm[1,0]: False Negatives (Incorrectly predicted not depressed - Type II error)
# cm[1,1]: True Positives (Correctly predicted depressed)

print("\nInterpretation of Confusion Matrix:")
print(f"True Negatives (TN): {cm[0,0]} (Correctly predicted not depressed)")
print(f"False Positives (FP): {cm[0,1]} (Incorrectly predicted depressed)")
print(f"False Negatives (FN): {cm[1,0]} (Incorrectly predicted not depressed)")
print(f"True Positives (TP): {cm[1,1]} (Correctly predicted depressed)")


----- Model Performance on Validation Set -----
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1 Score: 1.0000

Confusion Matrix:
[[19  0]
 [ 0 21]]

Interpretation of Confusion Matrix:
True Negatives (TN): 19 (Correctly predicted not depressed)
False Positives (FP): 0 (Incorrectly predicted depressed)
False Negatives (FN): 0 (Incorrectly predicted not depressed)
True Positives (TP): 21 (Correctly predicted depressed)
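The four reported scores are simple ratios of the confusion-matrix cells, with label 1 ("depressed") as the positive class. The plain-Python sketch below makes that mapping explicit and mirrors `zero_division=0`; `metrics_from_counts` is a hypothetical helper, and the second set of counts in the usage note is invented purely for illustration.

```python
def metrics_from_counts(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    # F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}
```

The observed counts (TN=19, FP=0, FN=0, TP=21) give 1.0 for every metric. A hypothetical matrix with TN=17, FP=2, FN=1, TP=20 would instead give accuracy 0.925, precision 20/22 ≈ 0.909, recall 20/21 ≈ 0.952 and F1 40/43 ≈ 0.930.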


## Final Task

### Subtask:
Summarize the performance of the depression detection system, including the key evaluation metrics and discuss potential next steps for improvement or deployment.


## Summary:

### Q&A
The BERT-based depression detection system achieved perfect performance on the validation set, with an accuracy of 1.0000, precision of 1.0000, recall of 1.0000, and an F1-score of 1.0000. The confusion matrix showed 19 True Negatives, 0 False Positives, 0 False Negatives, and 21 True Positives, indicating that all samples in the validation set were classified correctly.

These results call for caution rather than immediate deployment. Perfect scores on a 40-sample validation set can reflect data leakage (for example, near-identical tweets landing in both the training and validation splits) or an evaluation set that is too easy, so the next steps should focus on testing the model on larger, more diverse, and genuinely unseen data to confirm that it generalizes.
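One concrete way to test the leakage hypothesis is to check whether any validation tweet also appears in the training split. Below is a minimal sketch, assuming that light normalisation (lower-casing and collapsing whitespace) is enough to catch the duplicates; `cross_split_overlap` is a hypothetical helper.

```python
def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cross_split_overlap(train_texts: list[str], val_texts: list[str]) -> list[str]:
    """Return the normalised validation texts that also appear in the training split."""
    train_set = {_normalise(t) for t in train_texts}
    return sorted({_normalise(t) for t in val_texts} & train_set)
```

Because `random_split` returns `Subset` objects, the texts for each split could be recovered with `df.tweet_text.iloc[train_dataset.indices].tolist()` (and likewise for `val_dataset`), assuming `df` has not been reordered since the tensors were built. A non-empty result would mean the validation scores partly measure memorisation.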

### Data Analysis Key Findings
*   The dataset `raw_depression_dataset_india.csv` was loaded, containing `tweet_text` for analysis and a `label` column for binary classification.
*   The text data was preprocessed for BERT, resulting in a dataset of 400 samples, split into a training set of 360 samples and a validation set of 40 samples.
*   A `BertForSequenceClassification` model (`bert-base-uncased`) was loaded and configured for binary classification, then fine-tuned on the CPU for 4 epochs using the AdamW optimizer (learning rate 2e-5) with a linear learning-rate decay and no warmup; each epoch took roughly 9 minutes.
*   During fine-tuning, the average training loss decreased from 0.48 in the first epoch to 0.08 in the fourth epoch.
*   The validation loss fell steadily from 0.31 to 0.06 over the four epochs, while validation accuracy was already 1.00 after the first epoch and stayed there.
*   The final evaluation on the validation set yielded:
    *   **Accuracy:** 1.0000
    *   **Precision:** 1.0000
    *   **Recall:** 1.0000
    *   **F1 Score:** 1.0000
*   The confusion matrix indicated 19 True Negatives, 0 False Positives, 0 False Negatives, and 21 True Positives, meaning the model made no errors on the validation set.
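A 40-sample validation set gives only a coarse estimate, even when every prediction is correct. One standard way to quantify this is the Wilson score interval for a binomial proportion, a technique not used elsewhere in this notebook; the sketch below is a plain-Python illustration with a hypothetical helper name and the usual z = 1.96 for roughly 95% coverage.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For 40 correct out of 40, the lower bound is 40 / (40 + 1.96²) ≈ 0.912, so this validation result is consistent with a true accuracy as low as roughly 91%.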

### Insights or Next Steps
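*   Perfect scores on every metric (accuracy, precision, recall and F1 of 1.0000) are highly unusual, and with only 40 validation samples they are a coarse estimate: a single misclassification would lower accuracy by 2.5 percentage points. Plausible explanations include data leakage (rows 0 and 4 of the `df.head()` output begin with the same text, which hints at duplicated or templated tweets that could appear in both splits), a validation set that is too easy, or a dataset that does not reflect real-world complexity.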
*   Before deployment, re-evaluate the model on a larger, fully independent, and more diverse test set, after removing duplicated tweets so that no text appears in more than one split. Stratified k-fold cross-validation would also give a more reliable estimate of generalization than a single 40-sample split.
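Stratified k-fold cross-validation can be sketched without extra dependencies: each class's indices are shuffled with a fixed seed and dealt across k folds, so every validation fold keeps roughly the dataset's class ratio. This is a minimal illustration with a hypothetical helper name; `sklearn.model_selection.StratifiedKFold` provides the same behaviour in a tested library, and near-duplicate tweets should be removed or grouped first so that copies cannot straddle folds.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels: list[int], k: int = 5, seed: int = 42) -> list[tuple[list[int], list[int]]]:
    """Split row indices into k folds whose class proportions match the full dataset.

    Returns one (train_indices, val_indices) pair per fold.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local, seeded RNG so the folds are reproducible
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal each class's indices round-robin across folds; carrying `position`
        # over between classes keeps total fold sizes within one of each other.
        for idx in idxs:
            folds[position % k].append(idx)
            position += 1

    splits = []
    for fold in folds:
        val_set = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in val_set]
        splits.append((train_idx, sorted(fold)))
    return splits
```

Each pair of index lists could be wrapped with `torch.utils.data.Subset(dataset, indices)` and a freshly initialised model fine-tuned per fold; averaging the per-fold metrics then gives a less noisy estimate than one split.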
