# Feature Addition

In this notebook, we compare two approaches, Concat and Unimodal, for incorporating additional numerical and categorical features alongside the tweet text, and choose the better one. The model is BERT+BERT, i.e., the best feature extractor + classifier combination selected in the previous notebook, `Classification Selection.ipynb`.

- 2 numerical (float) features:
  - tweet length (`tweet_length`)
  - offensive word count (`profanity_word`)

- 3 categorical (boolean) features:
  - positive emoji (`pos_emoji`)
  - negative emoji (`neg_emoji`)
  - URL link (`url_link`)

### Unimodal:
The additional features are appended to the tweet text before tokenization.
The resulting featured tweets are fed into BERT+BERT to get the results. This is the most straightforward method.

- Tweet => Tweet + Features => Featured Tweet
- Featured Tweet => BERT+BERT (tokenizer, encoder layers) => results
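The exact text template used for the Unimodal approach is not shown in this section, so the sketch below assumes a hypothetical one: each feature is rendered as `name: value` and appended to the tweet. The helper name `append_features` is hypothetical; the example row is tweet 22826 from the data shown later.

```python
def append_features(tweet, features):
    """Hypothetical helper: append each feature as 'name: value' to the tweet text."""
    feature_text = " ".join(
        f"{name.replace('_', ' ')}: {value}" for name, value in features.items()
    )
    return f"{tweet} {feature_text}"


featured = append_features(
    "depression real",
    {"tweet_length": 18, "profanity_word": 0, "pos_emoji": 0, "neg_emoji": 0, "url_link": 0},
)
print(featured)
# depression real tweet length: 18 profanity word: 0 pos emoji: 0 neg emoji: 0 url link: 0
```

The featured string is then tokenized exactly like a plain tweet, so no change to the model is needed.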

### Concat:
The tweets are tokenized, while the additional features are converted into float tensors (the numerical ones are normalized first).
The BERT encoder turns each tokenized tweet into a pooled [CLS] (classifier) vector, and the features of the same tweet are concatenated to it. The concatenated vector is then fed into the classification layer to get the results.

- Tweets => BERT (tokenizer) => Tok_Tweet; Features => float tensors => Feat
- Tok_Tweet => BERT (encoder layers) => Classifier Vector
- Classifier Vector + Feat => Final Vector (768 + 3 + 2 = 773 values)
- Final Vector => classification layer => Results
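A minimal sketch of the concatenation step, assuming random tensors in place of real BERT outputs and features. The sizes follow this notebook: a 768-dimensional [CLS] vector, 3 categorical and 2 numerical features, and 2 classes.

```python
import torch
from torch import nn

batch_size, hidden_size = 4, 768          # BERT-base hidden size
num_cat, num_num, num_labels = 3, 2, 2    # feature counts and classes in this notebook

# Random stand-ins for the pooled [CLS] vector and the feature tensors.
cls_vector = torch.randn(batch_size, hidden_size)
cat_feats = torch.randint(0, 2, (batch_size, num_cat)).float()
num_feats = torch.randn(batch_size, num_num)

num_bn = nn.BatchNorm1d(num_num)          # normalizes the numerical features per batch
classifier = nn.Linear(hidden_size + num_cat + num_num, num_labels)

final_vector = torch.cat((cls_vector, cat_feats, num_bn(num_feats)), dim=1)
logits = classifier(final_vector)
print(final_vector.shape, logits.shape)   # torch.Size([4, 773]) torch.Size([4, 2])
```

Only the input size of the final linear layer changes; the encoder itself is untouched.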

## Results:
- BERT+BERT (Concat):
  - holdout:
    - Acc: 0.94, Prec: 0.91, Rec: 0.97, F1: 0.94, F1-micro: 0.94, F1-macro: 0.94, F1-weighted: 0.94, G-mean: 0.94
  - 5-fold:
    - Acc: 1.00, Prec: 1.00, Rec: 1.00, F1: 1.00, F1-micro: 1.00, F1-macro: 1.00, F1-weighted: 1.00, G-mean: 1.00

- BERT+BERT (Unimodal):
  - holdout:
    - Acc: 0.95, Prec: 0.92, Rec: 0.97, F1: 0.95, F1-micro: 0.95, F1-macro: 0.95, F1-weighted: 0.95, G-mean: 0.95
  - 5-fold:
    - Acc: 0.99, Prec: 0.99, Rec: 0.99, F1: 0.99, F1-micro: 0.99, F1-macro: 0.99, F1-weighted: 0.99, G-mean: 0.99

<br>

In [None]:
import os

import pandas as pd
import numpy as np
from google.colab import runtime
import zipfile

In [None]:
# unzipping the zip file
with zipfile.ZipFile("nst_preprocessed_tweets.zip", 'r') as zip_ref:
    zip_ref.extractall(os.getcwd())

In [None]:
df_tweets = pd.read_csv('nst_preprocessed_tweets.csv')
df_tweets.shape

(22830, 9)

In [None]:
df_tweets = df_tweets.drop(['Unnamed: 0'], axis=1)
df_tweets.sample(10)

(index),vader_sentiment_label,vader_score,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
18832,1,0.7642,eating disorder almost four years working fall...,186,0,0,0,0
6278,0,-0.6597,teens say depression anxiety major issues amon...,145,1,0,0,0
12523,0,-0.7351,kembali lonely penat ah nak layan depression je,60,0,0,0,0
22399,0,-0.7506,brothers birthday hardest day hurts much think...,155,0,0,0,0
3434,1,0.8268,may today day make personal decision take sinc...,230,1,0,0,0
14402,1,0.2034,sorry depression tweeting hour kinda wan na de...,68,0,0,0,0
20257,0,-0.5719,realizing wearing less going curing depression,72,0,0,0,0
10113,1,0.8384,agree let help get mind also around friends gr...,216,0,0,0,0
8345,0,-0.8056,contracted depression alienation,46,0,0,0,0
2027,0,-0.891,keeping spoken word poet songwriter depression...,95,1,0,0,0


# Resample minority class by duplication (Concat)

In [None]:
featured_tweets = []

for index, row in df_tweets.iterrows():
    featured_tweets.append([row['tweet'], row['tweet_length'], row['url_link'], row['pos_emoji'], row['neg_emoji'], row['profanity_word']])

np.asarray(featured_tweets).shape

(22830, 6)

In [None]:
df_tweets.tail(10)

(index),vader_sentiment_label,vader_score,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
22820,0,-0.6597,learn use mindfulness overcome anxiety depress...,151,1,0,0,0
22821,1,0.9442,mental health benefit iss thankful spirit thou...,234,1,0,0,0
22822,0,-0.9406,version built overnight experience pain insecu...,188,0,0,0,0
22823,1,0.6808,essence faith believing god show despite evi w...,199,0,0,0,0
22824,0,-0.1406,listened podcast first time today great also t...,195,0,0,0,0
22825,0,-0.8126,cbd depression nature works mysterious ways cb...,116,1,0,0,0
22826,0,-0.5719,depression real,18,0,0,0,0
22827,0,-0.506,even though tropical depression barry di would...,245,1,0,0,0
22828,0,-0.7906,depression depressed anti wouldepressant ssris...,83,1,0,0,0
22829,0,-0.7783,new clinical trial depression task shifting tr...,127,1,0,0,0


In [None]:
# sanity check
featured_tweets[22827]

 245,
 1,
 0,
 0,
 0]

In [None]:
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print(f"Before oversampling: {Counter(df_tweets['vader_sentiment_label'].tolist())}")

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(featured_tweets, df_tweets['vader_sentiment_label'])

print(f"After oversampling: {Counter(y_res)}")

Before oversampling: Counter({0: 18453, 1: 4377})
After oversampling: Counter({0: 18453, 1: 18453})
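`RandomOverSampler` balances the classes by duplicating minority-class rows. Because the resampling above runs before the train/validation split, copies of the same tweet can end up on both sides of the split, which may inflate validation scores. The sketch below splits first and duplicates minority rows in the training part only. It uses scikit-learn's `resample` as a stand-in for `RandomOverSampler` (all original rows are kept and random duplicates are added), and the helper name `split_then_oversample` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def split_then_oversample(X, y, train_size=0.75, random_state=42):
    """Hypothetical helper: split first, then duplicate minority-class rows
    in the training part only, so no duplicate can reach the validation set."""
    X, y = np.asarray(X), np.asarray(y)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, train_size=train_size, stratify=y, random_state=random_state)

    classes, counts = np.unique(y_tr, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        X_c, y_c = X_tr[y_tr == c], y_tr[y_tr == c]
        X_parts.append(X_c)                  # keep every original row
        y_parts.append(y_c)
        if n < target:                       # add random duplicates up to the majority count
            X_dup, y_dup = resample(X_c, y_c, replace=True,
                                    n_samples=target - n, random_state=random_state)
            X_parts.append(X_dup)
            y_parts.append(y_dup)
    return np.concatenate(X_parts), np.concatenate(y_parts), X_val, y_val


# Usage with unique row ids: 80 negatives and 20 positives.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)
X_tr, y_tr, X_val, y_val = split_then_oversample(X, y)
print(np.bincount(y_tr), np.bincount(y_val))        # [60 60] [20  5]
print(len(set(X_tr.ravel()) & set(X_val.ravel())))  # 0
```

For k-fold cross-validation the same idea applies per fold: oversample each fold's training indices only.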


In [None]:
np.asarray(X_res).shape

(36906, 6)

In [None]:
featured_tweets_df = pd.DataFrame()

vader_sentiment_label = [label for label in y_res]
tweet = [row[0] for row in X_res]
tweet_length = [row[1] for row in X_res]
url_link = [row[2] for row in X_res]
pos_emoji = [row[3] for row in X_res]
neg_emoji = [row[4] for row in X_res]
profanity_word = [row[5] for row in X_res]

data = {'vader_sentiment_label': vader_sentiment_label,
        'tweet': tweet,
        'tweet_length': tweet_length,
        'url_link': url_link,
        'pos_emoji': pos_emoji,
        'neg_emoji': neg_emoji,
        'profanity_word': profanity_word}

featured_tweets_df = featured_tweets_df.assign(**data)

In [None]:
len(featured_tweets_df.index)

36906

# Concat Feature Addition Method

## Normalize Feature Distributions

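The numerical features are concatenated directly with the BERT embedding, so their scale matters: distances between values should reflect how similar the data points are, and no feature should dominate just because of its range. \
We therefore map each numerical feature onto a normal distribution centered at 0 with scikit-learn's `QuantileTransformer`, so that all of them have similar distributions.

Additionally, we cast the binary features to floats so they can be concatenated with the float-valued embedding.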

In [None]:
from sklearn.preprocessing import QuantileTransformer

for col in ['tweet_length', 'profanity_word']:

    # replace any empty cells with 0; assign the result back to the DataFrame,
    # since an inplace replace on a selected column may not modify the DataFrame
    featured_tweets_df[col] = featured_tweets_df[col].replace('', 0)

    # the column values are in a 1D array, but the transformer function requires
    # a 2D array. Reshape it into a column vector
    col_values = featured_tweets_df[col].values.reshape(-1, 1)

    # create a quantile transformer that maps the feature onto a normal distribution
    numerical_transformer = QuantileTransformer(output_distribution='normal')

    # apply the transformation
    col_values_norm = numerical_transformer.fit_transform(col_values)

    # replace the values with the normalized ones (flattened back to 1D)
    featured_tweets_df[col] = col_values_norm.ravel()

In [None]:
for col in ['url_link', 'pos_emoji', 'neg_emoji']:
    featured_tweets_df[col] = featured_tweets_df[col].astype('float')

In [None]:
featured_tweets_df.sample(10)

(index),vader_sentiment_label,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
7084,0,like two texts phone calls got upon waking dep...,-0.540882,0.0,0.0,0.0,-5.199338
6818,0,depression steve,-1.761948,0.0,0.0,0.0,-5.199338
4217,0,days much harder get bed morning days today di...,0.161281,0.0,0.0,0.0,-5.199338
33279,1,point hope feel like failure go depression hop...,0.071571,0.0,0.0,0.0,-5.199338
18743,0,suffering ptsd depression means trying force e...,0.642543,0.0,0.0,0.0,-5.199338
4809,0,whenever friend get nails done always get some...,-0.156199,0.0,0.0,0.0,-5.199338
17007,1,gon na diactivate guess need memes drama expec...,-0.356159,0.0,0.0,0.0,-5.199338
12622,0,feel better depressed mentalhealth depression,-0.647177,1.0,0.0,0.0,-5.199338
6959,0,yeah obviously dude cured depression,-1.301191,0.0,0.0,0.0,-5.199338
27490,1,wow seems like depression hours,-1.352623,0.0,0.0,0.0,-5.199338
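The constant -5.199338 in `profanity_word` is a side effect of the quantile transform on a feature that is 0 for most tweets. In scikit-learn, values at the lowest quantile are clipped to a fixed bound when `output_distribution='normal'`, which is about -5.1993. A small synthetic check, assuming a hypothetical count feature where 90% of the values are zero:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical zero-inflated count: 900 tweets with no offensive words, 100 with 1..100.
counts = np.array([0] * 900 + list(range(1, 101)), dtype=float).reshape(-1, 1)

qt = QuantileTransformer(output_distribution='normal')  # n_quantiles defaults to 1000
normalized = qt.fit_transform(counts).ravel()

# All zeros sit at the lowest quantile and are clipped to one shared value.
print(np.unique(normalized[:900]))             # a single value, roughly -5.1993
print(normalized[900:].min() > normalized[0])  # True: non-zero counts map higher
```

So every tweet without offensive words shares the same extreme value; the batch normalization applied to the numerical features inside the model re-centers them before concatenation.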


## Tokenize & Encode

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


In [None]:
import torch

max_len = 128

def tokenization(tweets, labels, categorical_feats, numerical_feats, maxlen=max_len, tokenizer=tokenizer):
    input_ids = []
    attention_masks = []

    for tweet in tweets:
        encoded_dict = tokenizer.encode_plus(
                        tweet,                          # Sentence to encode.
                        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
                        max_length = maxlen,            # Pad & truncate all sentences
                        truncation = True,              # to exactly `maxlen` tokens.
                        padding = 'max_length',
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',          # Return pytorch tensors.
                )

        # Add the encoded sentence to the list.
        input_ids.append(encoded_dict['input_ids'])

        # And its attention mask (simply differentiates padding from non-padding).
        attention_masks.append(encoded_dict['attention_mask'])

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    categorical_feats = torch.stack(categorical_feats, dim=0)
    numerical_feats = torch.stack(numerical_feats, dim=0)
    labels = torch.tensor(labels)

    return input_ids, attention_masks, categorical_feats, numerical_feats, labels

In [None]:
import torch

# Create tweets and labels lists
tweets = featured_tweets_df.tweet.values
labels = featured_tweets_df.vader_sentiment_label.values

categorical_feats = []
numerical_feats = []

for index, row in featured_tweets_df.iterrows():
    categorical_feat = torch.tensor([row['url_link'], row['pos_emoji'], row['neg_emoji']])
    numerical_feat = torch.tensor([row['tweet_length'], row['profanity_word']])

    categorical_feats.append(categorical_feat)
    numerical_feats.append(numerical_feat)


In [None]:
input_ids, attention_masks, categorical_feats, numerical_feats, labels = tokenization(tweets, labels, categorical_feats, numerical_feats)

In [None]:
# sanity check
print((input_ids.shape), (attention_masks.shape), (categorical_feats.shape), (numerical_feats.shape))
print(len(input_ids), len(attention_masks), len(categorical_feats), len(numerical_feats))

torch.Size([36906, 128]) torch.Size([36906, 128]) torch.Size([36906, 3]) torch.Size([36906, 2])
36906 36906 36906 36906


In [None]:
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

batch_size = 32
training_split = .75

def train_val_split(input_ids, attention_masks, categorical_feats, numerical_feats, labels, training_split=training_split, batch_size=batch_size):
    # Use train_test_split to split our data into train and validation sets for training.
    # Splitting all tensors in a single call keeps their rows aligned.
    (train_inputs, validation_inputs,
     train_masks, validation_masks,
     train_cat_feats, validation_cat_feats,
     train_num_feats, validation_num_feats,
     train_labels, validation_labels) = train_test_split(input_ids, attention_masks, categorical_feats, numerical_feats, labels,
                                                         random_state=2018, train_size=training_split)

    # Create an iterator of our data with torch DataLoader. This helps save on memory during training because, unlike a for loop,
    # with an iterator the entire dataset does not need to be loaded into memory
    train_data = TensorDataset(train_inputs, train_masks, train_labels, train_cat_feats, train_num_feats)
    train_sampler = RandomSampler(train_data)
    train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

    validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels, validation_cat_feats, validation_num_feats)
    validation_sampler = SequentialSampler(validation_data)
    validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)

    return train_dataloader, validation_dataloader

In [None]:
train_dataloader, validation_dataloader = train_val_split(input_ids, attention_masks, categorical_feats, numerical_feats, labels)
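Inputs, masks, features and labels must be split with the same permutation. A small check with stand-in arrays (hypothetical data, not the tweets) shows that separate `train_test_split` calls with the same `random_state` and `train_size` pick the same rows, and that a single call over all arrays gives the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

ids = np.arange(20)      # stand-in for input_ids rows
masks = ids * 10         # stand-in for attention masks, in the same row order
labels = ids % 2

# Separate calls with the same random_state and train_size ...
tr_ids, va_ids, tr_lab, va_lab = train_test_split(ids, labels, random_state=2018, train_size=0.75)
tr_masks, va_masks, _, _ = train_test_split(masks, ids, random_state=2018, train_size=0.75)

# ... use the same permutation, so rows stay aligned.
print(np.array_equal(tr_masks, tr_ids * 10))   # True

# One call over all arrays produces the same split.
tr_ids2, va_ids2, tr_masks2, va_masks2, tr_lab2, va_lab2 = train_test_split(
    ids, masks, labels, random_state=2018, train_size=0.75)
print(np.array_equal(tr_ids2, tr_ids) and np.array_equal(tr_masks2, tr_masks))  # True
```

The single call is less fragile, since the alignment no longer depends on every call receiving identical arguments.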

## Custom Classes
To implement the Concat approach, we define our own BERT class based on
`BertForSequenceClassification`. \
We named our custom class `BertConcatFeatures`.

Its key difference from the parent class is in the forward pass: the categorical and (batch-normalized) numerical features are appended to the pooled [CLS] vector provided by BERT, and the resulting vector is fed into the classification layer.

In [None]:
from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

from transformers import BertForSequenceClassification

class BertConcatFeatures(BertForSequenceClassification):
    """
        A model for classification which combines text, categorical and numerical
        features. The text features are processed with BERT. All features are
        concatenated into a single vector, which is fed into the BERT classifier.

        This class expects a transformers.BertConfig object, and the config object
        needs to have three additional properties manually added to it:

          'text_feat_dim' - The length of the BERT vector.
          'cat_feat_dim' - The number of categorical features.
          'numerical_feat_dim' - The number of numerical features.
    """

    def __init__(self, config):

        # BERT set-up

        # Call the constructor for the huggingface 'BertForSequenceClassification'
        # class, which will do all of the BERT-related setup. The resulting BERT
        # model is stored in 'self.bert'.
        super().__init__(config)

        # Feature combination set-up

        # Length of the combined vector: [CLS] embedding + categorical features
        # + numerical features (768 + 3 + 2 = 773 for BERT-base).
        combined_feat_dim = config.text_feat_dim + \
                            config.cat_feat_dim + \
                            config.numerical_feat_dim

        # Create a batch normalizer for the numerical features.
        self.num_bn = nn.BatchNorm1d(config.numerical_feat_dim)

        # Replace the parent's classifier (768 inputs) with one that accepts
        # the combined vector.
        self.classifier = nn.Linear(combined_feat_dim, config.num_labels)

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        cat_feats=None,
        numerical_feats=None,
        return_dict=None):
        """
            Perform a forward pass of our model.

            This has the same inputs as 'forward' in 'BertForSequenceClassification',
            but with two extra parameters:

              'cat_feats' - Tensor of categorical features.
              'numerical_feats' - Tensor of numerical features.
        """

        # BERT

        # Run the text through the BERT model. Invoking 'self.bert' returns
        # outputs from the encoding layers, and not from the final classifier.

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states)

        # outputs[0] - All of the outputs embeddings from BERT
        # outputs[1] - The [CLS] token embedding, with some additional "pooling"
        #              done.
        cls = outputs[1]

        # Apply dropout to the CLS embedding for concatenation process.
        cls = self.dropout(cls)

        # Concatenate Features

        # Apply batch normalization to the numerical features.
        numerical_feats = self.num_bn(numerical_feats)

        # Object sizes:
        #             cls   [batch size x 768]
        # numerical_feats   [batch size x # numerical features]
        #        cat_feats  [batch size x # categorical features]

        # Concatenate everything into one vector.
        # 3 cat. and 2 num. features => 768 + 3 + 2 = 773
        combined_feats = torch.cat((cls, cat_feats, numerical_feats), dim=1)
        logits = self.classifier(combined_feats)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)
        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        results = {
            'loss': loss,
            'logits': logits,
            'hidden_states': outputs.hidden_states,
            'attentions': outputs.attentions
        }
        return results

## Load Model

In this section, we load Google's pretrained `bert-base-uncased` weights into our custom BERT class.

First, select the GPU as the PyTorch device.

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.get_device_name(0)

'NVIDIA A100-SXM4-40GB'

In [None]:
from transformers import BertConfig

# We'll need to use a "BertConfig" object from the transformers library
# to specify our parameters.
config = BertConfig.from_pretrained(
          'bert-base-uncased',
          num_labels=2)

# The classification layer needs to know the length of the combined vector
# that will be sent into it.

# Pass in the number of numerical and categorical features.
config.numerical_feat_dim = 2  # tweet_length, profanity_word
config.cat_feat_dim = 3        # url_link, pos_emoji, neg_emoji

# Pass in the size of the text embedding which is 768 for the BERT-base model.
config.text_feat_dim = config.hidden_size # 768

model = BertConcatFeatures.from_pretrained(
        'bert-base-uncased',
        config=config)

# Move the model to the selected device (the GPU here); assigning the result
# keeps the notebook from printing the whole model
model = model.to(device)

Some weights of BertConcatFeatures were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'num_bn.bias', 'num_bn.num_batches_tracked', 'num_bn.running_mean', 'num_bn.running_var', 'num_bn.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Training parameters

Now that our data is properly formatted and our custom class is implemented, it's time to fine-tune the BERT model with the additional features.

- First block - we need to create the optimizer, passing it the weights from our BERT model.
- Second block - the learning rate scheduler will implement learning rate decay for us.
- Third block - define a helper function for calculating simple accuracy.

In [None]:
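from torch.optim import AdamW

batch_size = 16  # note: the holdout dataloaders above were already built with batch_size=32
learning_rate = 2e-5 # try 3e-3 later
epochs = 4

# torch.optim.AdamW replaces the deprecated transformers.AdamW; weight_decay=0.0
# matches the transformers default (torch's own default is 0.01).
optimizer = AdamW(model.parameters(),
                  lr = learning_rate,
                  eps = 1e-8, # a very small number to prevent any division by zero in the implementation
                  weight_decay = 0.0
                  )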



In [None]:
from transformers import get_linear_schedule_with_warmup

# Total number of training steps is [number of batches] x [number of epochs].
# (Note that this is not the same as the number of training samples!)
total_steps = len(train_dataloader) * epochs

# Create a schedule with a learning rate that decreases linearly from the
# initial lr set in the optimizer to 0, after a warmup period during which it
# increases linearly from 0 to the initial lr set in the optimizer.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)
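The linear schedule only reaches 0 if `num_training_steps` equals the number of optimizer steps actually taken, i.e. the number of batches times the number of epochs. A sketch that re-implements the same linear warmup/decay shape with PyTorch's `LambdaLR` (so it runs without `transformers`; the helper names `linear_decay` and `lr_after` are hypothetical) shows what happens when the planned step count is too large:

```python
import torch


def linear_decay(num_warmup_steps, num_training_steps):
    # Linear warmup from 0, then linear decay to 0 at num_training_steps.
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))
    return lr_lambda


def lr_after(actual_steps, planned_steps, base_lr=2e-5):
    # Run `actual_steps` optimizer/scheduler steps on a dummy parameter.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.AdamW([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, linear_decay(0, planned_steps))
    for _ in range(actual_steps):
        optimizer.step()
        scheduler.step()
    return optimizer.param_groups[0]['lr']


batches_per_epoch, epochs = 10, 4
print(lr_after(batches_per_epoch * epochs, batches_per_epoch * epochs))      # 0.0
print(lr_after(batches_per_epoch * epochs, 2 * batches_per_epoch * epochs))  # 1e-05
```

With the step count doubled, training ends at half the initial learning rate instead of 0.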

In [None]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

## BERT-Concat Holdout

## Training Loop
Below is our training loop. There's a lot going on, but fundamentally each pass (epoch) has a training phase and a validation phase. At each pass we need to:

**Training loop:**
- Put the model in training mode, which enables dropout and lets batch normalization update its running statistics
- Unpack our data inputs and labels
- Load data onto the GPU for acceleration
- Clear out the gradients calculated in the previous pass.
  - In PyTorch, gradients accumulate by default (useful for things like RNNs) unless you explicitly clear them out
- Forward pass (feed input data through the network)
- Backward pass (backpropagation), then clip the gradient norm to 1.0
- Tell the network to update parameters with `optimizer.step()`, then advance the learning rate scheduler
- Track variables for monitoring progress

**Evaluation loop:**
- Put the model in evaluation mode (dropout off, batch normalization uses its running statistics) and run the forward pass inside `torch.no_grad()` so no gradients are computed
- Unpack our data inputs and labels
- Load data onto the GPU for acceleration
- Forward pass (feed input data through the network)
- Collect the predictions and compute the evaluation metrics on our validation data
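`model.eval()` and `torch.no_grad()` do different things: the former switches layers such as dropout to inference behavior, the latter turns off gradient tracking. A small sketch with a toy model (a hypothetical stand-in, not BERT) makes the distinction visible:

```python
import torch
from torch import nn

# Hypothetical toy model standing in for BERT: a linear layer followed by dropout.
torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

toy.eval()                         # eval mode: dropout now passes inputs through unchanged
out_eval = toy(x)
print(out_eval.requires_grad)      # True: eval() alone still tracks gradients

with torch.no_grad():              # this is what turns gradient tracking off
    out_no_grad = toy(x)
print(out_no_grad.requires_grad)   # False

# With dropout disabled, both forward passes give identical values.
print(torch.equal(out_eval, out_no_grad))  # True
```

This is why the evaluation loop uses both: `eval()` for deterministic outputs and `no_grad()` to save memory and time.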




In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score
from tqdm import tqdm, trange

for _ in trange(epochs, desc="Epoch"):

    # Training

    # Put the model into training mode. Don't be misled: the call to
    # `train` just changes the *mode*, it doesn't *perform* the training.
    model.train()

    # tracking variables
    total_train_loss, num_train_steps = 0, 0

    for step, batch in enumerate(train_dataloader):
        # Unpack this training batch from our dataloader.
        # As we unpack the batch, we'll also copy each tensor to the GPU using
        # the 'to' method
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        b_categ_feats = batch[3].to(device)
        b_numer_feats = batch[4].to(device)

        # Clear prior gradients
        model.zero_grad()

        result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels,
                           cat_feats = b_categ_feats,
                           numerical_feats = b_numer_feats,
                           return_dict=True)

        # Get the loss and "logits" output by the model. The "logits" are the
        # output values prior to applying an activation function like the
        # softmax.
        loss = result['loss']
        logits = result['logits']

        total_train_loss += loss.item()
        num_train_steps += 1
        loss.backward()

        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer.step()

        # Update the learning rate.
        scheduler.step()

    print(f"Train loss: {total_train_loss / num_train_steps}")

    # After the completion of each training epoch, measure our performance on
    # our validation set.

    model.eval()

    # Tracking variables for performance evaluation
    predictions, true_labels = [], []

    for batch in validation_dataloader:
        # Unpack this validation batch from our dataloader.
        # As we unpack the batch, we'll also copy each tensor to the GPU using
        # the 'to' method
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        b_categ_feats = batch[3].to(device)
        b_numer_feats = batch[4].to(device)

        # Don't track gradients during evaluation
        with torch.no_grad():
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels,
                           cat_feats = b_categ_feats,
                           numerical_feats = b_numer_feats,
                           return_dict=True)

        # The predicted class is the index of the largest logit
        # (the raw output values before any softmax).
        true_labels.extend(b_labels.tolist())
        _, predicted_labels = torch.max(result['logits'], dim=1)
        predictions.extend(predicted_labels.tolist())

    # The built-in round() works for both NumPy and Python floats
    print(f"\nAcc:{round(accuracy_score(true_labels, predictions), 2)}," \
            f" Prec:{round(precision_score(true_labels, predictions), 2)}," \
            f" Rec:{round(recall_score(true_labels, predictions), 2)}," \
            f" F1:{round(f1_score(true_labels, predictions), 2)}," \
            f" F1-micro:{round(f1_score(true_labels, predictions, average='micro'), 2)}," \
            f" F1-macro:{round(f1_score(true_labels, predictions, average='macro'), 2)}," \
            f" F1-weighted:{round(f1_score(true_labels, predictions, average='weighted'), 2)}," \
            f" G-mean:{round(geometric_mean_score(true_labels, predictions), 2)}")

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Train loss: 0.3954379270876074


Epoch:  25%|██▌       | 1/4 [09:20<28:02, 560.68s/it]


Acc:0.9, Prec:0.89, Rec:0.91, F1:0.9, F1-micro:0.9, F1-macro:0.9, F1-weighted:0.9, G-mean:0.9
Train loss: 0.19652042476341905


Epoch:  50%|█████     | 2/4 [18:41<18:42, 561.04s/it]


Acc:0.94, Prec:0.93, Rec:0.94, F1:0.94, F1-micro:0.94, F1-macro:0.94, F1-weighted:0.94, G-mean:0.94
Train loss: 0.1036170645665399


Epoch:  75%|███████▌  | 3/4 [28:02<09:20, 560.82s/it]


Acc:0.95, Prec:0.94, Rec:0.96, F1:0.95, F1-micro:0.95, F1-macro:0.95, F1-weighted:0.95, G-mean:0.95
Train loss: 0.06499964317557202


Epoch: 100%|██████████| 4/4 [37:23<00:00, 560.78s/it]


Acc:0.94, Prec:0.91, Rec:0.97, F1:0.94, F1-micro:0.94, F1-macro:0.94, F1-weighted:0.94, G-mean:0.94





In [None]:
print(f"\nAcc:{(accuracy_score(true_labels, predictions))}," \
            f" Prec:{precision_score(true_labels, predictions)}," \
            f" Rec:{recall_score(true_labels, predictions)}," \
            f" F1:{f1_score(true_labels, predictions)}," \
            f" F1-micro:{f1_score(true_labels, predictions, average='micro')}," \
            f" F1-macro:{f1_score(true_labels, predictions, average='macro')}," \
            f" F1-weighted:{f1_score(true_labels, predictions, average='weighted')}," \
            f" G-mean:{geometric_mean_score(true_labels, predictions)}")


Acc:0.9394169285791698, Prec:0.9140401146131805, Rec:0.969815418023887, F1:0.9411020967232115, F1-micro:0.9394169285791698, F1-macro:0.9393672929225787, F1-weighted:0.9393640966871163, G-mean:0.9389827102719246
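Two relations explain some of the numbers above. In single-label classification, micro-averaged F1 equals accuracy (hence the identical Acc and F1-micro values), and for two classes the G-mean is the geometric mean of sensitivity (recall of class 1) and specificity (recall of class 0), which, as we understand it, is what imblearn's `geometric_mean_score` reports with its default settings. A sketch on hypothetical toy labels, using only scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical toy labels: all positives found, half of the negatives misclassified.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1]

sensitivity = recall_score(y_true, y_pred, pos_label=1)   # TPR = 4/4
specificity = recall_score(y_true, y_pred, pos_label=0)   # TNR = 2/4
g_mean = np.sqrt(sensitivity * specificity)

accuracy = accuracy_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average='micro')
print(round(g_mean, 4))      # 0.7071
print(accuracy, micro_f1)    # 0.75 0.75
```

Because the G-mean multiplies the per-class recalls, it drops sharply when either class is poorly recognized, which makes it a useful check next to accuracy.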


## BERT-Concat 5-fold Cross-validation

In [None]:
featured_tweets_df.sample(10)

(index),vader_sentiment_label,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
20037,0,ow high ranked bad literally made depression r...,160,0,0,0,0
32881,1,honestly dbt saved bad depression like bad tho...,272,0,0,0,0
26845,1,sorryit expansi question agree medication ofte...,267,0,0,0,0
23296,1,make sure little ones look amazing summer holi...,250,1,0,0,0
17291,0,depression really stressed since home ac im en...,189,0,0,0,2
26485,1,look high national deficit increased since tru...,277,0,0,0,0
2152,0,bipolar anxiety disorder clinical depression d...,268,0,0,0,0
19007,0,level depression extreme unforgettable,52,0,0,0,0
2077,0,truedepression clouds judgement,46,1,0,0,0
14304,0,depression ever go,24,0,0,0,0


In [None]:
from sklearn.preprocessing import QuantileTransformer

for col in ['tweet_length', 'profanity_word']:

    # replace any empty cells with 0; assign the result back to the DataFrame,
    # since an inplace replace on a selected column may not modify the DataFrame
    featured_tweets_df[col] = featured_tweets_df[col].replace('', 0)

    # the column values are in a 1D array, but the transformer function requires
    # a 2D array. Reshape it into a column vector
    col_values = featured_tweets_df[col].values.reshape(-1, 1)

    # create a quantile transformer that maps the feature onto a normal distribution
    numerical_transformer = QuantileTransformer(output_distribution='normal')

    # apply the transformation
    col_values_norm = numerical_transformer.fit_transform(col_values)

    # replace the values with the normalized ones (flattened back to 1D)
    featured_tweets_df[col] = col_values_norm.ravel()

In [None]:
for col in ['url_link', 'pos_emoji', 'neg_emoji']:
    featured_tweets_df[col] = featured_tweets_df[col].astype('float')

In [None]:
featured_tweets_df.sample(10)

(index),vader_sentiment_label,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
29145,1,depression ot diminish persons desire connect ...,-0.11189,0.0,0.0,0.0,-5.199338
15694,0,su right depression get accused sending heaven...,-0.028859,0.0,0.0,0.0,-5.199338
9432,0,legion followers signing twitter november take...,0.664287,0.0,0.0,0.0,-5.199338
246,1,thing explain depression unexplainable sometim...,0.503497,0.0,0.0,0.0,-5.199338
31915,1,thor directed taika waititi means thor recogni...,0.884593,0.0,0.0,0.0,-5.199338
7373,0,blog post today picturebooks personify externa...,0.884593,1.0,0.0,0.0,-5.199338
218,0,problem money time gon na waste saw friend goi...,0.182928,0.0,0.0,0.0,-5.199338
25472,1,may love christ jesus upon us within us foreve...,0.451469,1.0,0.0,0.0,-5.199338
35064,1,sorry going alone mental illness isolatingi de...,0.700711,0.0,0.0,0.0,-5.199338
31653,1,haaa sober inouraya ne depression wangu,-1.324958,0.0,0.0,0.0,-5.199338
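In the sample above, every `profanity_word` value is -5.199338. This appears to be a property of `QuantileTransformer` rather than a bug: values equal to the smallest fitted quantile are mapped to the lower bound, which with `output_distribution='normal'` is clipped to roughly -5.2, so a count column that is mostly zero collapses to one value for all zero rows. A small sketch on a made-up toy column (the 90/7/3 split of zeros, ones and twos is illustrative only):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Toy count column (made up): mostly zeros, like an offensive-word count
counts = np.array([0] * 90 + [1] * 7 + [2] * 3, dtype=float).reshape(-1, 1)

# n_quantiles must not exceed the number of samples, so set it explicitly here
qt = QuantileTransformer(output_distribution='normal', n_quantiles=100)
mapped = qt.fit_transform(counts).ravel()

# All 90 zeros sit at the lowest quantile and share one clipped output value
zero_outputs = np.unique(mapped[:90].round(6))
print(zero_outputs)
```

In effect the transformed column mostly encodes "zero vs. non-zero"; whether that is acceptable for this feature is a modelling choice that was not tested here.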


In [None]:
# Collect the features as float32 arrays of shape (n_samples, n_features)
cat_cols = ['url_link', 'pos_emoji', 'neg_emoji']
num_cols = ['tweet_length', 'profanity_word']

X_res = featured_tweets_df['tweet'].to_numpy()
y_res = featured_tweets_df['vader_sentiment_label'].to_numpy()
X_res_cat = featured_tweets_df[cat_cols].to_numpy(dtype=np.float32)
X_res_num = featured_tweets_df[num_cols].to_numpy(dtype=np.float32)

X_res.shape, y_res.shape, X_res_cat.shape, X_res_num.shape

((36906,), (36906,), (36906, 3), (36906, 2))

In [None]:
import torch
from torch.utils.data import Dataset

# Returns, for one tweet: its token ids and attention mask (from the global
# `tokenizer`), its categorical and numerical feature vectors, and its label
class TextDataset(Dataset):
  def __init__(self, texts, categorical_feats, numerical_feats, labels, max_length=510):
    self.texts = texts
    self.categorical_feats = categorical_feats
    self.numerical_feats = numerical_feats
    self.labels = labels
    # every tweet is padded to max_length tokens; a shorter length (e.g. the
    # 128 used in the unimodal runs) would make each epoch cheaper
    self.max_length = max_length

  def __len__(self):
    return len(self.texts)

  def __getitem__(self, idx):
    encoding = tokenizer(self.texts[idx], padding='max_length', truncation=True,
                         max_length=self.max_length, return_tensors='pt')

    return {'input_ids': encoding['input_ids'].squeeze(0),
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'categorical_feats': torch.as_tensor(self.categorical_feats[idx], dtype=torch.float32),
            'numerical_feats': torch.as_tensor(self.numerical_feats[idx], dtype=torch.float32),
            'labels': torch.tensor(self.labels[idx])}

In [None]:
from tqdm import tqdm, trange
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

dataset = TextDataset(X_res, X_res_cat, X_res_num, y_res)

# Define k-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Perform k-fold cross-validation
for fold, (train_indices, val_indices) in enumerate(skf.split(X_res, y_res)):

    # Split dataset into train and validation sets for the current fold
    train_dataset = torch.utils.data.Subset(dataset, train_indices)
    val_dataset =  torch.utils.data.Subset(dataset, val_indices)

    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    validation_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

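    # Training loop
    # NOTE: `model`, `optimizer` and `scheduler` are created once, outside the
    # fold loop, so each fold after the first continues from the previous
    # fold's weights instead of starting again from the pretrained checkpoint.
    total_train_loss, num_train_steps = 0, 0
    print(f"Training fold: {fold+1}/{skf.get_n_splits()}")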
    model.to(device)
    model.train()
    for _ in trange(epochs, desc="Epoch"):
        for batch in train_loader:
            b_input_ids = batch['input_ids'].to(device)
            b_input_mask = batch['attention_mask'].to(device)
            b_labels = batch['labels'].to(device)
            b_categ_feats = batch['categorical_feats'].to(device)
            b_numer_feats = batch['numerical_feats'].to(device)

            model.zero_grad()
            result = model(b_input_ids,
                              token_type_ids=None,
                              attention_mask=b_input_mask,
                              labels=b_labels,
                              cat_feats = b_categ_feats,
                              numerical_feats = b_numer_feats,
                              return_dict = True)

            loss = result['loss']
            logits = result['logits']

            total_train_loss += loss.item()
            num_train_steps += 1

            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()

    print(f"\nTrain loss: {total_train_loss / num_train_steps}")

    # Validation
    # Put model on evaluation mode to evaluate loss on the validation set
    model.eval()
    print(f"Evaluation fold: {fold+1}/{skf.get_n_splits()}")
    # Tracking variables for performance evaluation
    predictions , true_labels = [], []
    for batch in validation_dataloader:
        # Unpack this validation batch from our dataloader and copy each
        # tensor to the GPU using the 'to' method

        b_input_ids = batch['input_ids'].to(device)
        b_input_mask = batch['attention_mask'].to(device)
        b_labels = batch['labels'].to(device)
        b_categ_feats = batch['categorical_feats'].to(device)
        b_numer_feats = batch['numerical_feats'].to(device)

        with torch.no_grad():
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels,
                           cat_feats = b_categ_feats,
                           numerical_feats = b_numer_feats,
                           return_dict = True)

        # The "logits" are the output values prior to applying an activation
        # function like the softmax; the predicted class is their argmax.
        true_labels.extend(b_labels.tolist())
        _, predicted_labels = torch.max(result["logits"], dim=1)
        predictions.extend(predicted_labels.tolist())

    print(f"\nAcc:{(accuracy_score(true_labels, predictions)).round(2)}," \
            f" Prec:{precision_score(true_labels, predictions).round(2)}," \
            f" Rec:{recall_score(true_labels, predictions).round(2)}," \
            f" F1:{f1_score(true_labels, predictions).round(2)}," \
            f" F1-micro:{f1_score(true_labels, predictions, average='micro').round(2)}," \
            f" F1-macro:{f1_score(true_labels, predictions, average='macro').round(2)}," \
            f" F1-weighted:{f1_score(true_labels, predictions, average='weighted').round(2)}," \
            f" G-mean:{geometric_mean_score(true_labels, predictions).round(2)}")

Training fold: 1/5


Epoch: 100%|██████████| 4/4 [38:51<00:00, 582.76s/it]



Train loss: 0.17632919709731662
Evaluation fold: 1/5

Acc:0.95, Prec:0.93, Rec:0.98, F1:0.95, F1-micro:0.95, F1-macro:0.95, F1-weighted:0.95, G-mean:0.95
Training fold: 2/5


Epoch: 100%|██████████| 4/4 [38:51<00:00, 582.95s/it]



Train loss: 0.09568219769560721
Evaluation fold: 2/5

Acc:0.99, Prec:0.99, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
Training fold: 3/5


Epoch: 100%|██████████| 4/4 [38:52<00:00, 583.17s/it]



Train loss: 0.09233888730190516
Evaluation fold: 3/5

Acc:0.99, Prec:0.99, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
Training fold: 4/5


Epoch: 100%|██████████| 4/4 [38:52<00:00, 583.11s/it]



Train loss: 0.09396049112432492
Evaluation fold: 4/5

Acc:0.99, Prec:1.0, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
Training fold: 5/5


Epoch: 100%|██████████| 4/4 [38:52<00:00, 583.08s/it]



Train loss: 0.09540401712874409
Evaluation fold: 5/5

Acc:1.0, Prec:1.0, Rec:1.0, F1:1.0, F1-micro:1.0, F1-macro:1.0, F1-weighted:1.0, G-mean:1.0
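In the cell above, `model`, `optimizer` and `scheduler` are created once, before the fold loop. From fold 2 onward the model therefore starts from weights already trained on earlier folds, whose training data include the current validation fold; this may partly explain why fold 1 scores 0.95 while folds 2 to 5 score 0.99 to 1.00. A minimal sketch of re-creating all three inside the loop, using a toy linear layer and random data as stand-ins for BERT and the tweets (`build_model` is a hypothetical factory, not part of this notebook):

```python
import torch
from torch import nn
from sklearn.model_selection import StratifiedKFold


def build_model():
    # Hypothetical stand-in for the BERT + feature-concat model used above
    return nn.Linear(4, 2)


X = torch.randn(40, 4)            # toy features
y = torch.tensor([0, 1] * 20)     # toy balanced labels
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

initial_weights = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X.numpy(), y.numpy())):
    # Fresh model, optimizer and scheduler for every fold, seeded identically
    torch.manual_seed(0)
    model = build_model()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, eps=1e-8)
    total_steps = 1  # in the notebook: len(train_loader) * epochs
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: max(0.0, 1.0 - step / total_steps))
    initial_weights.append(model.weight.detach().clone())

    # One toy training step on this fold's training split
    model.train()
    tr = torch.as_tensor(train_idx)
    loss = nn.functional.cross_entropy(model(X[tr]), y[tr])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In the notebook, the factory would re-load the pretrained checkpoint (with the feature layers), and the optimizer and scheduler would be built as in the earlier cells, just inside the fold loop.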


In [None]:
# Full-precision metrics for the last fold (fold 5) only: `true_labels` and
# `predictions` are reset at the start of every fold.
print(f"\nAcc:{accuracy_score(true_labels, predictions)}," \
      f" Prec:{precision_score(true_labels, predictions)}," \
      f" Rec:{recall_score(true_labels, predictions)}," \
      f" F1:{f1_score(true_labels, predictions)}," \
      f" F1-micro:{f1_score(true_labels, predictions, average='micro')}," \
      f" F1-macro:{f1_score(true_labels, predictions, average='macro')}," \
      f" F1-weighted:{f1_score(true_labels, predictions, average='weighted')}," \
      f" G-mean:{geometric_mean_score(true_labels, predictions)}")


Acc:0.9963419590841349, Prec:0.9964769647696476, Rec:0.9962069899756164, F1:0.9963419590841348, F1-micro:0.9963419590841349, F1-macro:0.9963419590841348, F1-weighted:0.9963419590841348, G-mean:0.9963419682283835
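These figures describe fold 5 only, because `true_labels` and `predictions` are reset at the start of every fold. A sketch for reporting the mean and standard deviation over all folds, assuming the loop appends `(true_labels, predictions)` to a list after each fold's validation (`fold_results` and `summarize_folds` are hypothetical names; further metrics such as G-mean can be added the same way):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def summarize_folds(fold_results):
    """Mean and standard deviation of each metric over folds.

    fold_results: list of (true_labels, predictions) pairs, one per fold.
    """
    scores = {'acc': [], 'f1': [], 'f1_macro': []}
    for y_true, y_pred in fold_results:
        scores['acc'].append(accuracy_score(y_true, y_pred))
        scores['f1'].append(f1_score(y_true, y_pred))
        scores['f1_macro'].append(f1_score(y_true, y_pred, average='macro'))
    return {name: (float(np.mean(vals)), float(np.std(vals)))
            for name, vals in scores.items()}


# Toy example with two folds
fold_results = [([0, 1, 1, 0], [0, 1, 0, 0]),
                ([1, 1, 0, 0], [1, 1, 0, 0])]
summary = summarize_folds(fold_results)
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```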


## BERT-Unimodal Holdout

In [None]:
df_tweets = pd.read_csv('nst_preprocessed_tweets.csv')
df_tweets.shape

(22830, 9)

In [None]:
df_tweets = df_tweets.drop(['Unnamed: 0'], axis=1)
df_tweets.sample(10)

Unnamed: 0,vader_sentiment_label,vader_score,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
11934,0,-0.1531,startin day cheerfully running depression,61,0,0,0,0
14742,0,-0.7964,psychology says comparing others root cause fe...,111,0,0,0,0
1133,1,0.6124,wow way closer thought would yes depression su...,111,0,0,0,1
21407,0,-0.3818,dark humor depression cope hbu,50,0,0,0,0
147,0,-0.7783,seen izuku manga years making depression worse,79,0,0,0,0
4965,0,-0.802,depression ass every night,30,1,0,0,1
4440,0,-0.5994,wish mosquitos could suck depression outta,55,0,0,0,0
4469,0,-0.2263,think called depression lol,36,0,0,0,0
12995,0,-0.8357,depression anxiety thoughts loneliness helples...,156,0,0,0,0
21257,0,-0.5719,depression hours,16,0,0,0,0


# Concatenate Features
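2 integer features:
- tweet length
- offensive word count

3 boolean features:
- pos. emoji
- neg. emoji
- URL link

In this method, our goal is to observe whether BERT performs better when the features are simply concatenated with the textual data. All of the features are prepended to the tweet text in the order tweet length, offensive word count, URL link, pos. emoji, neg. emoji, each followed by `[SEP]` (e.g., “121 [SEP] 0 [SEP] 1.0 [SEP] 0.0 [SEP] 1.0 [SEP] tweet…”).

First, we convert the three boolean features to float; the string forms of all five values are then prepended to each tweet. Because `[SEP]` is a special token in BERT's vocabulary, the tokenizer maps each literal `[SEP]` to the separator id 102 instead of splitting it into sub-words, as the sanity check of `input_ids[0]` further down shows.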



In [None]:
for col in ['url_link', 'pos_emoji', 'neg_emoji']:
    df_tweets[col] = df_tweets[col].astype('float')

tweet_feat = []

for index, row in df_tweets.iterrows():
    combined = ""

    combined += f"{row['tweet_length']} [SEP] " \
                f"{row['profanity_word']} [SEP] " \
                f"{row['url_link']} [SEP] " \
                f"{row['pos_emoji']} [SEP] " \
                f"{row['neg_emoji']} [SEP] "

    combined += row['tweet']
    tweet_feat.append(combined)

df_tweets.insert(loc = 3,
          column = 'tweet_feat',
          value = tweet_feat)
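The `iterrows()` loop above builds each string one row at a time. A vectorized sketch of the same transformation, assuming the column names used in this notebook and that the boolean columns have already been cast to float (so they print as `0.0`/`1.0`); `add_featured_text` is a hypothetical helper name:

```python
import pandas as pd


def add_featured_text(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'tweet_feat' column inserted at position 3.

    Each value is '<length> [SEP] <profanity> [SEP] <url> [SEP] <pos> [SEP] <neg> [SEP] <tweet>'.
    """
    feat_cols = ['tweet_length', 'profanity_word', 'url_link', 'pos_emoji', 'neg_emoji']
    # astype(str) renders int columns as '278' and float columns as '0.0',
    # matching what the f-strings in the loop produce
    prefix = df[feat_cols].astype(str).apply(' [SEP] '.join, axis=1)
    out = df.copy()
    out.insert(loc=3, column='tweet_feat', value=prefix + ' [SEP] ' + df['tweet'])
    return out


toy = pd.DataFrame({'vader_sentiment_label': [0], 'vader_score': [0.1],
                    'tweet': ['wow dad yesterday'], 'tweet_length': [278],
                    'url_link': [0.0], 'pos_emoji': [0.0], 'neg_emoji': [0.0],
                    'profanity_word': [0]})
print(add_featured_text(toy)['tweet_feat'].iloc[0])
```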

In [None]:
df_tweets.sample(10)

Unnamed: 0,vader_sentiment_label,vader_score,tweet,tweet_feat,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
5370,1,0.2732,way realize taking trying convince anything lo...,250 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,250,0.0,0.0,0.0,0
4902,0,-0.5719,never sai would priest os known strugles wirh ...,120 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,120,0.0,0.0,0.0,0
18311,1,0.4019,momma actual angel noticed extra depressioncha...,171 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,171,0.0,0.0,0.0,0
753,0,-0.9656,stigma illness cost medication isolation depre...,236 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,236,0.0,0.0,0.0,0
12314,0,-0.743,turns lied shark week weeks depression kicked,98 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,98,0.0,0.0,0.0,0
3121,0,-0.6124,study finds link social socialmedia instagram ...,254 [SEP] 0 [SEP] 1.0 [SEP] 0.0 [SEP] 0.0 [SEP...,254,1.0,0.0,0.0,0
10424,0,-0.6597,depression result trying hold image world tryi...,217 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,217,0.0,0.0,0.0,0
11523,1,0.6625,lord godmy hope prayer love intercession twins...,167 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,167,0.0,0.0,0.0,0
21782,0,-0.5719,brewers gi crippling depression,36 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,36,0.0,0.0,0.0,0
5875,0,-0.5267,want take depression nap,31 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,31,0.0,0.0,0.0,0


In [None]:
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print(f"Before oversampling: {Counter(df_tweets['vader_sentiment_label'].tolist())}")

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(df_tweets[['tweet_feat']], df_tweets['vader_sentiment_label'])

print(f"After oversampling: {Counter(y_res)}")

Before oversampling: Counter({0: 18453, 1: 4377})
After oversampling: Counter({0: 18453, 1: 18453})
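One caveat: `RandomOverSampler` is applied to the whole dataset here, before the train/validation split (and, in the 5-fold runs, before `StratifiedKFold`). Copies of the same minority-class tweet can then land in both training and validation, which may inflate the validation scores; imbalanced-learn generally recommends resampling only the training portion. The numpy-only sketch below uses a hypothetical `oversample_indices` helper that imitates random oversampling, and counts how many original samples end up on both sides under each ordering:

```python
import numpy as np


def oversample_indices(y, rng):
    """Imitate random oversampling: return indices into y in which every class
    is topped up, by sampling its members with replacement, to the size of the
    largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        parts.append(np.concatenate([members, extra]))
    return np.concatenate(parts)


rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 20)   # toy imbalanced labels; index = sample id

# (a) oversample first, then split 75/25
res = rng.permutation(oversample_indices(y, rng))
cut = int(0.75 * len(res))
leak_a = np.intersect1d(res[:cut], res[cut:]).size

# (b) split first, then oversample the training part only
perm = rng.permutation(len(y))
cut = int(0.75 * len(y))
train_ids, val_ids = perm[:cut], perm[cut:]
train_b = train_ids[oversample_indices(y[train_ids], rng)]
leak_b = np.intersect1d(train_b, val_ids).size

print("shared sample ids, oversample-then-split:", leak_a)
print("shared sample ids, split-then-oversample:", leak_b)
```

Ordering (b) cannot share samples by construction; in the notebook it would mean calling `ros.fit_resample` on the training indices only, inside the split or fold loop.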


In [None]:
X_res['tweet_feat'][0]

'278 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP] wow dad yesterday take stupi would depression drugs anymore though absolute worst thing never need great family supporti moms sisters stance similar way'

## Tokenize & Encode

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


In [None]:
import torch

max_len = 128
padding = 'post'
truncating = 'post'
dtype = 'long'

# `dtype`, `truncating` and `padding` are unused leftovers (padding and
# truncation are handled by the tokenizer call below); `maxlen` sets the length
def tokenization(tweets, labels, maxlen=max_len, dtype=dtype, truncating=truncating, padding=padding, tokenizer=tokenizer):
    input_ids = []
    attention_masks = []

    for tweet in tweets:
        encoded_dict = tokenizer.encode_plus(
                        tweet,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                        max_length = maxlen,            # Pad & truncate all sentences.
                        truncation = True,
                        padding = 'max_length',
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',     # Return pytorch tensors.
                )

        # Add the encoded sentence to the list.
        input_ids.append(encoded_dict['input_ids'])

        # And its attention mask (simply differentiates padding from non-padding).
        attention_masks.append(encoded_dict['attention_mask'])

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    labels = torch.tensor(labels)

    return input_ids, attention_masks, labels

In [None]:
# Create tweets and labels lists
tweets = X_res.tweet_feat.values
labels = y_res

input_ids, attention_masks, labels = tokenization(tweets, labels)

In [None]:
# Sanity check.
input_ids[0]

tensor([  101, 24709,   102,  1014,   102,  1014,  1012,  1014,   102,  1014,
         1012,  1014,   102,  1014,  1012,  1014,   102, 10166,  3611,  7483,
         2202, 24646,  8197,  2052,  6245,  5850,  4902,  2295,  7619,  5409,
         2518,  2196,  2342,  2307,  2155,  2490,  2072,  3566,  2015,  5208,
        11032,  2714,  2126,   102,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0])

In [None]:
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

batch_size = 32
training_split = .75

def train_val_split(input_ids, attention_masks, labels, training_split=training_split, batch_size=batch_size):
    # Split ids, masks and labels in a single call so the three stay aligned
    (train_inputs, validation_inputs,
     train_masks, validation_masks,
     train_labels, validation_labels) = train_test_split(input_ids, attention_masks, labels,
                                                         random_state=2018, train_size=training_split)

    # Wrap the tensors in datasets and batch them with DataLoaders: random
    # sampling shuffles the training batches, sequential sampling keeps the
    # validation order fixed
    train_data = TensorDataset(train_inputs, train_masks, train_labels)
    train_sampler = RandomSampler(train_data)
    train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

    validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
    validation_sampler = SequentialSampler(validation_data)
    validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)

    return train_dataloader, validation_dataloader

In [None]:
train_dataloader, validation_dataloader = train_val_split(input_ids, attention_masks, labels)

# Training parameters & Model loading

Now that our data is properly formatted, it's time to fine-tune the BERT model on the featured tweets.

- First block - we need to create the optimizer, passing it the weights from our BERT model.
- Second block - the learning rate scheduler will implement learning rate decay for us.
- Third block - define a helper function for calculating simple accuracy.

In [None]:
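import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# name of the GPU in use (falls back to "cpu" when no GPU is available)
torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu"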

'NVIDIA A100-SXM4-40GB'

In [None]:
from transformers import BertForSequenceClassification

# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels--2 for binary classification.
)

# Move the model to the selected device (the GPU when one is available).
model.to(device)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [None]:
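from torch.optim import AdamW

batch_size = 16  # note: the holdout dataloaders above were already built with train_val_split's default of 32
learning_rate = 2e-5 # try 3e-3 later
epochs = 4

# torch.optim.AdamW replaces the deprecated transformers.AdamW;
# weight_decay=0.0 keeps the old transformers default (torch defaults to 0.01)
optimizer = AdamW(model.parameters(),
                  lr = learning_rate,
                  eps = 1e-8, # a very small number to prevent any division by zero in the implementation
                  weight_decay = 0.0
                  )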



In [None]:
from transformers import get_linear_schedule_with_warmup

# Total number of training steps is [number of batches] x [number of epochs].
# (Note that this is not the same as the number of training samples!)
total_steps = len(train_dataloader) * epochs

"""
  Create a schedule with a learning rate that decreases linearly from the
  initial lr set in the optimizer to 0, after a warmup period during which it
  increases linearly from 0 to the initial lr set in the optimizer.
"""
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)
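To make the schedule concrete: as we understand its documented behavior, `get_linear_schedule_with_warmup` scales the optimizer's initial learning rate by the multiplier below (`linear_warmup_decay` is our own re-implementation for illustration, not a transformers function). With `num_warmup_steps = 0` the rate starts at 2e-5 and falls linearly to 0 at `total_steps`. Past that point the multiplier stays at 0, so a scheduler that keeps being stepped beyond its budget (for example, reused across folds) no longer changes the weights.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: rises linearly from 0 to 1 during warmup,
    then decays linearly to 0 at num_training_steps and stays there."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
total = 100  # toy value; the notebook uses len(train_dataloader) * epochs
for step in (0, 25, 50, 100, 150):
    print(step, base_lr * linear_warmup_decay(step, 0, total))
```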

In [None]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
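`flat_accuracy` works on raw logits: it takes the argmax over the class axis and compares it with the labels. (The loop below reports scikit-learn metrics instead, so this helper is not actually called there.) A quick check on made-up logits:

```python
import numpy as np


def flat_accuracy(preds, labels):
    # same helper as above: argmax over classes, then fraction of matches
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)


logits = np.array([[0.2, 1.3],    # argmax 1
                   [2.0, -1.0],   # argmax 0
                   [0.1, 0.4]])   # argmax 1
labels = np.array([1, 0, 0])
print(flat_accuracy(logits, labels))  # two of the three predictions match
```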

# Training Loop
Below is our training loop. There's a lot going on, but fundamentally for each pass in our loop we have a training phase and a validation phase. At each pass we need to:

**Training loop:**
- Put the model in train mode (this enables dropout; gradients are computed by the backward pass, not by the mode)
- Unpack our data inputs and labels
- Load data onto the GPU for acceleration
- Clear out the gradients calculated in the previous pass.
  - In PyTorch, gradients accumulate by default (useful for things like RNNs) unless you explicitly clear them out
- Forward pass (feed input data through the network)
- Backward pass (backpropagation)
- Tell the network to update parameters with optimizer.step(), then update the learning rate with scheduler.step()
- Track variables for monitoring progress

**Evaluation loop:**
- Put the model in evaluation mode (disables dropout) and run the forward pass under `torch.no_grad()` so no gradients are computed
- Unpack our data inputs and labels
- Load data onto the GPU for acceleration
- Forward pass (feed input data through the network)
- Collect the predictions on our validation data and report the metrics




In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score
from tqdm import tqdm, trange

for _ in trange(epochs, desc="Epoch"):

    # Training

    # Put the model into training mode. Don't be misled--the call to
    # `train` just changes the *mode*, it doesn't *perform* the training.
    model.train()

    # tracking variables
    total_train_loss, num_train_steps = 0, 0
    training_stats = []

    for step, batch in enumerate(train_dataloader):
        # Unpack this training batch from our dataloader.
        # As we unpack the batch, we'll also copy each tensor to the GPU using
        # the 'to' method
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Clear prior gradients
        model.zero_grad()

        result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels)

        # Get the loss and "logits" output by the model. The "logits" are the
        # output values prior to applying an activation function like the
        # softmax.
        loss = result['loss']
        logits = result['logits']

        total_train_loss += loss.item()
        num_train_steps += 1
        loss.backward()

        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer.step()

        # Update the learning rate.
        scheduler.step()

    print(f"Train loss: {total_train_loss / num_train_steps}")

    # After the completion of each training epoch, measure our performance on
    # our validation set.

    model.eval()

    # Tracking variables for performance evaluation
    predictions , true_labels = [], []

    for batch in validation_dataloader:
        # Unpack this validation batch from our dataloader and copy each
        # tensor to the GPU using the 'to' method

        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        with torch.no_grad():
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels)

        # The "logits" are the output values prior to applying an activation
        # function like the softmax; the predicted class is their argmax.
        true_labels.extend(b_labels.tolist())
        _, predicted_labels = torch.max(result["logits"], dim=1)
        predictions.extend(predicted_labels.tolist())

    print(f"\nAcc:{(accuracy_score(true_labels, predictions)).round(2)}," \
            f" Prec:{precision_score(true_labels, predictions).round(2)}," \
            f" Rec:{recall_score(true_labels, predictions).round(2)}," \
            f" F1:{f1_score(true_labels, predictions).round(2)}," \
            f" F1-micro:{f1_score(true_labels, predictions, average='micro').round(2)}," \
            f" F1-macro:{f1_score(true_labels, predictions, average='macro').round(2)}," \
            f" F1-weighted:{f1_score(true_labels, predictions, average='weighted').round(2)}," \
            f" G-mean:{geometric_mean_score(true_labels, predictions).round(2)}")

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Train loss: 0.3969888021060497


Epoch:  25%|██▌       | 1/4 [02:25<07:17, 145.75s/it]


Acc:0.9, Prec:0.91, Rec:0.88, F1:0.89, F1-micro:0.9, F1-macro:0.9, F1-weighted:0.9, G-mean:0.9
Train loss: 0.20030826398230703


Epoch:  50%|█████     | 2/4 [04:49<04:49, 144.85s/it]


Acc:0.93, Prec:0.92, Rec:0.95, F1:0.93, F1-micro:0.93, F1-macro:0.93, F1-weighted:0.93, G-mean:0.93
Train loss: 0.10359077961910838


Epoch:  75%|███████▌  | 3/4 [07:14<02:24, 144.58s/it]


Acc:0.95, Prec:0.93, Rec:0.96, F1:0.95, F1-micro:0.95, F1-macro:0.95, F1-weighted:0.95, G-mean:0.95
Train loss: 0.06103888463667685


Epoch: 100%|██████████| 4/4 [09:38<00:00, 144.61s/it]


Acc:0.95, Prec:0.92, Rec:0.97, F1:0.95, F1-micro:0.95, F1-macro:0.95, F1-weighted:0.95, G-mean:0.95





In [None]:
# Full-precision metrics from the final epoch on the holdout validation set
# (`predictions` and `true_labels` are reset at the start of every epoch).
print(f"\nAcc:{accuracy_score(true_labels, predictions)}," \
      f" Prec:{precision_score(true_labels, predictions)}," \
      f" Rec:{recall_score(true_labels, predictions)}," \
      f" F1:{f1_score(true_labels, predictions)}," \
      f" F1-micro:{f1_score(true_labels, predictions, average='micro')}," \
      f" F1-macro:{f1_score(true_labels, predictions, average='macro')}," \
      f" F1-weighted:{f1_score(true_labels, predictions, average='weighted')}," \
      f" G-mean:{geometric_mean_score(true_labels, predictions)}")


Acc:0.9460279614175788, Prec:0.9236641221374046, Rec:0.9722041259500543, F1:0.9473127380448583, F1-micro:0.9460279614175788, F1-macro:0.945995849137959, F1-weighted:0.9459934228768323, G-mean:0.945715239180062


## BERT-Unimodal 5-fold Cross-validation

In [None]:
df_tweets = pd.read_csv('nst_preprocessed_tweets.csv')
df_tweets.shape

(22830, 9)

In [None]:
df_tweets = df_tweets.drop(['Unnamed: 0'], axis=1)
df_tweets.sample(10)

Unnamed: 0,vader_sentiment_label,vader_score,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
22266,1,0.1901,diabetes yes bc go home frown depression pint ...,186,0,0,0,1
12897,1,0.2363,new canadian study touts positi impacts vi wou...,257,1,0,0,0
15972,0,-0.7351,heard term facebook depression going look late...,203,0,0,0,0
18467,0,-0.5719,came rude astrology late night depression thou...,84,0,0,0,0
4931,0,-0.836,depression scary word prefer sad boy energy,50,0,0,0,0
7317,1,0.3875,depression real crippling cases feeling bummed,93,0,0,0,0
3123,0,-0.6038,full disclosure heads depression hit low point...,148,0,0,0,0
20746,0,-0.6843,particularly depression super boring hear time...,100,1,0,0,0
22341,0,-0.5719,milev laboratory seeking participants particip...,175,1,0,0,0
20415,0,-0.5719,girl wan na depression buddies,38,0,0,0,0


In [None]:
for col in ['url_link', 'pos_emoji', 'neg_emoji']:
    df_tweets[col] = df_tweets[col].astype('float')

tweet_feat = []

for index, row in df_tweets.iterrows():
    combined = ""

    combined += f"{row['tweet_length']} [SEP] " \
                f"{row['profanity_word']} [SEP] " \
                f"{row['url_link']} [SEP] " \
                f"{row['pos_emoji']} [SEP] " \
                f"{row['neg_emoji']} [SEP] "

    combined += row['tweet']
    tweet_feat.append(combined)

df_tweets.insert(loc = 3,
          column = 'tweet_feat',
          value = tweet_feat)

In [None]:
df_tweets.sample(10)

Unnamed: 0,vader_sentiment_label,vader_score,tweet,tweet_feat,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
6118,1,0.1431,oomfs take meds anxiety depression think help ...,120 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,120,0.0,0.0,0.0,0
10512,1,0.0854,benzodiazepines alone adequate depression ther...,88 [SEP] 0 [SEP] 1.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,88,1.0,0.0,0.0,0
22757,0,-0.9468,legend continues real horrors whether family m...,246 [SEP] 0 [SEP] 1.0 [SEP] 0.0 [SEP] 0.0 [SEP...,246,1.0,0.0,0.0,0
18193,0,-0.5719,depression,10 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,10,0.0,0.0,0.0,0
2872,0,-0.8658,anxiety turned paranoi would depression turned...,67 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,67,0.0,0.0,0.0,0
6316,1,0.5707,karens vi wouldeos keep depression right many ...,78 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,78,0.0,0.0,0.0,0
13164,0,-0.8,u suffered depression sokay bro,50 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,50,0.0,0.0,0.0,0
5524,1,0.7876,yes great sleep ai would depressionanxietybipo...,230 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,230,0.0,0.0,0.0,0
15706,0,-0.9608,im telling using causes mental illness argue d...,187 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP...,187,0.0,0.0,0.0,0
910,0,-0.6597,depression beginning overwhelm think going tak...,94 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP]...,94,0.0,0.0,0.0,0


In [None]:
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print(f"Before oversampling: {Counter(df_tweets['vader_sentiment_label'].tolist())}")

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(df_tweets[['tweet_feat']], df_tweets['vader_sentiment_label'])

print(f"After oversampling: {Counter(y_res)}")

Before oversampling: Counter({0: 18453, 1: 4377})
After oversampling: Counter({0: 18453, 1: 18453})


In [None]:
X_res = X_res['tweet_feat'].tolist()

# sanity check
X_res[0]

'278 [SEP] 0 [SEP] 0.0 [SEP] 0.0 [SEP] 0.0 [SEP] wow dad yesterday take stupi would depression drugs anymore though absolute worst thing never need great family supporti moms sisters stance similar way'

## Tokenize & Encode & Load Model

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


In [None]:
import torch

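device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Name of the assigned GPU ("cpu" when no GPU is available)
torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu"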

'NVIDIA A100-SXM4-40GB'

In [None]:
from transformers import BertForSequenceClassification

# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels--2 for binary classification.
)

# Move the model to the selected device (the GPU on this runtime).
model.to(device)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e
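The warning above notes that `classifier.weight` and `classifier.bias` are newly initialized. `BertForSequenceClassification` puts a single linear layer on BERT's pooled `[CLS]` output (after dropout), and when `labels` are passed it also returns a cross-entropy loss alongside the logits. The plain-Python sketch below shows that last step. Toy 4-dimensional vectors stand in for the 768-dimensional pooled output, the numbers are made up, and dropout is omitted.

```python
import math

def linear(vec, weight, bias):
    """One output per row of `weight`: dot(row, vec) + bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

def cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

pooled = [0.5, -1.0, 0.25, 2.0]          # toy stand-in for the pooled [CLS] vector
weight = [[0.1, 0.2, 0.0, -0.1],         # one row per class (num_labels = 2)
          [-0.1, 0.0, 0.3, 0.2]]
bias = [0.0, 0.0]

logits = linear(pooled, weight, bias)
loss = cross_entropy(logits, label=1)
prediction = max(range(len(logits)), key=logits.__getitem__)   # index of the larger logit
print(logits, loss, prediction)
```

Taking the index of the larger logit (here class 1) is what `torch.max(result["logits"], dim=1)` does in the validation loop.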

In [None]:
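from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of PyTorch's AdamW

batch_size = 16
learning_rate = 2e-5 # try 3e-5 later; the BERT paper suggests 2e-5, 3e-5 or 5e-5 for fine-tuning
epochs = 4


optimizer = AdamW(model.parameters(),
                  lr = learning_rate,
                  eps = 1e-8, # a very small number to prevent division by zero in the update
                  weight_decay = 0.0 # transformers.AdamW defaulted to 0.0; torch.optim.AdamW defaults to 0.01
                  )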
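To see where `eps` enters, the sketch below writes out one AdamW update for a single scalar parameter in plain Python, following the algorithm documented for `torch.optim.AdamW` (decoupled weight decay, bias-corrected moment estimates). `adamw_step` is an illustrative helper, not a library function. `eps` is added to the square root of the second-moment estimate, so the step stays finite even when the gradient history is all zeros.

```python
import math

def adamw_step(param, grad, m, v, t, lr=2e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    beta1, beta2 = betas
    param *= 1 - lr * weight_decay            # decoupled weight decay
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)   # eps keeps the denominator > 0
    return param, m, v

new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
zero_grad_param, _, _ = adamw_step(param=1.0, grad=0.0, m=0.0, v=0.0, t=1)
print(new_param, zero_grad_param)
```

On the first step the bias-corrected ratio is g / (|g| + eps), so each parameter moves by roughly `lr` in the direction opposite its gradient.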



In [None]:
from transformers import get_linear_schedule_with_warmup

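# Total number of training steps is [number of batches] x [number of epochs].
# (Note that this is not the same as the number of training samples!)
# Each 5-fold training split holds 4/5 of the oversampled data, so there are
# ceil(4/5 * len(X_res) / batch_size) = 1846 batches per epoch here.
num_train_batches = int(np.ceil(len(X_res) * 4 / 5 / batch_size))
total_steps = num_train_batches * epochs

# Create a schedule with a learning rate that decreases linearly from the
# initial lr set in the optimizer to 0, after a warmup period during which it
# increases linearly from 0 to the initial lr set in the optimizer.
# Note: this schedule covers one fold's training; the same scheduler is reused
# across folds below, so the learning rate stays at 0 after the first fold.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)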
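`get_linear_schedule_with_warmup` scales the optimizer's learning rate by a factor that rises linearly during warmup, then falls linearly to zero at `num_training_steps` and stays there. The plain-Python sketch below reproduces that factor. It mirrors the documented behavior and is not the library source; `linear_warmup_factor` is an illustrative name. It also shows what happens if `scheduler.step()` keeps being called after `total_steps`.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given scheduler step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

batches_per_epoch, epochs, base_lr = 1846, 4, 2e-5
total_steps = batches_per_epoch * epochs   # 7384 steps for one fold

lr_start = base_lr * linear_warmup_factor(0, 0, total_steps)
lr_mid = base_lr * linear_warmup_factor(total_steps // 2, 0, total_steps)
lr_after = base_lr * linear_warmup_factor(total_steps + 1000, 0, total_steps)
print(lr_start, lr_mid, lr_after)
```

Because `scheduler.step()` runs once per batch in every fold while `total_steps` covers a single fold, the factor is already 0 when fold 2 starts.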

In [None]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

In [None]:
from torch.utils.data import Dataset

# Tokenizes one tweet on the fly and returns its input ids, attention mask and label
class TextDataset(Dataset):
  def __init__(self, texts, labels):
    self.texts = list(texts)
    # Store labels as a list so that positional indexing also works for a pandas Series
    self.labels = list(labels)

  def __len__(self):
    return len(self.texts)

  def __getitem__(self, idx):
    text = self.texts[idx]
    label = self.labels[idx]

    encoding = tokenizer(text, padding='max_length', truncation=True, max_length=510, return_tensors='pt')
    input_ids = encoding['input_ids'].squeeze(0)            # (1, 510) -> (510,)
    attention_masks = encoding['attention_mask'].squeeze(0)
    return {'input_ids': input_ids, 'attention_mask': attention_masks, 'labels': torch.tensor(label)}
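`TextDataset` pads every tweet to `max_length=510`, so BERT processes 510 positions per example even though the tweets shown above are much shorter. The attention mask hides the padding from the model's output but not from the compute. One alternative is to pad each batch only to its longest sequence, which `transformers.DataCollatorWithPadding` provides for tokenized inputs. The plain-Python sketch below (`pad_batch` is a hypothetical helper) shows the idea on lists of token ids. Here 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids in the `bert-base-uncased` vocabulary, and the other ids are arbitrary.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence in the batch and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask

batch = [[101, 2057, 102], [101, 2057, 2024, 2182, 102]]
ids, mask = pad_batch(batch)
print(ids)
print(mask)
```

Used this way, the tokenizer call would drop `padding='max_length'`, and the `DataLoader` would receive the padding function through its `collate_fn` argument.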

In [None]:
from torch.utils.data import DataLoader
from tqdm import trange
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score

dataset = TextDataset(X_res, y_res)

# Define k-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Perform k-fold cross-validation
for fold, (train_indices, val_indices) in enumerate(skf.split(X_res, y_res)):

    # Split dataset into train and validation sets for the current fold
    train_dataset = torch.utils.data.Subset(dataset, train_indices)
    val_dataset =  torch.utils.data.Subset(dataset, val_indices)

    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    validation_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

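    # Training loop
    # Note: model, optimizer and scheduler are created once, before this loop,
    # so each fold keeps training the weights left by the previous folds.
    total_train_loss, num_train_steps = 0, 0
    print(f"Training fold: {fold+1}/{skf.get_n_splits()}")
    model.to(device)
    model.train()
    for _ in trange(epochs, desc="Epoch"):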
        for batch in train_loader:
            b_input_ids = batch['input_ids'].to(device)
            b_input_mask = batch['attention_mask'].to(device)
            b_labels = batch['labels'].to(device)

            model.zero_grad()
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels)

            loss = result['loss']
            logits = result['logits']

            total_train_loss += loss.item()
            num_train_steps += 1

            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()

    print(f"\nTrain loss: {total_train_loss / num_train_steps}")

    # Validation: put the model in evaluation mode (disables dropout)
    model.eval()
    print(f"Evaluation fold: {fold+1}/{skf.get_n_splits()}")
    # Tracking variables for performance evaluation
    predictions, true_labels = [], []
    for batch in validation_dataloader:
        # Unpack this validation batch and copy each tensor to the device
        b_input_ids = batch['input_ids'].to(device)
        b_input_mask = batch['attention_mask'].to(device)
        b_labels = batch['labels'].to(device)

        with torch.no_grad():
            result = model(b_input_ids,
                           token_type_ids=None,
                           attention_mask=b_input_mask,
                           labels=b_labels)

        # The "logits" are the raw class scores before softmax; the predicted
        # class is the index of the larger logit.
        true_labels.extend(b_labels.tolist())
        _, predicted_labels = torch.max(result["logits"], dim=1)
        predictions.extend(predicted_labels.tolist())

    # Built-in round() works whether scikit-learn returns NumPy scalars or plain
    # Python floats (newer versions may return the latter, which lack .round())
    print(f"\nAcc:{round(accuracy_score(true_labels, predictions), 2)},"
          f" Prec:{round(precision_score(true_labels, predictions), 2)},"
          f" Rec:{round(recall_score(true_labels, predictions), 2)},"
          f" F1:{round(f1_score(true_labels, predictions), 2)},"
          f" F1-micro:{round(f1_score(true_labels, predictions, average='micro'), 2)},"
          f" F1-macro:{round(f1_score(true_labels, predictions, average='macro'), 2)},"
          f" F1-weighted:{round(f1_score(true_labels, predictions, average='weighted'), 2)},"
          f" G-mean:{round(geometric_mean_score(true_labels, predictions), 2)}")

Training fold: 1/5


Epoch: 100%|██████████| 4/4 [39:16<00:00, 589.05s/it]



Train loss: 0.17544630119375845
Evaluation fold: 1/5

Acc:0.95, Prec:0.93, Rec:0.98, F1:0.95, F1-micro:0.95, F1-macro:0.95, F1-weighted:0.95, G-mean:0.95
Training fold: 2/5


Epoch: 100%|██████████| 4/4 [39:15<00:00, 588.84s/it]



Train loss: 0.09262096157356078
Evaluation fold: 2/5

Acc:1.0, Prec:1.0, Rec:1.0, F1:1.0, F1-micro:1.0, F1-macro:1.0, F1-weighted:1.0, G-mean:1.0
Training fold: 3/5


Epoch: 100%|██████████| 4/4 [39:16<00:00, 589.02s/it]



Train loss: 0.09135288633219966
Evaluation fold: 3/5

Acc:0.99, Prec:0.99, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
Training fold: 4/5


Epoch: 100%|██████████| 4/4 [39:15<00:00, 588.90s/it]



Train loss: 0.09051583521282118
Evaluation fold: 4/5

Acc:0.99, Prec:0.99, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
Training fold: 5/5


Epoch: 100%|██████████| 4/4 [39:14<00:00, 588.73s/it]



Train loss: 0.09082590979423291
Evaluation fold: 5/5

Acc:0.99, Prec:0.99, Rec:0.99, F1:0.99, F1-micro:0.99, F1-macro:0.99, F1-weighted:0.99, G-mean:0.99
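In the cross-validation loop, `model`, `optimizer` and `scheduler` are created once before the first fold. From fold 2 onward, the model has therefore already trained on the rows in the current validation split. This may explain why fold 1 scores 0.95 while folds 2 to 5 score 0.99 to 1.0. The plain-Python sketch below checks the overlap. The round-robin `kfold_indices` helper is a hypothetical stand-in for `StratifiedKFold`, and the 20 samples are toy data.

```python
def kfold_indices(n_samples, n_splits):
    """Round-robin K-fold split: yields (train, val) index lists."""
    for fold in range(n_splits):
        val = [i for i in range(n_samples) if i % n_splits == fold]
        train = [i for i in range(n_samples) if i % n_splits != fold]
        yield train, val

seen_in_training = set()   # rows a single, reused model has been trained on so far
leaked_fraction = []
for train, val in kfold_indices(n_samples=20, n_splits=5):
    # Share of this fold's validation rows that the reused model already saw.
    leaked = sum(i in seen_in_training for i in val) / len(val)
    leaked_fraction.append(leaked)
    seen_in_training.update(train)
print(leaked_fraction)
```

Creating a fresh `BertForSequenceClassification`, optimizer and scheduler at the top of each fold (and oversampling only that fold's training rows) would keep every validation split unseen.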


In [None]:
# true_labels and predictions are reset for every fold, so these are the
# unrounded scores of the last fold (fold 5) only
print(f"\nAcc:{accuracy_score(true_labels, predictions)},"
      f" Prec:{precision_score(true_labels, predictions)},"
      f" Rec:{recall_score(true_labels, predictions)},"
      f" F1:{f1_score(true_labels, predictions)},"
      f" F1-micro:{f1_score(true_labels, predictions, average='micro')},"
      f" F1-macro:{f1_score(true_labels, predictions, average='macro')},"
      f" F1-weighted:{f1_score(true_labels, predictions, average='weighted')},"
      f" G-mean:{geometric_mean_score(true_labels, predictions)}")


Acc:0.9936322991464571, Prec:0.99269875608437, Rec:0.9945814142508805, F1:0.993639193395588, F1-micro:0.9936322991464571, F1-macro:0.9936322916659209, F1-weighted:0.993632292600988, G-mean:0.9936317171208643
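The scores in these printouts can be derived from the binary confusion matrix. For single-label classification, micro-averaged F1 pools true and false positives over both classes and reduces to accuracy, which is why `F1-micro` and `Acc` agree to every digit above. For two classes, `geometric_mean_score` is the square root of sensitivity (recall of class 1) times specificity (recall of class 0). The plain-Python sketch below uses made-up counts, and `binary_scores` is an illustrative helper.

```python
import math

def binary_scores(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 variants and G-mean from counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity: recall of class 1
    specificity = tn / (tn + fp)       # recall of class 0
    f1 = 2 * precision * recall / (precision + recall)

    # Class-0 F1 treats class 0 as "positive": its TP is tn, FP is fn, FN is fp.
    precision0 = tn / (tn + fn)
    f1_class0 = 2 * precision0 * specificity / (precision0 + specificity)
    f1_macro = (f1 + f1_class0) / 2

    # Micro averaging pools counts over both classes; every error is a false
    # positive for one class and a false negative for the other.
    pooled_tp, pooled_fp, pooled_fn = tp + tn, fp + fn, fn + fp
    micro_precision = pooled_tp / (pooled_tp + pooled_fp)
    micro_recall = pooled_tp / (pooled_tp + pooled_fn)
    f1_micro = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "f1_macro": f1_macro, "f1_micro": f1_micro, "g_mean": g_mean}

scores = binary_scores(tp=90, fp=10, fn=5, tn=95)
print(scores)
```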


In [None]:
from google.colab import runtime
# Release the Colab runtime (and its GPU) now that training is finished
runtime.unassign()