# Data Balance Methods

In this notebook, we compare data-balancing techniques to decide which one is most appropriate for identifying depression from Twitter data, judged by the F1 and G-mean metrics.
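
For reference, both metrics can be written in terms of a binary confusion matrix. The sketch below is a plain-Python illustration and not the evaluation code of this notebook (which relies on scikit-learn and imbalanced-learn); the helper names and the example counts are made up for illustration.

```python
import math

def f1_binary(tp, fp, fn):
    """F1 for the positive class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def g_mean_binary(tp, fp, fn, tn):
    """G-mean: geometric mean of the recall on the positive and negative class."""
    sensitivity = tp / (tp + fn)   # recall on class 1
    specificity = tn / (tn + fp)   # recall on class 0
    return math.sqrt(sensitivity * specificity)

# Hypothetical counts for an imbalanced test set (illustrative only).
tp, fp, fn, tn = 80, 40, 20, 360
print(round(f1_binary(tp, fp, fn), 3))
print(round(g_mean_binary(tp, fp, fn, tn), 3))
```

Because G-mean multiplies the per-class recalls, a classifier that ignores the minority class scores close to zero even when its accuracy looks high, which is why it is a common companion to F1 on imbalanced data.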

Three different methods are applied:
- SVM-SMOTE
  - tweets => BERT (tokenizer, encoder layers) => embedding vectors (768)
  - embedding vectors => SVM-SMOTE => re-sampled embedding vectors
  - re-sampled embedding vectors => SVM => results

- Straightforward Oversampling
  - tweets => BERT (tokenizer, encoder layers) => embedding vectors (768)
  - embedding vectors => RandomOverSampler (duplicate minority) => re-sampled embedding vectors
  - re-sampled embedding vectors => SVM => results

- Straightforward Undersampling
  - tweets => BERT (tokenizer, encoder layers) => embedding vectors (768)
  - embedding vectors => RandomUnderSampler (reduce majority) => re-sampled embedding vectors
  - re-sampled embedding vectors => SVM => results

Lastly, we pick the best balancing method based on the results below.

- SVM-SMOTE
  - holdout:
    - Acc: 0.81, Prec: 0.81, Rec: 0.81, F1: 0.81, F1-micro: 0.81, F1-macro: 0.81, F1-weighted: 0.81, G-mean: 0.81
  - 5-fold (scores of the fifth fold):
    - Acc: 0.87, Prec: 0.80, Rec: 0.97, F1: 0.88, F1-micro: 0.87, F1-macro: 0.87, F1-weighted: 0.87, G-mean: 0.86

- Oversampling
  - holdout:
    - Acc: 0.75, Prec: 0.75, Rec: 0.76, F1: 0.76, F1-micro: 0.75, F1-macro: 0.75, F1-weighted: 0.75, G-mean: 0.75
  - 5-fold (scores of the fifth fold):
    - Acc: 0.75, Prec: 0.74, Rec: 0.75, F1: 0.75, F1-micro: 0.75, F1-macro: 0.75, F1-weighted: 0.75, G-mean: 0.75

- Undersampling
  - holdout:
    - Acc: 0.72, Prec: 0.72, Rec: 0.71, F1: 0.71, F1-micro: 0.72, F1-macro: 0.72, F1-weighted: 0.72, G-mean: 0.72
  - 5-fold (scores of the fifth fold):
    - Acc: 0.73, Prec: 0.73, Rec: 0.72, F1: 0.73, F1-micro: 0.73, F1-macro: 0.73, F1-weighted: 0.73, G-mean: 0.73
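
The selection step can be written down explicitly. The sketch below copies the holdout F1 and G-mean values from the summary above; the ranking rule (F1 first, G-mean as tie-breaker) is an assumption, since the notebook does not state one.

```python
# Holdout scores copied from the summary above: (F1, G-mean) per method.
holdout = {
    "SVM-SMOTE": (0.81, 0.81),
    "Oversampling": (0.76, 0.75),
    "Undersampling": (0.71, 0.72),
}

# Assumed rule: rank by F1 first and use G-mean as the tie-breaker.
# Tuples compare element by element, so max() applies exactly that order.
best = max(holdout, key=lambda name: holdout[name])
print(best)
```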

In [None]:
import os

import pandas as pd
import numpy as np
from google.colab import runtime
import zipfile

In [None]:
# unzipping the zip file
with zipfile.ZipFile("nst_preprocessed_tweets.zip", 'r') as zip_ref:
    zip_ref.extractall(os.getcwd())

In [None]:
df_tweets = pd.read_csv('nst_preprocessed_tweets.csv')
df_tweets.shape

(22830, 9)

In [None]:
df_tweets.sample(10)

Unnamed: 0.1,Unnamed: 0,vader_sentiment_label,vader_score,tweet,tweet_length,url_link,pos_emoji,neg_emoji,profanity_word
21451,21565,0,-0.8442,uhm shittiest depression meal would meals eat ...,85,1,0,0,0
18463,18551,0,-0.5719,depression hit iss randomly night,42,1,0,0,0
7380,7403,0,-0.7964,relate depression sunburn burnt go next day su...,256,0,0,0,0
17765,17844,0,-0.3182,combating workplace depression highlight clien...,174,1,0,0,0
13991,14031,0,-0.8201,age every game visually accessible reasonable ...,267,0,0,0,0
8354,8378,0,-0.6633,hot girl summer depression,55,0,0,0,0
18467,18555,0,-0.5719,came rude astrology late night depression thou...,84,0,0,0,0
317,317,0,-0.8667,pilots stupi would paint body black try edgy r...,274,0,0,0,0
1005,1005,0,-0.2023,friendship able intimately understand recogniz...,104,0,0,0,0
3193,3202,0,-0.3612,depression yeah know,27,0,0,0,0


# Tokenize & Encode - TODO

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


In [None]:
import torch

max_len = 128

def tokenization(tweets, labels, maxlen=max_len, tokenizer=tokenizer):
    """Tokenize the tweets and return input ids, attention masks, and labels as tensors."""
    input_ids = []
    attention_masks = []

    for tweet in tweets:
        encoded_dict = tokenizer.encode_plus(
            tweet,                       # Sentence to encode.
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'.
            max_length=maxlen,           # Pad and truncate every tweet to this length.
            truncation=True,
            padding='max_length',
            return_attention_mask=True,  # Construct attention masks.
            return_tensors='pt',         # Return PyTorch tensors.
        )

        # Add the encoded sentence to the list.
        input_ids.append(encoded_dict['input_ids'])

        # And its attention mask (simply differentiates padding from non-padding).
        attention_masks.append(encoded_dict['attention_mask'])

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    labels = torch.tensor(labels)

    return input_ids, attention_masks, labels
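
To make the padding and attention-mask logic concrete, here is a toy re-implementation for a single sequence in plain Python. It only mimics what the tokenizer call above does with `truncation=True` and `padding='max_length'`; the helper name `pad_and_mask` and the input ids are hypothetical, while 101, 102, and 0 are the `[CLS]`, `[SEP]`, and `[PAD]` ids of `bert-base-uncased`.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids in bert-base-uncased

def pad_and_mask(token_ids, max_len):
    """Mimic truncation plus 'max_length' padding for one tokenized sequence."""
    # Reserve two positions for [CLS] and [SEP], then truncate the rest.
    body = token_ids[: max_len - 2]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    # Right-pad with [PAD]; padded positions get attention mask 0.
    n_pad = max_len - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad

# Two arbitrary word-piece ids, padded to length 6.
ids, mask = pad_and_mask([7592, 2088], max_len=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```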

In [None]:
# Create tweets and labels lists
tweets = df_tweets.tweet.values
labels = df_tweets.vader_sentiment_label.values

input_ids, attention_masks, labels = tokenization(tweets, labels)

In [None]:
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler

batch_size = 64

def get_dataloader(input_ids, attention_masks, labels, batch_size=batch_size):
    # Wrap the tensors in a DataLoader so the model can process them in batches.
    # SequentialSampler keeps the original row order, so the extracted
    # embeddings stay aligned with the rows of df_tweets.

    data = TensorDataset(input_ids, attention_masks, labels)
    sampler = SequentialSampler(data)
    dataloader = DataLoader(data, sampler=sampler, batch_size=batch_size)

    return dataloader

In [None]:
dataloader = get_dataloader(input_ids, attention_masks, labels)
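
As a quick sanity check on the batching, the number of batches follows from the dataset size (22830 rows) and the batch size (64). The helper `sequential_batches` below is a hypothetical stand-in that only reproduces the index ranges a sequential, non-dropping loader would visit; it is not part of PyTorch.

```python
import math

n_samples, batch_size = 22830, 64  # dataset size and batch size from this notebook

n_batches = math.ceil(n_samples / batch_size)
last_batch = n_samples - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 357 46

def sequential_batches(n, size):
    """Yield index ranges in order, keeping the final partial batch."""
    for start in range(0, n, size):
        yield range(start, min(start + size, n))

batches = list(sequential_batches(n_samples, batch_size))
print(len(batches), len(batches[-1]))  # 357 46
```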

### Custom Classes
To extract the embeddings, we define our own BERT class based on
`BertForSequenceClassification`. \
We named our custom class `BertEmbeddingVectors`. \
The custom model returns the BERT embedding of each tweet. We then resample these vectors (with SVM-SMOTE, random oversampling, or random undersampling) to build the training set for the SVM classifier.

In [None]:
from transformers import BertForSequenceClassification

class BertEmbeddingVectors(BertForSequenceClassification):
    """
        A model that extracts BERT embeddings for resampling and SVM
        classification.

        This class expects a transformers.BertConfig object.
    """

    def __init__(self, config):
        # Call the constructor of the Hugging Face 'BertForSequenceClassification'
        # class, which does all of the BERT-related setup. The resulting BERT
        # model is stored in 'self.bert'.
        super().__init__(config)

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        class_weights=None,
        output_attentions=None,
        output_hidden_states=None):
        # BERT

        # Run the text through the BERT model. Invoking 'self.bert' returns
        # outputs from the encoding layers, and not from the final classifier.

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states)

        # outputs[0] - The embeddings of all tokens from the last BERT layer.
        # outputs[1] - The [CLS] token embedding, with some additional "pooling"
        #              done.
        cls = outputs[1]

        # Dropout inherited from the parent class. It is a no-op in evaluation
        # mode, which is the default after from_pretrained().
        cls = self.dropout(cls)

        # Return the embeddings as a NumPy array on the CPU.
        cls = cls.detach().cpu().numpy()
        return cls

### Load Model

In this section, we load Google's pretrained BERT weights into our custom BERT class.

First, select the GPU as the PyTorch device.

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#device = torch.device("cpu")
torch.cuda.get_device_name(0)

'Tesla T4'

In [None]:
from transformers import BertConfig

# We'll need to use a "BertConfig" object from the transformers library
# to specify our parameters.
config = BertConfig.from_pretrained(
          'bert-base-uncased',
          num_labels=2)

model = BertEmbeddingVectors.from_pretrained(
        'bert-base-uncased',
        config=config)

# Move the model to the selected device (the GPU when one is available)
desc = model.to(device)



Some weights of BertEmbeddingVectors were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### BERT Embeddings & Resampling with SVM-SMOTE
...

In [None]:
def get_embeddings(dataloader):
    X, y = [], []

    # Inference only: no gradients are needed to extract embeddings.
    with torch.no_grad():
        for batch in dataloader:
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)

            # The model returns the pooled [CLS] embeddings as a NumPy array.
            cls_head = model(b_input_ids,
                             token_type_ids=None,
                             attention_mask=b_input_mask)

            X.extend(cls_head)
            y.extend(batch[2].numpy())

    return X, y

In [None]:
X, y = get_embeddings(dataloader)

In [None]:
X, y = np.asarray(X), np.asarray(y)
X.shape, y.shape

((22830, 768), (22830,))

# SVM-SMOTE Method


In [None]:
from collections import Counter
from imblearn.over_sampling import SVMSMOTE

print(f"Before oversampling: {Counter(y)}")

sm = SVMSMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)

print(f"After oversampling: {Counter(y_res)}")

Before oversampling: Counter({0: 18453, 1: 4377})
After oversampling: Counter({0: 18453, 1: 18453})
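
The counts above imply that 18453 - 4377 = 14076 synthetic minority samples were generated. The sketch below shows only the interpolation step that SMOTE-style methods use to create one such sample; SVM-SMOTE additionally relies on an SVM to choose which minority samples to start from, and that selection is not reproduced here. The function name and the 2-D toy points are assumptions for illustration.

```python
import random

def interpolate(x, neighbor, rng):
    """Create one synthetic point on the segment between x and neighbor."""
    lam = rng.random()  # uniform in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(42)
x = [0.0, 1.0]          # a minority-class sample (toy 2-D stand-in for a 768-D embedding)
neighbor = [2.0, 3.0]   # one of its minority-class neighbours
synthetic = interpolate(x, neighbor, rng)
print(synthetic)

# Number of synthetic minority samples needed to balance the classes here.
n_majority, n_minority = 18453, 4377
print(n_majority - n_minority)  # 14076
```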


# Downloading the resampled dataset from the best balancing method.

In [None]:
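# Save the resampled dataset as a CSV file
import csv

with open('resampled_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # One row per sample: the 768 embedding values followed by the label.
    writer.writerows([*features, label] for features, label in zip(X_res, y_res))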

In [None]:
from zipfile import ZipFile, ZIP_DEFLATED

with ZipFile('resampled_data.zip', 'w', ZIP_DEFLATED) as new_zipfile:
    # Write the CSV file into the archive
    new_zipfile.write('resampled_data.csv')

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.25, random_state=42)

In [None]:
from sklearn.svm import SVC

svm_model = SVC(kernel='linear', verbose=True)
svm_model.fit(X_train, y_train)

[LibSVM]

In [None]:
X_pred = svm_model.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score

print(f"Acc:{(accuracy_score(y_test, X_pred)).round(2)}," \
            f" Prec:{precision_score(y_test, X_pred).round(2)}," \
            f" Rec:{recall_score(y_test, X_pred).round(2)}," \
            f" F1:{f1_score(y_test, X_pred).round(2)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro').round(2)}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro').round(2)}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted').round(2)}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred).round(2)}")


Val. Acc:0.81, Prec:0.81, Rec:0.81, F1:0.81, F1-micro:0.81, F1-macro:0.81, F1-weighted:0.81, G-mean:0.81


In [None]:
print(f"\nVal. Acc:{(accuracy_score(y_test, X_pred))}," \
            f" Prec:{precision_score(y_test, X_pred)}," \
            f" Rec:{recall_score(y_test, X_pred)}," \
            f" F1:{f1_score(y_test, X_pred)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro')}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro')}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted')}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred)}")


Val. Acc:0.8064376286983852, Prec:0.8065615679590967, Rec:0.8117495711835334, F1:0.8091472536866852, F1-micro:0.8064376286983852, F1-macro:0.8063986048556557, F1-weighted:0.8064286919405081, G-mean:0.8063609543821526
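
The printed values can be cross-checked against the metric definitions. The sketch below copies the full-precision numbers from the output above and verifies two identities: binary F1 is the harmonic mean of precision and recall, and micro-averaged F1 equals accuracy in single-label classification.

```python
# Full-precision holdout values printed above for SVM-SMOTE.
precision = 0.8065615679590967
recall = 0.8117495711835334
reported_f1 = 0.8091472536866852
accuracy = 0.8064376286983852
reported_f1_micro = 0.8064376286983852

# Binary F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(abs(f1 - reported_f1) < 1e-9)

# For single-label classification, micro-averaged F1 equals accuracy.
print(reported_f1_micro == accuracy)
```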


In [None]:
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)

scoring = {'accuracy': 'accuracy', 'precision': 'precision', 'recall': 'recall', 'f1': 'f1', 'f1_micro': 'f1_micro', 'f1_macro': 'f1_macro', 'f1_weighted': 'f1_weighted', 'g-mean': gm_scorer}

svm_cross_validation = SVC(kernel='linear')
cv_results = cross_validate(svm_cross_validation, X_res, y_res, scoring=scoring, cv=5, verbose=3)

[CV] END  accuracy: (test=0.742) f1: (test=0.721) f1_macro: (test=0.740) f1_micro: (test=0.742) f1_weighted: (test=0.740) g-mean: (test=0.738) precision: (test=0.783) recall: (test=0.669) total time= 8.5min
[CV] END  accuracy: (test=0.760) f1: (test=0.743) f1_macro: (test=0.759) f1_micro: (test=0.760) f1_weighted: (test=0.759) g-mean: (test=0.757) precision: (test=0.801) recall: (test=0.693) total time= 9.1min
[CV] END  accuracy: (test=0.762) f1: (test=0.749) f1_macro: (test=0.761) f1_micro: (test=0.762) f1_weighted: (test=0.761) g-mean: (test=0.760) precision: (test=0.791) recall: (test=0.712) total time= 9.3min
[CV] END  accuracy: (test=0.849) f1: (test=0.858) f1_macro: (test=0.848) f1_micro: (test=0.849) f1_weighted: (test=0.848) g-mean: (test=0.847) precision: (test=0.810) recall: (test=0.913) total time=10.3min
[CV] END  accuracy: (test=0.868) f1: (test=0.880) f1_macro: (test=0.866) f1_micro: (test=0.868) f1_weighted: (test=0.866) g-mean: (test=0.862) precision: (test=0.805) recal

In [None]:
# Index 4 selects the fifth (last) fold; these values are not averages over the folds.
for x in cv_results:
    print(f"{x}: {cv_results[x][4].round(2)}")

fit_time: 578.56
score_time: 57.43
test_accuracy: 0.87
test_precision: 0.8
test_recall: 0.97
test_f1: 0.88
test_f1_micro: 0.87
test_f1_macro: 0.87
test_f1_weighted: 0.87
test_g-mean: 0.86
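
Note that the values above come from the fifth fold only, because the cell indexes `cv_results[x][4]`. A summary over all folds can look quite different. The sketch below uses the per-fold accuracies from the `[CV]` log lines above; with the real `cv_results` dictionary, the equivalent would be `cv_results['test_accuracy'].mean()`.

```python
from statistics import mean, stdev

# Per-fold test accuracy for SVM-SMOTE, read from the [CV] log lines above.
fold_accuracy = [0.742, 0.760, 0.762, 0.849, 0.868]

print(f"mean={mean(fold_accuracy):.3f}, std={stdev(fold_accuracy):.3f}")
print(f"fifth fold only={fold_accuracy[4]:.2f}")
```

Reporting the mean together with the spread makes the comparison between balancing methods less sensitive to a single favourable fold.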


# Straightforward Oversampling
Randomly duplicating minority-class samples until both classes have the same size.
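
As a rough illustration of the idea, random oversampling can be sketched in a few lines of plain Python. This is not imbalanced-learn's implementation; the function name `random_oversample` and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def random_oversample(X, y, minority_label, rng):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    counts = Counter(y)
    majority_count = max(counts.values())
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_extra = majority_count - counts[minority_label]
    # Sample with replacement from the existing minority rows.
    extra_idx = [rng.choice(minority_idx) for _ in range(n_extra)]
    X_res = list(X) + [X[i] for i in extra_idx]
    y_res = list(y) + [y[i] for i in extra_idx]
    return X_res, y_res

rng = random.Random(42)
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y_toy = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X_toy, y_toy, minority_label=1, rng=rng)
print(Counter(y_bal))
```

No new information is created: every added row is an exact copy of an existing minority row.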

In [None]:
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print(f"Before oversampling: {Counter(y)}")

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)

print(f"After oversampling: {Counter(y_res)}")

Before oversampling: Counter({0: 18453, 1: 4377})
After oversampling: Counter({0: 18453, 1: 18453})


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.25, random_state=42)
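
One thing worth checking with this ordering is whether copies of the same minority row end up on both sides of the split, since the data is resampled before it is split. The toy sketch below compares the two orderings in plain Python; the row ids, helper names, and sizes are hypothetical, and it does not use the notebook's data.

```python
import random

rng = random.Random(42)

# Toy data: ids stand in for embedding rows; 12 majority rows, 3 minority rows.
rows = [("maj", i) for i in range(12)] + [("min", i) for i in range(3)]

def oversample(data, rng):
    """Duplicate random minority rows until the two classes are the same size."""
    maj = [r for r in data if r[0] == "maj"]
    mino = [r for r in data if r[0] == "min"]
    if not mino:
        return data[:]  # nothing to duplicate
    extra = [rng.choice(mino) for _ in range(len(maj) - len(mino))]
    return data + extra

def split(data, rng, test_fraction=0.25):
    """Shuffle a copy; the first test_fraction of it becomes the test set."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Order used in this notebook: resample, then split.
train_a, test_a = split(oversample(rows, rng), rng)
shared_a = set(train_a) & set(test_a)

# Alternative order: split first, then resample the training part only.
train_b, test_b = split(rows, rng)
train_b = oversample(train_b, rng)
shared_b = set(train_b) & set(test_b)

print(len(shared_a), len(shared_b))
```

With the second ordering the test set contains only original rows, so no test row can also appear in the training set; with the first ordering, any shared rows are duplicated minority samples.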

In [None]:
from sklearn.svm import SVC

svm_model = SVC(kernel='linear', verbose=True)
svm_model.fit(X_train, y_train)

[LibSVM]

In [None]:
X_pred = svm_model.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score

print(f"Acc:{(accuracy_score(y_test, X_pred)).round(2)}," \
            f" Prec:{precision_score(y_test, X_pred).round(2)}," \
            f" Rec:{recall_score(y_test, X_pred).round(2)}," \
            f" F1:{f1_score(y_test, X_pred).round(2)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro').round(2)}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro').round(2)}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted').round(2)}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred).round(2)}")

Acc:0.75, Prec:0.75, Rec:0.76, F1:0.76, F1-micro:0.75, F1-macro:0.75, F1-weighted:0.75, G-mean:0.75


In [None]:
print(f"Acc:{(accuracy_score(y_test, X_pred))}," \
            f" Prec:{precision_score(y_test, X_pred)}," \
            f" Rec:{recall_score(y_test, X_pred)}," \
            f" F1:{f1_score(y_test, X_pred)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro')}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro')}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted')}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred)}")

Acc:0.7504064159531809, Prec:0.7492083597213426, Rec:0.760934819897084, F1:0.7550260610573343, F1-micro:0.7504064159531809, F1-macro:0.750317625690492, F1-weighted:0.7503691648659608, G-mean:0.750214377594774


In [None]:
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)

scoring = {'accuracy': 'accuracy', 'precision': 'precision', 'recall': 'recall', 'f1': 'f1', 'f1_micro': 'f1_micro', 'f1_macro': 'f1_macro', 'f1_weighted': 'f1_weighted', 'g-mean': gm_scorer}

svm_cross_validation = SVC(kernel='linear')
cv_results = cross_validate(svm_cross_validation, X_res, y_res, scoring=scoring, cv=5, verbose=3)

[CV] END  accuracy: (test=0.750) f1: (test=0.754) f1_macro: (test=0.750) f1_micro: (test=0.750) f1_weighted: (test=0.750) g-mean: (test=0.750) precision: (test=0.743) recall: (test=0.765) total time=12.1min
[CV] END  accuracy: (test=0.765) f1: (test=0.768) f1_macro: (test=0.765) f1_micro: (test=0.765) f1_weighted: (test=0.765) g-mean: (test=0.765) precision: (test=0.760) recall: (test=0.777) total time=12.3min
[CV] END  accuracy: (test=0.749) f1: (test=0.750) f1_macro: (test=0.749) f1_micro: (test=0.749) f1_weighted: (test=0.749) g-mean: (test=0.749) precision: (test=0.749) recall: (test=0.751) total time=12.2min
[CV] END  accuracy: (test=0.760) f1: (test=0.763) f1_macro: (test=0.760) f1_micro: (test=0.760) f1_weighted: (test=0.760) g-mean: (test=0.760) precision: (test=0.756) recall: (test=0.769) total time=12.3min
[CV] END  accuracy: (test=0.746) f1: (test=0.748) f1_macro: (test=0.746) f1_micro: (test=0.746) f1_weighted: (test=0.746) g-mean: (test=0.746) precision: (test=0.742) recal

In [None]:
# Index 4 selects the fifth (last) fold; these values are not averages over the folds.
for x in cv_results:
    print(f"{x}: {cv_results[x][4].round(2)}")

fit_time: 657.28
score_time: 63.64
test_accuracy: 0.75
test_precision: 0.74
test_recall: 0.75
test_f1: 0.75
test_f1_micro: 0.75
test_f1_macro: 0.75
test_f1_weighted: 0.75
test_g-mean: 0.75


# Straightforward Undersampling
Randomly removing majority-class samples until both classes have the same size.
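
The idea can be sketched in plain Python as follows. This is an illustration rather than imbalanced-learn's implementation; the function name `random_undersample` and the toy data are assumptions.

```python
import random
from collections import Counter

def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

rng = random.Random(42)
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y_toy = [0, 0, 0, 0, 1, 1]
X_small, y_small = random_undersample(X_toy, y_toy, rng)
print(Counter(y_small))

# With the class counts in this notebook: rows kept and majority rows discarded.
print(2 * 4377, 18453 - 4377)  # 8754 14076
```

The trade-off is visible in the arithmetic: balancing by undersampling discards most of the majority class, which leaves the SVM with far less training data than either oversampling variant.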

In [None]:
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler

print(f"Before undersampling: {Counter(y)}")

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X, y)

print(f"After undersampling: {Counter(y_res)}")

Before undersampling: Counter({0: 18453, 1: 4377})
After undersampling: Counter({0: 4377, 1: 4377})


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.25, random_state=42)

In [None]:
from sklearn.svm import SVC

svm_model = SVC(kernel='linear', verbose=True)
svm_model.fit(X_train, y_train)

[LibSVM]

In [None]:
X_pred = svm_model.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.metrics import geometric_mean_score

print(f"Acc:{(accuracy_score(y_test, X_pred)).round(2)}," \
            f" Prec:{precision_score(y_test, X_pred).round(2)}," \
            f" Rec:{recall_score(y_test, X_pred).round(2)}," \
            f" F1:{f1_score(y_test, X_pred).round(2)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro').round(2)}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro').round(2)}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted').round(2)}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred).round(2)}")

Acc:0.72, Prec:0.72, Rec:0.71, F1:0.71, F1-micro:0.72, F1-macro:0.72, F1-weighted:0.72, G-mean:0.72


In [None]:
print(f"Acc:{(accuracy_score(y_test, X_pred))}," \
            f" Prec:{precision_score(y_test, X_pred)}," \
            f" Rec:{recall_score(y_test, X_pred)}," \
            f" F1:{f1_score(y_test, X_pred)}," \
            f" F1-micro:{f1_score(y_test, X_pred, average='micro')}," \
            f" F1-macro:{f1_score(y_test, X_pred, average='macro')}," \
            f" F1-weighted:{f1_score(y_test, X_pred, average='weighted')}," \
            f" G-mean:{geometric_mean_score(y_test, X_pred)}")

Acc:0.7190497944266788, Prec:0.7184284377923292, Rec:0.7097966728280961, F1:0.7140864714086471, F1-micro:0.7190497944266788, F1-macro:0.7189651036881584, F1-weighted:0.7190208212792902, G-mean:0.7188870992841048


In [None]:
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)

scoring = {'accuracy': 'accuracy', 'precision': 'precision', 'recall': 'recall', 'f1': 'f1', 'f1_micro': 'f1_micro', 'f1_macro': 'f1_macro', 'f1_weighted': 'f1_weighted', 'g-mean': gm_scorer}

svm_cross_validation = SVC(kernel='linear')
cv_results = cross_validate(svm_cross_validation, X_res, y_res, scoring=scoring, cv=5, verbose=3)

[CV] END  accuracy: (test=0.727) f1: (test=0.725) f1_macro: (test=0.727) f1_micro: (test=0.727) f1_weighted: (test=0.727) g-mean: (test=0.727) precision: (test=0.729) recall: (test=0.721) total time=  18.2s
[CV] END  accuracy: (test=0.726) f1: (test=0.725) f1_macro: (test=0.726) f1_micro: (test=0.726) f1_weighted: (test=0.726) g-mean: (test=0.726) precision: (test=0.726) recall: (test=0.725) total time=  18.9s
[CV] END  accuracy: (test=0.708) f1: (test=0.705) f1_macro: (test=0.708) f1_micro: (test=0.708) f1_weighted: (test=0.708) g-mean: (test=0.708) precision: (test=0.712) recall: (test=0.699) total time=  17.7s
[CV] END  accuracy: (test=0.724) f1: (test=0.722) f1_macro: (test=0.724) f1_micro: (test=0.724) f1_weighted: (test=0.724) g-mean: (test=0.724) precision: (test=0.726) recall: (test=0.719) total time=  18.3s
[CV] END  accuracy: (test=0.730) f1: (test=0.728) f1_macro: (test=0.730) f1_micro: (test=0.730) f1_weighted: (test=0.730) g-mean: (test=0.730) precision: (test=0.735) recal

In [None]:
# Index 4 selects the fifth (last) fold; these values are not averages over the folds.
for x in cv_results:
    print(f"{x}: {cv_results[x][4].round(2)}")

fit_time: 16.71
score_time: 2.24
test_accuracy: 0.73
test_precision: 0.73
test_recall: 0.72
test_f1: 0.73
test_f1_micro: 0.73
test_f1_macro: 0.73
test_f1_weighted: 0.73
test_g-mean: 0.73
