[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/georgianpartners/Multimodal-Toolkit/blob/master/notebooks/text_w_tabular_classification.ipynb)

# Training a BertWithTabular Model for Clothing Review Recommendation Prediction

This guide closely follows the [example](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb#scrollTo=bwl3I_VGAZXb) from HuggingFace for text classification on the GLUE dataset.

Install `multimodal-transformers` and `kaggle` so we can get the dataset.

## All other imports are here:

In [2]:
from dataclasses import dataclass, field
import json
import logging
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "5"
from typing import Optional

import numpy as np
import pandas as pd
from transformers import (
    AutoTokenizer,
    AutoConfig,
    Trainer,
    EvalPrediction,
    set_seed
)
from transformers.training_args import TrainingArguments

from multimodal_transformers.data import load_data_from_folder, load_data_into_folds
from multimodal_transformers.model import TabularConfig
from multimodal_transformers.model import AutoModelWithTabular

logging.basicConfig(level=logging.INFO)
os.environ['COMET_MODE'] = 'DISABLED'




#### Let us take a look at what the dataset looks like

In [3]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Load the dataset from a CSV file (adjust the path to your environment)
df = pd.read_csv(r'/data/chenxi/llm-feature-engeneering/dataset/circor.csv')
df = df.drop(columns=['Patient ID', 'Recording locations:', 'Additional ID'])
df_clean = df.copy()
# 1. Map the Age feature
age_mapping = {'Neonate': 1, 'Infant': 2, 'Child': 3, 'Adolescent': 4, 'Young adult': 5}
df_clean['Age'] = df_clean['Age'].map(age_mapping)
df_clean['Age'] = df_clean['Age'].fillna(-1)  # unmapped or missing ages become -1

# 2. Map the Sex feature
le = LabelEncoder()
df_clean['Sex'] = le.fit_transform(df_clean['Sex'])

# 3. Map the Pregnancy status feature
df_clean['Pregnancy status'] = df_clean['Pregnancy status'].map({False: 0, True: 1})

# 4. Handle missing values in Height and Weight
df_clean['Height'] = df_clean['Height'].fillna(df_clean['Height'].mean())
df_clean['Weight'] = df_clean['Weight'].fillna(df_clean['Weight'].mean())

# 5. Map the Murmur feature
df_clean['Murmur'] = df_clean['Murmur'].map({'Present': 1, 'Absent': 0, 'Unknown': 2})

# 6. Handle the 'Murmur locations' feature
df_clean['Murmur locations'] = df_clean['Murmur locations'].str.split('+')
locations = ['PV', 'TV', 'AV', 'MV']
for location in locations:
    df_clean[location] = df_clean['Murmur locations'].apply(lambda x: 1 if x is not np.nan and location in x else 0)
df_clean.drop('Murmur locations', axis=1, inplace=True)

# 7. Map the 'Most audible location' feature
df_clean['Most audible location'] = df_clean['Most audible location'].map({np.nan: 0, 'PV': 1, 'TV': 2, 'AV': 3, 'MV': 4})

# 8. Map the Outcome feature
df_clean['Outcome'] = df_clean['Outcome'].map({'Normal': 0, 'Abnormal': 1})

# 9. Map the Campaign feature
df_clean['Campaign'] = df_clean['Campaign'].map({'CC2014': 0, 'CC2015': 1})

# 10. Map other string features
string_features = ['Systolic murmur timing', 'Systolic murmur shape', 'Systolic murmur grading', 'Systolic murmur pitch', 'Systolic murmur quality', 
                   'Diastolic murmur timing', 'Diastolic murmur shape', 'Diastolic murmur grading', 'Diastolic murmur pitch', 'Diastolic murmur quality']
for feature in string_features:
    # Category codes are integers; missing values are encoded as -1
    df_clean[feature] = df_clean[feature].astype('category').cat.codes
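
The multi-hot encoding of `Murmur locations` (step 6 in the cell above) can also be sketched with `Series.str.get_dummies`. This is a swapped-in alternative to the `apply`/`lambda` loop, not what the notebook runs; the helper name `multi_hot_locations` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def multi_hot_locations(series, locations):
    """Return one 0/1 column per location from '+'-separated strings."""
    # str.get_dummies splits on the literal separator; NaN rows become all zeros.
    dummies = series.str.get_dummies(sep='+')
    # Fix the column order and add any location absent from the data as zeros.
    return dummies.reindex(columns=locations, fill_value=0)


# Toy data (made up): two patients with murmurs, one without a location.
toy = pd.DataFrame({'Murmur locations': ['AV+MV', np.nan, 'PV']})
encoded = multi_hot_locations(toy['Murmur locations'], ['PV', 'TV', 'AV', 'MV'])
print(encoded)
```

The `reindex` call matters when a location never occurs in the data: without it, that column would be missing rather than all zeros.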

In [4]:
data_df=df_clean.copy()

In [5]:
# column = pd.read_csv('/data/chenxi/3/3/noise.csv')
# column = column.reset_index(drop=True)
# data_df = data_df.reset_index(drop=True)

# data_df['noise'] = column['noise']
column22 = pd.read_csv('/data/chenxi/llm-feature-engeneering/src/model/responses/circor/sum.csv')
column22 = column22.reset_index(drop=True)
data_df = data_df.reset_index(drop=True)

data_df['response'] = column22['sum']

In [6]:
data_df.to_csv('clean.csv')

We see that the data contains text in the `response` column as well as tabular features such as `Age`, `Sex`, `Height`, `Weight`, and the murmur descriptor columns.

In [7]:
data_df.describe(include=object)


Unnamed: 0,response
count,942
unique,941
top,"In summary, the common patterns, findings, or ..."
freq,2


In [8]:
data_df.head(5)

Unnamed: 0,Age,Sex,Height,Weight,Pregnancy status,Murmur,Most audible location,Systolic murmur timing,Systolic murmur shape,Systolic murmur grading,...,Diastolic murmur grading,Diastolic murmur pitch,Diastolic murmur quality,Outcome,Campaign,PV,TV,AV,MV,response
0,3.0,0,98.0,15.9,0,0,0,-1,-1,-1,...,-1,-1,-1,1,1,0,0,0,0,Based on the analysis of the hypothetical pati...
1,3.0,0,103.0,13.1,0,1,2,1,2,2,...,-1,-1,-1,1,1,1,1,1,1,"In summary, the common patterns, findings, or ..."
2,3.0,1,115.0,19.1,0,2,0,-1,-1,-1,...,-1,-1,-1,1,1,0,0,0,0,"In summary, the common patterns, findings, or ..."
3,3.0,1,98.0,15.9,0,1,2,1,3,0,...,-1,-1,-1,1,1,0,1,0,0,"In summary, the common patterns, findings, or ..."
4,3.0,1,87.0,11.2,0,1,1,0,3,1,...,-1,-1,-1,1,1,1,1,1,1,"In summary, the common patterns, findings, or ..."


In [56]:
# data_df['Outcome'] = data_df['Outcome'].apply(lambda x: np.random.randint(2) if x in [0, 1] else x)
# data_df.head(5)

In this demonstration, we split our data into training, validation, and test sets in an 8:1:1 ratio. We also save the splits to `train.csv`, `val.csv`, and `test.csv`, as this is the format our dataloader requires.


In [11]:
train_df, val_df, test_df = np.split(data_df.sample(frac=1), [int(.8*len(data_df)), int(.9 * len(data_df))])
print('Num examples train-val-test')
print(len(train_df), len(val_df), len(test_df))
train_df.to_csv('/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/dataset/train.csv')
val_df.to_csv('/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/dataset/val.csv')
test_df.to_csv('/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/dataset/test.csv')

Num examples train-val-test
753 94 95
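
The shuffle above uses no fixed seed, so the rows that land in each file change between runs. A minimal sketch of a reproducible variant, assuming a `random_state` of 42 (an arbitrary choice, not taken from the notebook) and a hypothetical helper name `split_8_1_1`:

```python
import pandas as pd


def split_8_1_1(df, seed=42):
    """Shuffle with a fixed seed, then cut at the 80% and 90% marks."""
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    train_end, val_end = int(0.8 * n), int(0.9 * n)
    return (
        shuffled.iloc[:train_end],
        shuffled.iloc[train_end:val_end],
        shuffled.iloc[val_end:],
    )


# Toy frame with the same number of rows as the dataset above.
toy = pd.DataFrame({'x': range(942)})
train, val, test = split_8_1_1(toy)
print(len(train), len(val), len(test))  # prints 753 94 95
```

Slicing with `iloc` gives the same cut points as the `np.split` call, and calling the helper twice with the same seed returns the same rows in each split.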


In [58]:
# train_df, test_df = np.split(data_df.sample(frac=1), [int(.9*len(data_df))])
# print('Num examples train-val-test')
# print(len(train_df), len(test_df))
# train_df.to_csv('train.csv')

# test_df.to_csv('test.csv')

## We then define our Experiment Parameters
We use Data Classes to hold each of our arguments for the model, data, and training.

In [12]:
@dataclass
class ModelArguments:
  """
  Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
  """

  model_name_or_path: str = field(
      metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
  )
  config_name: Optional[str] = field(
      default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
  )
  tokenizer_name: Optional[str] = field(
      default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
  )
  cache_dir: Optional[str] = field(
      default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
  )


@dataclass
class MultimodalDataTrainingArguments:
  """
  Arguments pertaining to how we combine tabular features
  Using `HfArgumentParser` we can turn this class
  into argparse arguments to be able to specify them on
  the command line.
  """

  data_path: str = field(metadata={
                            'help': 'the path to the csv file containing the dataset'
                        })
  column_info_path: str = field(
      default=None,
      metadata={
          'help': 'the path to the json file detailing which columns are text, categorical, numerical, and the label'
  })

  column_info: dict = field(
      default=None,
      metadata={
          'help': 'a dict referencing the text, categorical, numerical, and label columns'
                  'its keys are text_cols, num_cols, cat_cols, and label_col'
  })

  categorical_encode_type: str = field(default='ohe',
                                        metadata={
                                            'help': 'sklearn encoder to use for categorical data',
                                            'choices': ['ohe', 'binary', 'label', 'none']
                                        })
  numerical_transformer_method: str = field(default='yeo_johnson',
                                            metadata={
                                                'help': 'sklearn numerical transformer to preprocess numerical data',
                                                'choices': ['yeo_johnson', 'box_cox', 'quantile_normal', 'none']
                                            })
  task: str = field(default="classification",
                    metadata={
                        "help": "The downstream training task",
                        "choices": ["classification", "regression"]
                    })

  mlp_division: int = field(default=4,
                            metadata={
                                'help': 'the ratio of the number of '
                                        'hidden dims in a current layer to the next MLP layer'
                            })
  combine_feat_method: str = field(default='individual_mlps_on_cat_and_numerical_feats_then_concat',
                                    metadata={
                                        'help': 'method to combine categorical and numerical features, '
                                                'see README for all the method'
                                    })
  mlp_dropout: float = field(default=0.1,
                              metadata={
                                'help': 'dropout ratio used for MLP layers'
                              })
  numerical_bn: bool = field(default=True,
                              metadata={
                                  'help': 'whether to use batchnorm on numerical features'
                              })
  use_simple_classifier: bool = field(default=True,
                                      metadata={
                                          'help': 'whether to use single layer or MLP as final classifier'
                                      })
  mlp_act: str = field(default='relu',
                        metadata={
                            'help': 'the activation function to use for finetuning layers',
                            'choices': ['relu', 'prelu', 'sigmoid', 'tanh', 'linear']
                        })
  gating_beta: float = field(default=0.2,
                              metadata={
                                  'help': "the beta hyperparameters used for gating tabular data "
                                          "see https://www.aclweb.org/anthology/2020.acl-main.214.pdf"
                              })

  def __post_init__(self):
      assert self.column_info != self.column_info_path
      if self.column_info is None and self.column_info_path:
          with open(self.column_info_path, 'r') as f:
              self.column_info = json.load(f)

### Here are the data and training parameters we will use.
For the model, we can specify any supported HuggingFace model class (see the README for details), as well as any AutoModel that resolves to one of the supported model classes. For the data, we need a dictionary that says which columns are the `text` columns, the `numerical feature` columns, the `categorical feature` columns, and the `label` column. For classification, we can also describe what each label value means through the `label_list`. Alternatively, these columns can be specified in a JSON file whose path is passed to `MultimodalDataTrainingArguments` as `column_info_path`.
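
A minimal sketch of the `column_info_path` route: write the column dictionary to a JSON file, then read it back with `json.load`, as `__post_init__` does. The file name `column_info.json`, the temporary directory, and the shortened column and label lists are illustrative assumptions.

```python
import json
import os
import tempfile

# Keys follow the help text of `column_info`; values are shortened placeholders.
column_info = {
    "text_cols": ["response"],
    "num_cols": ["Height", "Weight"],
    "cat_cols": ["Age", "Sex", "Murmur"],
    "label_col": "Outcome",
    "label_list": ["normal", "abnormal"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "column_info.json")
    # Write the specification once...
    with open(path, "w") as f:
        json.dump(column_info, f, indent=2)
    # ...and load it back the same way __post_init__ does.
    with open(path, "r") as f:
        loaded = json.load(f)

print(sorted(loaded))
```

A path to such a file could then be passed as `column_info_path` in place of the in-memory `column_info` dictionary.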

In [22]:
text_cols = ['response']
cat_cols = ['Age','Sex','Pregnancy status','Murmur','PV', 'TV', 'AV', 'MV','Most audible location','Campaign','Systolic murmur timing', 'Systolic murmur shape', 'Systolic murmur grading', 'Systolic murmur pitch', 'Systolic murmur quality', 
                   'Diastolic murmur timing', 'Diastolic murmur shape', 'Diastolic murmur grading', 'Diastolic murmur pitch', 'Diastolic murmur quality']
# cat_cols=['num']
numerical_cols = ['Height','Weight']
# numerical_cols = ['num']
column_info_dict = {
    'text_cols': text_cols,
    'num_cols': numerical_cols,
    'cat_cols': cat_cols,
    'label_col': 'Outcome',
    'label_list': ["The expert cardiologist's overall diagnosis is normal", "The expert cardiologist's overall diagnosis is abnormal"]
}


model_args = ModelArguments(
    model_name_or_path='distilbert-base-uncased'
)

data_args = MultimodalDataTrainingArguments(
    # data_path='/data/chenxi/3/3/clean.csv',
    data_path='/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/dataset',
    combine_feat_method='gating_on_cat_and_num_feats_then_sum',
    column_info=column_info_dict,
    task='classification'
)

training_args = TrainingArguments(
    output_dir="/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/model",
    logging_dir="/data/chenxi/llm-feature-engeneering/src/Fine-tune/circor/runs",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    num_train_epochs=60,
    per_device_train_batch_size=40,
    warmup_steps=500,
    weight_decay=0.01,
    logging_steps=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",  # must match evaluation_strategy when load_best_model_at_end=True
    logging_first_step=True,
    learning_rate=2e-5,
    adafactor=True,
    gradient_accumulation_steps=1,
    lr_scheduler_type="cosine",
    load_best_model_at_end=True,
    seed=42,
    fp16=True,
)

set_seed(training_args.seed)

## Now we can load our model and data.
### We first instantiate our HuggingFace tokenizer
This is needed to prepare our custom torch dataset. See `torch_dataset.py` for details.

In [20]:
tokenizer_path_or_name = model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path
print('Specified tokenizer: ', tokenizer_path_or_name)
tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_path_or_name,
    cache_dir=model_args.cache_dir,
)

Specified tokenizer:  distilbert-base-uncased


### Load dataset csvs to torch datasets
The function `load_data_from_folder` expects a path to a folder that contains `train.csv`, `test.csv`, and/or `val.csv` containing the respective split datasets.

In [62]:
# # Get Datasets
# train_dataset, val_dataset, test_dataset = load_data_into_folds(
#     data_csv_path=data_args.data_path,
#     num_splits=5,
#     validation_ratio=0.2,
#     text_cols = data_args.column_info['text_cols'],
#     tokenizer=tokenizer,
#     label_col=data_args.column_info['label_col'],
#     label_list=data_args.column_info['label_list'],
#     categorical_cols=data_args.column_info['cat_cols'],
#     numerical_cols=data_args.column_info['num_cols'],
#     sep_text_token_str=tokenizer.sep_token,
# )

In [23]:
# Get Datasets
train_dataset, val_dataset, test_dataset = load_data_from_folder(
    data_args.data_path,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    categorical_cols=data_args.column_info['cat_cols'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
)

INFO:multimodal_transformers.data.data_utils:2 numerical columns
INFO:multimodal_transformers.data.data_utils:64 categorical columns
INFO:multimodal_transformers.data.data_utils:2 numerical columns
INFO:multimodal_transformers.data.load_data:Text columns: ['response']
INFO:multimodal_transformers.data.load_data:Raw text example: Based on the analysis of the hypothetical patient profiles, the common patterns, findings, or hypotheses that may provide insights into the overall cardiologist's diagnosis are as follows:

1. Absence of a murmur: The absence of a murmur in a child of this age and demographic is generally considered normal. However, further investigation is needed to determine if the absence of a murmur is consistent across similar patient profiles or if there are any variations.

2. Murmur location: The most audible location of the murmur being 'nan' suggests that there is no specific location where a murmur is more prominent. Further investigation is needed to determine if th

INFO:multimodal_transformers.data.data_utils:64 categorical columns
INFO:multimodal_transformers.data.data_utils:2 numerical columns
INFO:multimodal_transformers.data.load_data:Text columns: ['response']
INFO:multimodal_transformers.data.load_data:Raw text example: Based on the provided hypothetical patient profile, the common patterns, findings, or hypotheses that may provide insights into the overall cardiologist's diagnosis are as follows:

1. The patient being a child suggests that certain conditions or diseases commonly seen in children could be relevant to the diagnosis.

2. The absence of a murmur may indicate a normal cardiovascular system or the absence of any structural abnormalities. However, further comparison with data from other patients of similar age and demographics is needed to determine if this is a common occurrence.

3. The lack of specific information regarding the characteristics of the systolic and diastolic murmurs limits the ability to draw specific hypotheses

In [64]:
# train_dataset,_, test_dataset = load_data_from_folder(
#     data_args.data_path,
#     data_args.column_info['text_cols'],
#     tokenizer,
#     label_col=data_args.column_info['label_col'],
#     label_list=data_args.column_info['label_list'],
#     categorical_cols=data_args.column_info['cat_cols'],
#     numerical_cols=data_args.column_info['num_cols'],
#     sep_text_token_str=tokenizer.sep_token,
# )


In [24]:
num_labels = len(np.unique(train_dataset.labels))
# num_labels = 2
num_labels

2

In [25]:
config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
    )
tabular_config = TabularConfig(num_labels=num_labels,
                               cat_feat_dim=train_dataset.cat_feats.shape[1],
                               numerical_feat_dim=train_dataset.numerical_feats.shape[1],
                               **vars(data_args))
config.tabular_config = tabular_config

In [26]:
model = AutoModelWithTabular.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        config=config,
        cache_dir=model_args.cache_dir
    )

Some weights of DistilBertWithTabular were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['tabular_combiner.g_cat_layer.bias', 'tabular_combiner.layer_norm.bias', 'tabular_combiner.h_bias', 'tabular_combiner.num_bn.running_var', 'tabular_combiner.layer_norm.weight', 'tabular_combiner.num_bn.weight', 'tabular_combiner.num_bn.running_mean', 'pre_classifier.weight', 'classifier.weight', 'tabular_combiner.h_num_layer.weight', 'tabular_combiner.g_cat_layer.weight', 'classifier.bias', 'tabular_classifier.weight', 'pre_classifier.bias', 'tabular_combiner.num_bn.num_batches_tracked', 'tabular_combiner.g_num_layer.weight', 'tabular_classifier.bias', 'tabular_combiner.num_bn.bias', 'tabular_combiner.g_num_layer.bias', 'tabular_combiner.h_cat_layer.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### We need to define a task-specific way of computing relevant metrics:

In [27]:
import numpy as np
from scipy.special import softmax
from sklearn.metrics import (
    auc,
    precision_recall_curve,
    roc_auc_score,
    f1_score,
    confusion_matrix,
    matthews_corrcoef,
)

def calc_classification_metrics(p: EvalPrediction):
    predictions = p.predictions[0]
    pred_labels = np.argmax(predictions, axis=1)
    pred_scores = softmax(predictions, axis=1)[:, 1]
    labels = p.label_ids
    acc = (pred_labels == labels).mean() 
    if len(np.unique(labels)) == 2:  # binary classification
        roc_auc_pred_score = roc_auc_score(labels, pred_scores)
        precisions, recalls, thresholds = precision_recall_curve(labels,
                                                                    pred_scores)
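        # Compute F1 only where precision + recall > 0 to avoid 0/0 warnings
        denom = precisions + recalls
        fscore = np.divide(2 * precisions * recalls, denom,
                           out=np.zeros_like(denom), where=denom > 0)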
        ix = np.argmax(fscore)
        threshold = thresholds[ix].item()
        pr_auc = auc(recalls, precisions)
        tn, fp, fn, tp = confusion_matrix(labels, pred_labels, labels=[0, 1]).ravel()
        result = {'roc_auc': roc_auc_pred_score,
                    'threshold': threshold,
                    'pr_auc': pr_auc,
                    'recall': recalls[ix].item(),
                    'precision': precisions[ix].item(), 'f1': fscore[ix].item(),
                    'tn': tn.item(), 'fp': fp.item(), 'fn': fn.item(), 'tp': tp.item(),
                    'acc': acc,
                    }
    else:
        # acc is already computed above; macro-average F1 so multiclass labels work
        f1 = f1_score(y_true=labels, y_pred=pred_labels, average="macro")
        result = {
            "acc": acc,
            "f1": f1,
            "acc_and_f1": (acc + f1) / 2,
            "mcc": matthews_corrcoef(labels, pred_labels)
        }

    return result
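
To make the first lines of `calc_classification_metrics` concrete, the sketch below turns a small array of made-up logits into predicted labels, positive-class scores, and accuracy. It uses a hand-written NumPy softmax (`softmax_rows`, a hypothetical stand-in for `scipy.special.softmax`) so that it runs without SciPy.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Made-up logits for three examples and two classes.
logits = np.array([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
labels = np.array([0, 1, 1])

pred_labels = np.argmax(logits, axis=1)    # hard decisions; ties go to class 0
pred_scores = softmax_rows(logits)[:, 1]   # probability of the positive class
acc = (pred_labels == labels).mean()

print(pred_labels, pred_scores.round(3), round(float(acc), 3))
```

The hard labels feed the accuracy and confusion matrix, while the continuous scores feed the ROC and precision-recall computations.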

In [28]:
trainer = Trainer(
    model=model.to(0),
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=calc_classification_metrics,
)

## Launching the training is as simple as calling `trainer.train()` 🤗

In [29]:
%%time
trainer.train()

Epoch,Training Loss,Validation Loss,Roc Auc,Threshold,Pr Auc,Recall,Precision,F1,Tn,Fp,Fn,Tp,Acc
1,0.6953,0.706195,0.746472,0.52495,0.586839,0.78125,0.531915,0.632911,0,62,0,32,0.340426
2,0.6907,0.699837,0.776714,0.519447,0.650117,0.78125,0.595238,0.675676,0,62,0,32,0.340426
3,0.6892,0.689022,0.801915,0.522782,0.7128,0.59375,0.904762,0.716981,11,51,3,29,0.425532
4,0.6893,0.685625,0.790323,0.52261,0.707672,0.75,0.727273,0.738462,10,52,3,29,0.414894
5,0.6875,0.669197,0.789819,0.528507,0.733963,0.75,0.727273,0.738462,28,34,6,26,0.574468
6,0.6805,0.629335,0.791331,0.528997,0.710379,0.75,0.774194,0.761905,44,18,7,25,0.734043
7,0.6663,0.582546,0.787802,0.498749,0.709436,0.75,0.774194,0.761905,55,7,9,23,0.829787
8,0.6591,0.524823,0.795867,0.404302,0.713457,0.71875,0.71875,0.71875,59,3,13,19,0.829787
9,0.6412,0.531072,0.782258,0.681144,0.726984,0.59375,0.904762,0.716981,56,6,11,21,0.819149
10,0.6359,0.590836,0.787298,0.788746,0.727489,0.59375,0.904762,0.716981,44,18,8,24,0.723404




CPU times: user 4min 17s, sys: 10.5 s, total: 4min 28s
Wall time: 4min 27s


TrainOutput(global_step=1140, training_loss=0.3054090081730433, metrics={'train_runtime': 267.6668, 'train_samples_per_second': 168.792, 'train_steps_per_second': 4.259, 'total_flos': 5585600433300480.0, 'train_loss': 0.3054090081730433, 'epoch': 60.0})
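
The `global_step=1140` reported above is consistent with the split size and batch size used earlier. A quick check, assuming a single visible GPU and no gradient accumulation, as configured in this notebook:

```python
import math

num_train_examples = 753          # size of the training split printed earlier
per_device_train_batch_size = 40
num_train_epochs = 60
warmup_steps = 500

# The last, partially filled batch still counts as one optimizer step.
steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_fraction = warmup_steps / total_steps

print(steps_per_epoch, total_steps, round(warmup_fraction, 3))  # prints 19 1140 0.439
```

With these settings the warmup covers roughly 44% of all optimizer steps, which is worth keeping in mind when changing `num_train_epochs` or the batch size.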

In [71]:
eval_results = trainer.evaluate(test_dataset)
print(eval_results)

{'eval_loss': 0.577767014503479, 'eval_roc_auc': 0.6271794871794871, 'eval_threshold': 0.2683658003807068, 'eval_pr_auc': 0.5471493492546209, 'eval_recall': 0.7333333333333333, 'eval_precision': 0.38596491228070173, 'eval_f1': 0.5057471264367815, 'eval_tn': 60, 'eval_fp': 5, 'eval_fn': 20, 'eval_tp': 10, 'eval_acc': 0.7368421052631579, 'eval_runtime': 0.1862, 'eval_samples_per_second': 510.232, 'eval_steps_per_second': 64.45, 'epoch': 60.0}
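
Two groups of numbers in this dictionary are computed differently by `calc_classification_metrics`. The confusion-matrix counts and `eval_acc` come from the argmax predictions, while `eval_recall`, `eval_precision`, and `eval_f1` are read off the precision-recall curve at `eval_threshold`, the threshold that maximizes F1. The sketch below recomputes the argmax-based figures from the counts reported above; the variable names are illustrative.

```python
# Confusion-matrix counts copied from the evaluation output above.
tn, fp, fn, tp = 60, 5, 20, 10

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_at_argmax = tp / (tp + fp)
recall_at_argmax = tp / (tp + fn)
f1_at_argmax = (
    2 * precision_at_argmax * recall_at_argmax
    / (precision_at_argmax + recall_at_argmax)
)

print(round(accuracy, 4))             # matches eval_acc
print(round(precision_at_argmax, 4))
print(round(recall_at_argmax, 4))
print(round(f1_at_argmax, 4))
```

The accuracy reproduces `eval_acc`, whereas the argmax-based precision, recall, and F1 differ from the reported `eval_precision`, `eval_recall`, and `eval_f1` because those are taken at the tuned threshold.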
