## Finetuning ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli NLI model
- This is based on the Prof. Mihai Surdeanu's text book <a href="https://github.com/clulab/gentlenlp/blob/main/notebooks/chap13_classification_bert.ipynb">Gentle NLP Chapter 13 Classification using BERT model</a>
- Modified for NLI evaluation and analysis over SICCK dataset
- Author: Sushma Anand Akoju, Email: sushmaakoju@arizona.edu

In [1]:
!pip install datasets
!pip install transformers
!pip install sentencepiece
!pip install accelerate
!pip install 'transformers[torch]'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m474.6/474.6 kB[0m [31m9.1 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.7,>=0.3.0 (from datasets)
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m110.5/110.5 kB[0m [31m15.8 MB/s[0m eta [36m0:00:00[0m
Collecting xxhash (from datasets)
  Downloading xxhash-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m212.5/212.5 kB[0m [31m29.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting multiprocess (from datasets)
  Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.3/134.3 kB[0m [31m18.3 MB/s[0m eta [36m0:00:00[0m
Collec

# Text Classification Using Transformer Networks (Roberta)

Some initialization:

In [2]:
import random
import torch
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# enable tqdm in pandas
tqdm.pandas()

# set to True to use the gpu (if there is one available)
use_gpu = True

# select device
device = torch.device('cuda' if use_gpu and torch.cuda.is_available() else 'cpu')
print(f'device: {device.type}')

# random seed
seed = 1234

# set random seed
if seed is not None:
    print(f'random seed: {seed}')
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

device: cuda
random seed: 1234


In [3]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


In [4]:
import os
import pandas as pd
path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/final-datasets/block-wise-data/blocks-dec26/data/15/"
assert os.path.exists(path), "Error"
temp = pd.read_csv(os.path.join(path,"SICCK-zero-shot-analysis-apr24.csv"))
df = temp[temp.columns[1:]]
df.head(), len(df)

(   SICK_id                                  Premise  \
 0       90      a man is jumping into an empty pool   
 1       90  every man is jumping into an empty pool   
 2       90      a man is jumping into an empty pool   
 3       90  every man is jumping into an empty pool   
 4       90   some man is jumping into an empty pool   
 
                               Hypothesis Modifier Premise/Hypothesis/Both  \
 0      a man is jumping into a full pool     NONE                    NONE   
 1      a man is jumping into a full pool    every                 Premise   
 2  every man is jumping into a full pool    every              Hypothesis   
 3  every man is jumping into a full pool    every                    Both   
 4      a man is jumping into a full pool     some                 Premise   
 
   Part of Premise/Hypothesis Modified Ground Truth           GT Modifier Type  \
 0                                NONE  Alternation  Alternation  No Modifiers   
 1                          

In [5]:
df['CompressedGT'].unique()

array(['Contradiction', 'Neutral', 'FE', 'RE'], dtype=object)

In [7]:
label2id_roberta = {
    'Contradiction': 2,
    'Neutral': 1,
    'FE': 0,
    'RE': 0,
}

Read the train/dev/test datasets and create a HuggingFace `Dataset` object:

## Load Cross validation sets

In [26]:
filenames = ["fold0.xlsx", "fold1.xlsx", "fold2.xlsx", "fold3.xlsx", "fold4.xlsx"]
path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/june12"
folds = []
columns = ['Premise', 'Hypothesis', 'labels', 'CompressedGT', 'Modifier Type',
           'Modifier',	'Premise/Hypothesis/Both',	'Part of Premise/Hypothesis Modified']
for i,file in enumerate(filenames):
  train = pd.read_excel(os.path.join(path, file), sheet_name="train").rename(columns={"label4roberta":'labels'})
  test = pd.read_excel(os.path.join(path, file), sheet_name="test").rename(columns={"label4roberta":'labels'})
  print(len(train), len(test))
  folds.append({"train":train[columns], "test":test[columns]})
  print(i)

1043 261
0
1043 261
1
1043 261
2
1043 261
3
1043 261
4


### Create data splits with premise, hypothesis as well as hypothesis, premise for **Test** set predictions to label:
- Forward Entailment
- Reverse Entailment
- Neutral

In [28]:
def read_data(data):
    # concatenate title and description, and remove backslashes
    data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
    data['text'] = data['text'].str.replace('\\', ' ', regex=False)
    return data

In [29]:
def read_data_reverse(data):
    # concatenate title and description, and remove backslashes
    data['text'] = data['Hypothesis'] + " [SEP] " + data['Premise']
    data['text'] = data['text'].str.replace('\\', ' ', regex=False)
    return data

### Compute metrics for validation and test

In [30]:
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

def compute_metrics(eval_pred):
    y_true = eval_pred.label_ids
    y_pred = np.argmax(eval_pred.predictions, axis=-1)
    return {'accuracy': accuracy_score(y_true, y_pred), 'recall': recall_score(y_true, y_pred, average='micro'),
            'f1':f1_score(y_true, y_pred, average='micro'), 'precision':precision_score(y_true, y_pred, average='micro')}
def compute_test_metrics(y_true, y_pred):
    return {'accuracy': accuracy_score(y_true, y_pred), 'recall': recall_score(y_true, y_pred, average='micro'),
            'f1':f1_score(y_true, y_pred, average='micro'), 'precision':precision_score(y_true, y_pred, average='micro')}

### To include FE, RE and Neutral label calculation and scores for **Test**

In [31]:
from sklearn.metrics import classification_report
def test_eval(trainer, ds, fold, model_name ):
  test_ds = ds['test'].map(
      tokenize,
      batched=True,
      remove_columns=['Premise', 'Hypothesis', 'text'],
  )
  rev_test_ds = ds['rev_test'].map(
      tokenize,
      batched=True,
      remove_columns=['Premise', 'Hypothesis', 'text'],
  )
  test_ds.to_pandas()
  output = trainer.predict(test_ds)
  rev_scores = trainer.predict(rev_test_ds)

  y_true = output.label_ids
  y_preds = np.argmax(output.predictions, axis=-1)
  y_rev_score_preds = np.argmax(rev_scores.predictions, axis=-1)
  labels = []


  for i in range(len(y_preds)):
        if y_preds[i] == 0:
          labels.append("FE")
        elif y_preds[i] == 2:
          labels.append("Negation")
        else:
          if y_rev_score_preds[i] == 0:
            labels.append("RE")
          else:
            labels.append("Neutral")
  print(classification_report(y_true, y_preds, labels=[0, 1, 2]))

  res = compute_test_metrics(y_true, y_preds)
  res['fold'] = fold
  res['model_name'] = model_name
  return y_true, y_preds, res, labels

In [None]:
# model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/nli-deberta-v3-base', num_labels=3)
# tokenizer = AutoTokenizer.from_pretrained('cross-encoder/nli-deberta-v3-base')

### Get this_train, this_validation & this_test set from a this_fold

In [32]:
def get_dataset(fold, model_name):

  columns = ['Premise', 'Hypothesis', 'labels']

  train_df = read_data(fold["train"][columns])
  test_df = read_data(fold["test"][columns])
  rev_test_df = read_data_reverse(fold["test"][columns])
  print(test_df.columns)

  train_df, eval_df = train_test_split(train_df, train_size=0.9)
  train_df.reset_index(inplace=True, drop=True)
  eval_df.reset_index(inplace=True, drop=True)
  test_df.reset_index(inplace=True, drop=True)
  rev_test_df.reset_index(inplace=True, drop=True)

  print(f'train rows: {len(train_df.index):,}')
  print(f'eval rows: {len(eval_df.index):,}')
  print(f'test rows: {len(test_df.index):,}')
  print(f'test rows: {len(rev_test_df.index):,}')

  ds = DatasetDict()
  ds['train'] = Dataset.from_pandas(train_df)
  ds['validation'] = Dataset.from_pandas(eval_df)
  ds['test'] = Dataset.from_pandas(test_df)
  ds['rev_test'] = Dataset.from_pandas(rev_test_df)

  print(ds)
  return ds, test_df, rev_test_df

### CustomTrainer for CrossEntropyLoss but we train for both custom and default Trainer classes in HuggingFace
- Note: we did not see any difference between the two

In [33]:

import torch
from torch import nn
from transformers import Trainer
from accelerate import Accelerator

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
        loss_fct.to('cuda')
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

## Tokenize & Train one model at a time for all folds

In [34]:
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer

model_name = ""
model_names =["cross-encoder/nli-deberta-v3-base",	"ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"]
# model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
# tokenizer = AutoTokenizer.from_pretrained(model_name)
output_path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/june12"

def tokenize(examples):
    return tokenizer(examples['text'], truncation=True)

def train(model_name, this_path, folds):
  epochs = [4, 8]
  batch_sizes = [8,16,32]
  m = model_name.split("/")[1]
  all_scores = []
  # tokenizer = AutoTokenizer.from_pretrained(model_name)
  # model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
  # tokenizer = AutoTokenizer.from_pretrained(model_name)
  for num_epochs in epochs:
    for batch_size in batch_sizes:

      for i,fold in enumerate(folds):
          print("\n***********************************************************************************\n")
          print("\n**************** The number of epochs, batch_size and fold respectively are: ",num_epochs, batch_size, i,"************************\n")
          model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

          ds, test_df, rev_test_df = get_dataset(fold,model_name)
          train_ds = ds['train'].map(
            tokenize, batched=True,
            remove_columns=['Premise', 'Hypothesis', 'text'],
          )
          eval_ds = ds['validation'].map(
              tokenize,
              batched=True,
              remove_columns=['Premise', 'Hypothesis', 'text'],
          )

          weight_decay = 0.01
          tx_model_name = f'{model_name}-sequence-classification'

          training_args = TrainingArguments(
              output_dir=os.path.join(output_path,m+"_"+str(num_epochs)+str(batch_size)+"trainer"),
              log_level='error',
              num_train_epochs=num_epochs,
              per_device_train_batch_size=batch_size,
              per_device_eval_batch_size=batch_size,
              evaluation_strategy='epoch',
              weight_decay=weight_decay,
          )
          trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            compute_metrics=compute_metrics,
            tokenizer=tokenizer,
          )
          trainer.train()

          customTrainer  = CustomTrainer(
            model=model,
            args=training_args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            compute_metrics=compute_metrics,
            tokenizer=tokenizer,
          )

          customTrainer.train()
          y_true, y_pred, results, labels = test_eval(trainer, ds, i, model_name )
          y_true1, y_pred1, results1, labels1 = test_eval(customTrainer, ds, i, model_name )

          all_scores.append(results)
          fold["test"]["label"]= y_true
          fold["test"]["predictions"] = y_pred
          fold["test"]["predictions2"] = y_pred1
          fold["test"]["text"] = test_df['text']
          fold["test"]["pred_labels"] =  labels
          filename = "five_"+m+"_"+str(num_epochs)+"_"+str(batch_size)+"_"+str(i)+"_test.xlsx"
          fold["test"].to_csv(os.path.join(this_path, filename))
  return all_scores

In [35]:
torch.cuda.get_device_name(0)

'Tesla T4'

In [None]:
# if tokenizer:
#   del tokenizer
# if model:
#   del model

### "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"

In [None]:
all_scores = []
predictions = []
model_name = model_names[1]
m = model_name.split("/")[1]
this_path = os.path.join(path, m)
if not os.path.exists(this_path):
  os.mkdir(this_path)
assert os.path.exists(this_path), "%s Path does not exists!"%(this_path)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_name)
all_scores.append(train(model_name, this_path, folds))


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 0 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 261
    })
    rev_test: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 261
    })
})


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5890131592750549, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.642, 'eval_samples_per_second': 163.555, 'eval_steps_per_second': 21.807, 'epoch': 1.0}
{'eval_loss': 0.3840176463127136, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6603, 'eval_samples_per_second': 159.026, 'eval_steps_per_second': 21.203, 'epoch': 2.0}
{'eval_loss': 0.47739383578300476, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.6977, 'eval_samples_per_second': 150.492, 'eval_steps_per_second': 20.066, 'epoch': 3.0}
{'eval_loss': 0.5830397605895996, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision':



{'eval_loss': 0.5080058574676514, 'eval_accuracy': 0.8, 'eval_recall': 0.8, 'eval_f1': 0.8000000000000002, 'eval_precision': 0.8, 'eval_runtime': 0.6873, 'eval_samples_per_second': 152.764, 'eval_steps_per_second': 20.368, 'epoch': 1.0}
{'eval_loss': 0.5714705586433411, 'eval_accuracy': 0.6761904761904762, 'eval_recall': 0.6761904761904762, 'eval_f1': 0.6761904761904762, 'eval_precision': 0.6761904761904762, 'eval_runtime': 0.7106, 'eval_samples_per_second': 147.76, 'eval_steps_per_second': 19.701, 'epoch': 2.0}
{'eval_loss': 0.38888320326805115, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.6938, 'eval_samples_per_second': 151.336, 'eval_steps_per_second': 20.178, 'epoch': 3.0}
{'eval_loss': 0.35347795486450195, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6985,

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.75      0.88      0.81       185
           2       0.43      0.08      0.13        76

    accuracy                           0.65       261
   macro avg       0.39      0.32      0.32       261
weighted avg       0.66      0.65      0.62       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.75      0.88      0.81       185
           2       0.43      0.08      0.13        76

    accuracy                           0.65       261
   macro avg       0.39      0.32      0.32       261
weighted avg       0.66      0.65      0.62       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 1 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4165443778038025, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.7636, 'eval_samples_per_second': 137.514, 'eval_steps_per_second': 18.335, 'epoch': 1.0}
{'eval_loss': 0.38064175844192505, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.7458, 'eval_samples_per_second': 140.789, 'eval_steps_per_second': 18.772, 'epoch': 2.0}
{'eval_loss': 0.25935283303260803, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.7394, 'eval_samples_per_second': 142.002, 'eval_steps_per_second': 18.934, 'epoch': 3.0}
{'eval_loss': 0.24895218014717102, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precisio



{'eval_loss': 0.44127127528190613, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.7497, 'eval_samples_per_second': 140.064, 'eval_steps_per_second': 18.675, 'epoch': 1.0}
{'eval_loss': 0.428068608045578, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.737, 'eval_samples_per_second': 142.472, 'eval_steps_per_second': 18.996, 'epoch': 2.0}
{'eval_loss': 0.22887642681598663, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.7471, 'eval_samples_per_second': 140.546, 'eval_steps_per_second': 18.739, 'epoch': 3.0}
{'eval_loss': 0.2096574306488037, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.73      0.87      0.79        71
           1       0.55      0.68      0.61        50
           2       0.92      0.75      0.83       140

    accuracy                           0.77       261
   macro avg       0.73      0.77      0.74       261
weighted avg       0.80      0.77      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.73      0.87      0.79        71
           1       0.55      0.68      0.61        50
           2       0.92      0.75      0.83       140

    accuracy                           0.77       261
   macro avg       0.73      0.77      0.74       261
weighted avg       0.80      0.77      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 2 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.6655310988426208, 'eval_accuracy': 0.8380952380952381, 'eval_recall': 0.8380952380952381, 'eval_f1': 0.8380952380952381, 'eval_precision': 0.8380952380952381, 'eval_runtime': 0.6902, 'eval_samples_per_second': 152.121, 'eval_steps_per_second': 20.283, 'epoch': 1.0}
{'eval_loss': 0.47062233090400696, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6812, 'eval_samples_per_second': 154.148, 'eval_steps_per_second': 20.553, 'epoch': 2.0}
{'eval_loss': 0.33835771679878235, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6867, 'eval_samples_per_second': 152.906, 'eval_steps_per_second': 20.387, 'epoch': 3.0}
{'eval_loss': 0.3018152415752411, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision



{'eval_loss': 0.43004924058914185, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.6852, 'eval_samples_per_second': 153.244, 'eval_steps_per_second': 20.433, 'epoch': 1.0}
{'eval_loss': 0.406986266374588, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6811, 'eval_samples_per_second': 154.168, 'eval_steps_per_second': 20.556, 'epoch': 2.0}
{'eval_loss': 0.4232579469680786, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6771, 'eval_samples_per_second': 155.07, 'eval_steps_per_second': 20.676, 'epoch': 3.0}
{'eval_loss': 0.1603800505399704, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.93      0.92        97
           1       0.82      0.88      0.85        74
           2       0.98      0.91      0.94        90

    accuracy                           0.91       261
   macro avg       0.91      0.91      0.91       261
weighted avg       0.91      0.91      0.91       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.93      0.92        97
           1       0.82      0.88      0.85        74
           2       0.98      0.91      0.94        90

    accuracy                           0.91       261
   macro avg       0.91      0.91      0.91       261
weighted avg       0.91      0.91      0.91       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 3 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.6256200671195984, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.7171, 'eval_samples_per_second': 146.419, 'eval_steps_per_second': 19.523, 'epoch': 1.0}
{'eval_loss': 0.5639622211456299, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.7127, 'eval_samples_per_second': 147.325, 'eval_steps_per_second': 19.643, 'epoch': 2.0}
{'eval_loss': 0.6740570664405823, 'eval_accuracy': 0.6761904761904762, 'eval_recall': 0.6761904761904762, 'eval_f1': 0.6761904761904762, 'eval_precision': 0.6761904761904762, 'eval_runtime': 0.7105, 'eval_samples_per_second': 147.785, 'eval_steps_per_second': 19.705, 'epoch': 3.0}
{'eval_loss': 0.3945542871952057, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision':



{'eval_loss': 0.5033578276634216, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.7027, 'eval_samples_per_second': 149.434, 'eval_steps_per_second': 19.924, 'epoch': 1.0}
{'eval_loss': 0.3887644112110138, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.7028, 'eval_samples_per_second': 149.413, 'eval_steps_per_second': 19.922, 'epoch': 2.0}
{'eval_loss': 0.3007029891014099, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.7026, 'eval_samples_per_second': 149.437, 'eval_steps_per_second': 19.925, 'epoch': 3.0}
{'eval_loss': 0.3266143202781677, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.88      0.88        74
           1       0.76      0.49      0.60        79
           2       0.76      0.96      0.85       108

    accuracy                           0.80       261
   macro avg       0.80      0.78      0.78       261
weighted avg       0.80      0.80      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.88      0.88        74
           1       0.76      0.49      0.60        79
           2       0.76      0.96      0.85       108

    accuracy                           0.80       261
   macro avg       0.80      0.78      0.78       261
weighted avg       0.80      0.80      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 4 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3446755111217499, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.6549, 'eval_samples_per_second': 160.325, 'eval_steps_per_second': 21.377, 'epoch': 1.0}
{'eval_loss': 0.2950008511543274, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6486, 'eval_samples_per_second': 161.886, 'eval_steps_per_second': 21.585, 'epoch': 2.0}
{'eval_loss': 0.2944372594356537, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6491, 'eval_samples_per_second': 161.761, 'eval_steps_per_second': 21.568, 'epoch': 3.0}
{'eval_loss': 0.18558551371097565, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision'



{'eval_loss': 0.516435444355011, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6592, 'eval_samples_per_second': 159.273, 'eval_steps_per_second': 21.236, 'epoch': 1.0}
{'eval_loss': 0.2006070762872696, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6487, 'eval_samples_per_second': 161.871, 'eval_steps_per_second': 21.583, 'epoch': 2.0}
{'eval_loss': 0.2411898672580719, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6532, 'eval_samples_per_second': 160.741, 'eval_steps_per_second': 21.432, 'epoch': 3.0}
{'eval_loss': 0.15696527063846588, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.52      0.94      0.67        47
           1       0.98      0.75      0.85       205
           2       0.43      1.00      0.60         9

    accuracy                           0.79       261
   macro avg       0.64      0.89      0.71       261
weighted avg       0.88      0.79      0.81       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.52      0.94      0.67        47
           1       0.98      0.75      0.85       205
           2       0.43      1.00      0.60         9

    accuracy                           0.79       261
   macro avg       0.64      0.89      0.71       261
weighted avg       0.88      0.79      0.81       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 0 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5235505104064941, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.7007, 'eval_samples_per_second': 149.847, 'eval_steps_per_second': 9.99, 'epoch': 1.0}
{'eval_loss': 0.4843621850013733, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047619047619, 'eval_runtime': 0.6919, 'eval_samples_per_second': 151.764, 'eval_steps_per_second': 10.118, 'epoch': 2.0}
{'eval_loss': 0.3657682538032532, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.684, 'eval_samples_per_second': 153.508, 'eval_steps_per_second': 10.234, 'epoch': 3.0}
{'eval_loss': 0.46389830112457275, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.895



{'eval_loss': 0.5230678915977478, 'eval_accuracy': 0.8380952380952381, 'eval_recall': 0.8380952380952381, 'eval_f1': 0.8380952380952381, 'eval_precision': 0.8380952380952381, 'eval_runtime': 0.7012, 'eval_samples_per_second': 149.75, 'eval_steps_per_second': 9.983, 'epoch': 1.0}
{'eval_loss': 0.42181718349456787, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.6969, 'eval_samples_per_second': 150.663, 'eval_steps_per_second': 10.044, 'epoch': 2.0}
{'eval_loss': 0.4421629309654236, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.6937, 'eval_samples_per_second': 151.354, 'eval_steps_per_second': 10.09, 'epoch': 3.0}
{'eval_loss': 0.5210988521575928, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.80      0.92      0.86       185
           2       0.77      0.45      0.57        76

    accuracy                           0.78       261
   macro avg       0.52      0.46      0.47       261
weighted avg       0.79      0.78      0.77       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.80      0.92      0.86       185
           2       0.77      0.45      0.57        76

    accuracy                           0.78       261
   macro avg       0.52      0.46      0.47       261
weighted avg       0.79      0.78      0.77       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 1 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.27748194336891174, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.6923, 'eval_samples_per_second': 151.677, 'eval_steps_per_second': 10.112, 'epoch': 1.0}
{'eval_loss': 0.3059042692184448, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6935, 'eval_samples_per_second': 151.411, 'eval_steps_per_second': 10.094, 'epoch': 2.0}
{'eval_loss': 0.13253380358219147, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.9714285714285714, 'eval_runtime': 0.6751, 'eval_samples_per_second': 155.534, 'eval_steps_per_second': 10.369, 'epoch': 3.0}
{'eval_loss': 0.17171114683151245, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precisio



{'eval_loss': 0.2033218890428543, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6892, 'eval_samples_per_second': 152.346, 'eval_steps_per_second': 10.156, 'epoch': 1.0}
{'eval_loss': 0.285442054271698, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6815, 'eval_samples_per_second': 154.072, 'eval_steps_per_second': 10.271, 'epoch': 2.0}
{'eval_loss': 0.2941212058067322, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6829, 'eval_samples_per_second': 153.762, 'eval_steps_per_second': 10.251, 'epoch': 3.0}
{'eval_loss': 0.12724056839942932, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.83      0.86        71
           1       0.55      0.90      0.68        50
           2       0.99      0.80      0.89       140

    accuracy                           0.83       261
   macro avg       0.81      0.84      0.81       261
weighted avg       0.88      0.83      0.84       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.83      0.86        71
           1       0.55      0.90      0.68        50
           2       0.99      0.80      0.89       140

    accuracy                           0.83       261
   macro avg       0.81      0.84      0.81       261
weighted avg       0.88      0.83      0.84       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 2 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.33932656049728394, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.6761, 'eval_samples_per_second': 155.295, 'eval_steps_per_second': 10.353, 'epoch': 1.0}
{'eval_loss': 0.3148868978023529, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6669, 'eval_samples_per_second': 157.45, 'eval_steps_per_second': 10.497, 'epoch': 2.0}
{'eval_loss': 0.36693301796913147, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6669, 'eval_samples_per_second': 157.455, 'eval_steps_per_second': 10.497, 'epoch': 3.0}
{'eval_loss': 0.30934518575668335, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision



{'eval_loss': 0.22306755185127258, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.674, 'eval_samples_per_second': 155.785, 'eval_steps_per_second': 10.386, 'epoch': 1.0}
{'eval_loss': 0.33366093039512634, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.6699, 'eval_samples_per_second': 156.749, 'eval_steps_per_second': 10.45, 'epoch': 2.0}
{'eval_loss': 0.13083422183990479, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.6798, 'eval_samples_per_second': 154.452, 'eval_steps_per_second': 10.297, 'epoch': 3.0}
{'eval_loss': 0.14413197338581085, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.87      0.46      0.60        97
           1       0.69      0.84      0.76        74
           2       0.62      0.82      0.71        90

    accuracy                           0.69       261
   macro avg       0.73      0.71      0.69       261
weighted avg       0.73      0.69      0.68       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.87      0.46      0.60        97
           1       0.69      0.84      0.76        74
           2       0.62      0.82      0.71        90

    accuracy                           0.69       261
   macro avg       0.73      0.71      0.69       261
weighted avg       0.73      0.69      0.68       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 3 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3099159598350525, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.714, 'eval_samples_per_second': 147.065, 'eval_steps_per_second': 9.804, 'epoch': 1.0}
{'eval_loss': 0.3333929777145386, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6836, 'eval_samples_per_second': 153.588, 'eval_steps_per_second': 10.239, 'epoch': 2.0}
{'eval_loss': 0.4231325089931488, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.6739, 'eval_samples_per_second': 155.817, 'eval_steps_per_second': 10.388, 'epoch': 3.0}
{'eval_loss': 0.4662259519100189, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0



{'eval_loss': 0.42840978503227234, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6739, 'eval_samples_per_second': 155.802, 'eval_steps_per_second': 10.387, 'epoch': 1.0}
{'eval_loss': 0.22646135091781616, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6743, 'eval_samples_per_second': 155.717, 'eval_steps_per_second': 10.381, 'epoch': 2.0}
{'eval_loss': 0.405426025390625, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.679, 'eval_samples_per_second': 154.632, 'eval_steps_per_second': 10.309, 'epoch': 3.0}
{'eval_loss': 0.41985172033309937, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.87      0.88      0.87        74
           1       0.75      0.62      0.68        79
           2       0.83      0.93      0.87       108

    accuracy                           0.82       261
   macro avg       0.82      0.81      0.81       261
weighted avg       0.82      0.82      0.81       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.87      0.88      0.87        74
           1       0.75      0.62      0.68        79
           2       0.83      0.93      0.87       108

    accuracy                           0.82       261
   macro avg       0.82      0.81      0.81       261
weighted avg       0.82      0.82      0.81       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 4 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.2042093127965927, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6154, 'eval_samples_per_second': 170.63, 'eval_steps_per_second': 11.375, 'epoch': 1.0}
{'eval_loss': 0.2423824667930603, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6239, 'eval_samples_per_second': 168.309, 'eval_steps_per_second': 11.221, 'epoch': 2.0}
{'eval_loss': 0.382192999124527, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6196, 'eval_samples_per_second': 169.468, 'eval_steps_per_second': 11.298, 'epoch': 3.0}
{'eval_loss': 0.19808974862098694, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 



{'eval_loss': 0.28608253598213196, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6137, 'eval_samples_per_second': 171.095, 'eval_steps_per_second': 11.406, 'epoch': 1.0}
{'eval_loss': 0.1642833948135376, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.6239, 'eval_samples_per_second': 168.292, 'eval_steps_per_second': 11.219, 'epoch': 2.0}
{'eval_loss': 0.1904136687517166, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6215, 'eval_samples_per_second': 168.951, 'eval_steps_per_second': 11.263, 'epoch': 3.0}
{'eval_loss': 0.16680371761322021, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.43      0.96      0.59        47
           1       0.99      0.62      0.76       205
           2       0.32      1.00      0.49         9

    accuracy                           0.69       261
   macro avg       0.58      0.86      0.61       261
weighted avg       0.87      0.69      0.72       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.43      0.96      0.59        47
           1       0.99      0.62      0.76       205
           2       0.32      1.00      0.49         9

    accuracy                           0.69       261
   macro avg       0.58      0.86      0.61       261
weighted avg       0.87      0.69      0.72       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 0 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5927488803863525, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.6902, 'eval_samples_per_second': 152.126, 'eval_steps_per_second': 5.795, 'epoch': 1.0}
{'eval_loss': 0.3248424828052521, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.6838, 'eval_samples_per_second': 153.563, 'eval_steps_per_second': 5.85, 'epoch': 2.0}
{'eval_loss': 0.2671297490596771, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6679, 'eval_samples_per_second': 157.204, 'eval_steps_per_second': 5.989, 'epoch': 3.0}
{'eval_loss': 0.32429125905036926, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.



{'eval_loss': 0.601683497428894, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047619047619, 'eval_runtime': 0.6817, 'eval_samples_per_second': 154.022, 'eval_steps_per_second': 5.867, 'epoch': 1.0}
{'eval_loss': 0.5154272317886353, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.6712, 'eval_samples_per_second': 156.436, 'eval_steps_per_second': 5.959, 'epoch': 2.0}
{'eval_loss': 0.42045778036117554, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6796, 'eval_samples_per_second': 154.499, 'eval_steps_per_second': 5.886, 'epoch': 3.0}
{'eval_loss': 0.47002682089805603, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.885

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.89      0.92      0.90       185
           2       0.92      0.71      0.80        76

    accuracy                           0.86       261
   macro avg       0.60      0.54      0.57       261
weighted avg       0.89      0.86      0.87       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.89      0.92      0.90       185
           2       0.92      0.71      0.80        76

    accuracy                           0.86       261
   macro avg       0.60      0.54      0.57       261
weighted avg       0.89      0.86      0.87       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 1 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.22569257020950317, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6923, 'eval_samples_per_second': 151.66, 'eval_steps_per_second': 5.778, 'epoch': 1.0}
{'eval_loss': 0.20473919808864594, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.9714285714285714, 'eval_runtime': 0.6835, 'eval_samples_per_second': 153.613, 'eval_steps_per_second': 5.852, 'epoch': 2.0}
{'eval_loss': 0.4156845808029175, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.6741, 'eval_samples_per_second': 155.774, 'eval_steps_per_second': 5.934, 'epoch': 3.0}
{'eval_loss': 0.2372412234544754, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0



{'eval_loss': 0.3623957931995392, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6777, 'eval_samples_per_second': 154.927, 'eval_steps_per_second': 5.902, 'epoch': 1.0}
{'eval_loss': 0.39664992690086365, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6734, 'eval_samples_per_second': 155.928, 'eval_steps_per_second': 5.94, 'epoch': 2.0}
{'eval_loss': 0.20153245329856873, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.6816, 'eval_samples_per_second': 154.054, 'eval_steps_per_second': 5.869, 'epoch': 3.0}
{'eval_loss': 0.21272143721580505, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.82      0.86      0.84        71
           1       0.51      0.74      0.60        50
           2       0.97      0.79      0.87       140

    accuracy                           0.80       261
   macro avg       0.77      0.80      0.77       261
weighted avg       0.84      0.80      0.81       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.82      0.86      0.84        71
           1       0.51      0.74      0.60        50
           2       0.97      0.79      0.87       140

    accuracy                           0.80       261
   macro avg       0.77      0.80      0.77       261
weighted avg       0.84      0.80      0.81       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 2 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3378436863422394, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6867, 'eval_samples_per_second': 152.91, 'eval_steps_per_second': 5.825, 'epoch': 1.0}
{'eval_loss': 0.2249663770198822, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6799, 'eval_samples_per_second': 154.425, 'eval_steps_per_second': 5.883, 'epoch': 2.0}
{'eval_loss': 0.2620493471622467, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6682, 'eval_samples_per_second': 157.138, 'eval_steps_per_second': 5.986, 'epoch': 3.0}
{'eval_loss': 0.16130366921424866, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.



{'eval_loss': 0.09745469689369202, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6743, 'eval_samples_per_second': 155.722, 'eval_steps_per_second': 5.932, 'epoch': 1.0}
{'eval_loss': 0.13603852689266205, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.6773, 'eval_samples_per_second': 155.025, 'eval_steps_per_second': 5.906, 'epoch': 2.0}
{'eval_loss': 0.11636911332607269, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.9714285714285714, 'eval_runtime': 0.6722, 'eval_samples_per_second': 156.199, 'eval_steps_per_second': 5.95, 'epoch': 3.0}
{'eval_loss': 0.12988975644111633, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.88      0.94      0.91        97
           1       0.76      0.82      0.79        74
           2       0.99      0.86      0.92        90

    accuracy                           0.88       261
   macro avg       0.88      0.87      0.87       261
weighted avg       0.88      0.88      0.88       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.88      0.94      0.91        97
           1       0.76      0.82      0.79        74
           2       0.99      0.86      0.92        90

    accuracy                           0.88       261
   macro avg       0.88      0.87      0.87       261
weighted avg       0.88      0.88      0.88       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 3 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4133163392543793, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.6941, 'eval_samples_per_second': 151.271, 'eval_steps_per_second': 5.763, 'epoch': 1.0}
{'eval_loss': 0.2751854956150055, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.6877, 'eval_samples_per_second': 152.688, 'eval_steps_per_second': 5.817, 'epoch': 2.0}
{'eval_loss': 0.291305273771286, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6742, 'eval_samples_per_second': 155.732, 'eval_steps_per_second': 5.933, 'epoch': 3.0}
{'eval_loss': 0.356219619512558, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.90



{'eval_loss': 0.2822924852371216, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6836, 'eval_samples_per_second': 153.596, 'eval_steps_per_second': 5.851, 'epoch': 1.0}
{'eval_loss': 0.25550663471221924, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.678, 'eval_samples_per_second': 154.861, 'eval_steps_per_second': 5.899, 'epoch': 2.0}
{'eval_loss': 0.19816450774669647, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6782, 'eval_samples_per_second': 154.811, 'eval_steps_per_second': 5.898, 'epoch': 3.0}
{'eval_loss': 0.2701813578605652, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.91      0.91        74
           1       0.81      0.58      0.68        79
           2       0.79      0.96      0.87       108

    accuracy                           0.83       261
   macro avg       0.84      0.82      0.82       261
weighted avg       0.83      0.83      0.82       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.91      0.91        74
           1       0.81      0.58      0.68        79
           2       0.79      0.96      0.87       108

    accuracy                           0.83       261
   macro avg       0.84      0.82      0.82       261
weighted avg       0.83      0.83      0.82       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 4 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.24154213070869446, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.5991, 'eval_samples_per_second': 175.275, 'eval_steps_per_second': 6.677, 'epoch': 1.0}
{'eval_loss': 0.15237300097942352, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.6099, 'eval_samples_per_second': 172.148, 'eval_steps_per_second': 6.558, 'epoch': 2.0}
{'eval_loss': 0.16175419092178345, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.6082, 'eval_samples_per_second': 172.641, 'eval_steps_per_second': 6.577, 'epoch': 3.0}
{'eval_loss': 0.12498445808887482, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision'



{'eval_loss': 0.22375158965587616, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6001, 'eval_samples_per_second': 174.962, 'eval_steps_per_second': 6.665, 'epoch': 1.0}
{'eval_loss': 0.1464337408542633, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6049, 'eval_samples_per_second': 173.59, 'eval_steps_per_second': 6.613, 'epoch': 2.0}
{'eval_loss': 0.12373825162649155, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.6062, 'eval_samples_per_second': 173.22, 'eval_steps_per_second': 6.599, 'epoch': 3.0}
{'eval_loss': 0.13749751448631287, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.43      0.94      0.59        47
           1       0.98      0.65      0.78       205
           2       0.43      1.00      0.60         9

    accuracy                           0.72       261
   macro avg       0.61      0.86      0.66       261
weighted avg       0.86      0.72      0.74       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.43      0.94      0.59        47
           1       0.98      0.65      0.78       205
           2       0.43      1.00      0.60         9

    accuracy                           0.72       261
   macro avg       0.61      0.86      0.66       261
weighted avg       0.86      0.72      0.74       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 0 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.8443788290023804, 'eval_accuracy': 0.6476190476190476, 'eval_recall': 0.6476190476190476, 'eval_f1': 0.6476190476190476, 'eval_precision': 0.6476190476190476, 'eval_runtime': 0.7047, 'eval_samples_per_second': 149.002, 'eval_steps_per_second': 19.867, 'epoch': 1.0}
{'eval_loss': 0.9287908673286438, 'eval_accuracy': 0.780952380952381, 'eval_recall': 0.780952380952381, 'eval_f1': 0.780952380952381, 'eval_precision': 0.780952380952381, 'eval_runtime': 0.6927, 'eval_samples_per_second': 151.578, 'eval_steps_per_second': 20.21, 'epoch': 2.0}
{'eval_loss': 0.6632158756256104, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.6958, 'eval_samples_per_second': 150.913, 'eval_steps_per_second': 20.122, 'epoch': 3.0}
{'eval_loss': 0.5024023056030273, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.88



{'eval_loss': 1.8113666772842407, 'eval_accuracy': 0.5142857142857142, 'eval_recall': 0.5142857142857142, 'eval_f1': 0.5142857142857142, 'eval_precision': 0.5142857142857142, 'eval_runtime': 0.6849, 'eval_samples_per_second': 153.301, 'eval_steps_per_second': 20.44, 'epoch': 1.0}
{'eval_loss': 0.47051334381103516, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.6996, 'eval_samples_per_second': 150.095, 'eval_steps_per_second': 20.013, 'epoch': 2.0}
{'eval_loss': 0.6015001535415649, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0.8285714285714286, 'eval_runtime': 0.6863, 'eval_samples_per_second': 152.989, 'eval_steps_per_second': 20.399, 'epoch': 3.0}
{'eval_loss': 0.4434836208820343, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.83      0.59      0.69       185
           2       0.11      0.01      0.02        76

    accuracy                           0.42       261
   macro avg       0.31      0.20      0.24       261
weighted avg       0.62      0.42      0.50       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.83      0.59      0.69       185
           2       0.11      0.01      0.02        76

    accuracy                           0.42       261
   macro avg       0.31      0.20      0.24       261
weighted avg       0.62      0.42      0.50       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 1 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3505767583847046, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.7685, 'eval_samples_per_second': 136.63, 'eval_steps_per_second': 18.217, 'epoch': 1.0}
{'eval_loss': 1.049504041671753, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.5619047619047619, 'eval_runtime': 0.7182, 'eval_samples_per_second': 146.204, 'eval_steps_per_second': 19.494, 'epoch': 2.0}
{'eval_loss': 1.051193118095398, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.5619047619047619, 'eval_runtime': 0.7223, 'eval_samples_per_second': 145.359, 'eval_steps_per_second': 19.381, 'epoch': 3.0}
{'eval_loss': 1.0028202533721924, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.



{'eval_loss': 0.9724727272987366, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.5619047619047619, 'eval_runtime': 0.7441, 'eval_samples_per_second': 141.115, 'eval_steps_per_second': 18.815, 'epoch': 1.0}
{'eval_loss': 0.9625229835510254, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.5619047619047619, 'eval_runtime': 0.7432, 'eval_samples_per_second': 141.275, 'eval_steps_per_second': 18.837, 'epoch': 2.0}
{'eval_loss': 0.9703834056854248, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision': 0.5619047619047619, 'eval_runtime': 0.7466, 'eval_samples_per_second': 140.638, 'eval_steps_per_second': 18.752, 'epoch': 3.0}
{'eval_loss': 0.9611096978187561, 'eval_accuracy': 0.5619047619047619, 'eval_recall': 0.5619047619047619, 'eval_f1': 0.5619047619047619, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00        71
           1       0.19      1.00      0.32        50
           2       0.00      0.00      0.00       140

    accuracy                           0.19       261
   macro avg       0.06      0.33      0.11       261
weighted avg       0.04      0.19      0.06       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00        71
           1       0.19      1.00      0.32        50
           2       0.00      0.00      0.00       140

    accuracy                           0.19       261
   macro avg       0.06      0.33      0.11       261
weighted avg       0.04      0.19      0.06       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 2 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.522759199142456, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.7228, 'eval_samples_per_second': 145.259, 'eval_steps_per_second': 19.368, 'epoch': 1.0}
{'eval_loss': 0.7813647389411926, 'eval_accuracy': 0.7904761904761904, 'eval_recall': 0.7904761904761904, 'eval_f1': 0.7904761904761904, 'eval_precision': 0.7904761904761904, 'eval_runtime': 0.6721, 'eval_samples_per_second': 156.237, 'eval_steps_per_second': 20.832, 'epoch': 2.0}
{'eval_loss': 0.3707582354545593, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6806, 'eval_samples_per_second': 154.28, 'eval_steps_per_second': 20.571, 'epoch': 3.0}
{'eval_loss': 0.29242628812789917, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 



{'eval_loss': 0.38722196221351624, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6768, 'eval_samples_per_second': 155.135, 'eval_steps_per_second': 20.685, 'epoch': 1.0}
{'eval_loss': 0.42974820733070374, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.6843, 'eval_samples_per_second': 153.448, 'eval_steps_per_second': 20.46, 'epoch': 2.0}
{'eval_loss': 0.26715588569641113, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.6842, 'eval_samples_per_second': 153.471, 'eval_steps_per_second': 20.463, 'epoch': 3.0}
{'eval_loss': 0.41519027948379517, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precisio

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.84      0.94      0.89        97
           1       0.70      0.72      0.71        74
           2       0.91      0.78      0.84        90

    accuracy                           0.82       261
   macro avg       0.82      0.81      0.81       261
weighted avg       0.82      0.82      0.82       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.84      0.94      0.89        97
           1       0.70      0.72      0.71        74
           2       0.91      0.78      0.84        90

    accuracy                           0.82       261
   macro avg       0.82      0.81      0.81       261
weighted avg       0.82      0.82      0.82       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 3 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5834101438522339, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.7459, 'eval_samples_per_second': 140.778, 'eval_steps_per_second': 18.77, 'epoch': 1.0}
{'eval_loss': 0.40141037106513977, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.6951, 'eval_samples_per_second': 151.064, 'eval_steps_per_second': 20.142, 'epoch': 2.0}
{'eval_loss': 0.5533255338668823, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.7153, 'eval_samples_per_second': 146.786, 'eval_steps_per_second': 19.571, 'epoch': 3.0}
{'eval_loss': 1.0721074342727661, 'eval_accuracy': 0.7333333333333333, 'eval_recall': 0.7333333333333333, 'eval_f1': 0.7333333333333333, 'eval_precision':



{'eval_loss': 0.7500479817390442, 'eval_accuracy': 0.7904761904761904, 'eval_recall': 0.7904761904761904, 'eval_f1': 0.7904761904761904, 'eval_precision': 0.7904761904761904, 'eval_runtime': 0.6984, 'eval_samples_per_second': 150.35, 'eval_steps_per_second': 20.047, 'epoch': 1.0}
{'eval_loss': 0.49103909730911255, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.7197, 'eval_samples_per_second': 145.902, 'eval_steps_per_second': 19.454, 'epoch': 2.0}
{'eval_loss': 0.34387344121932983, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.705, 'eval_samples_per_second': 148.944, 'eval_steps_per_second': 19.859, 'epoch': 3.0}
{'eval_loss': 0.3468129336833954, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.88      0.91      0.89        74
           1       0.81      0.58      0.68        79
           2       0.80      0.95      0.87       108

    accuracy                           0.83       261
   macro avg       0.83      0.81      0.81       261
weighted avg       0.83      0.83      0.82       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.88      0.91      0.89        74
           1       0.81      0.58      0.68        79
           2       0.80      0.95      0.87       108

    accuracy                           0.83       261
   macro avg       0.83      0.81      0.81       261
weighted avg       0.83      0.83      0.82       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 4 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise',

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 1.064266324043274, 'eval_accuracy': 0.42857142857142855, 'eval_recall': 0.42857142857142855, 'eval_f1': 0.42857142857142855, 'eval_precision': 0.42857142857142855, 'eval_runtime': 0.6776, 'eval_samples_per_second': 154.963, 'eval_steps_per_second': 20.662, 'epoch': 1.0}
{'eval_loss': 1.0950548648834229, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 0.3523809523809524, 'eval_runtime': 0.6183, 'eval_samples_per_second': 169.814, 'eval_steps_per_second': 22.642, 'epoch': 2.0}
{'eval_loss': 1.0760473012924194, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 0.3523809523809524, 'eval_runtime': 0.6315, 'eval_samples_per_second': 166.265, 'eval_steps_per_second': 22.169, 'epoch': 3.0}
{'eval_loss': 1.0673463344573975, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precisio



{'eval_loss': 0.9844110608100891, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 0.3523809523809524, 'eval_runtime': 0.6172, 'eval_samples_per_second': 170.12, 'eval_steps_per_second': 22.683, 'epoch': 1.0}
{'eval_loss': 0.9834376573562622, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 0.3523809523809524, 'eval_runtime': 0.6188, 'eval_samples_per_second': 169.679, 'eval_steps_per_second': 22.624, 'epoch': 2.0}
{'eval_loss': 0.9787329435348511, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 0.3523809523809524, 'eval_runtime': 0.6231, 'eval_samples_per_second': 168.517, 'eval_steps_per_second': 22.469, 'epoch': 3.0}
{'eval_loss': 0.9719967246055603, 'eval_accuracy': 0.3523809523809524, 'eval_recall': 0.3523809523809524, 'eval_f1': 0.3523809523809524, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00        47
           1       0.00      0.00      0.00       205
           2       0.03      1.00      0.07         9

    accuracy                           0.03       261
   macro avg       0.01      0.33      0.02       261
weighted avg       0.00      0.03      0.00       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00        47
           1       0.00      0.00      0.00       205
           2       0.03      1.00      0.07         9

    accuracy                           0.03       261
   macro avg       0.01      0.33      0.02       261
weighted avg       0.00      0.03      0.00       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 0 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5476974248886108, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047619047619, 'eval_runtime': 0.7314, 'eval_samples_per_second': 143.558, 'eval_steps_per_second': 9.571, 'epoch': 1.0}
{'eval_loss': 0.48846185207366943, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0.8285714285714286, 'eval_runtime': 0.6929, 'eval_samples_per_second': 151.533, 'eval_steps_per_second': 10.102, 'epoch': 2.0}
{'eval_loss': 0.5480553507804871, 'eval_accuracy': 0.8095238095238095, 'eval_recall': 0.8095238095238095, 'eval_f1': 0.8095238095238095, 'eval_precision': 0.8095238095238095, 'eval_runtime': 0.679, 'eval_samples_per_second': 154.643, 'eval_steps_per_second': 10.31, 'epoch': 3.0}
{'eval_loss': 0.5040140151977539, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047



{'eval_loss': 0.4387938678264618, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0.8285714285714286, 'eval_runtime': 0.6968, 'eval_samples_per_second': 150.678, 'eval_steps_per_second': 10.045, 'epoch': 1.0}
{'eval_loss': 0.3343612253665924, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.6884, 'eval_samples_per_second': 152.525, 'eval_steps_per_second': 10.168, 'epoch': 2.0}
{'eval_loss': 0.5249496102333069, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.6845, 'eval_samples_per_second': 153.397, 'eval_steps_per_second': 10.226, 'epoch': 3.0}
{'eval_loss': 0.245783731341362, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.75      0.73      0.74       185
           2       0.81      0.39      0.53        76

    accuracy                           0.63       261
   macro avg       0.52      0.37      0.42       261
weighted avg       0.76      0.63      0.68       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.75      0.73      0.74       185
           2       0.81      0.39      0.53        76

    accuracy                           0.63       261
   macro avg       0.52      0.37      0.42       261
weighted avg       0.76      0.63      0.68       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 1 ************************

Index(['Premise', 'Hypothesis', 'labels', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'labels', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3832259178161621, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.6943, 'eval_samples_per_second': 151.232, 'eval_steps_per_second': 10.082, 'epoch': 1.0}
{'eval_loss': 0.35512474179267883, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.689, 'eval_samples_per_second': 152.386, 'eval_steps_per_second': 10.159, 'epoch': 2.0}
{'eval_loss': 0.20690137147903442, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.6834, 'eval_samples_per_second': 153.647, 'eval_steps_per_second': 10.243, 'epoch': 3.0}
{'eval_loss': 0.32045701146125793, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision

In [None]:
all_scores_deberta = all_scores

In [None]:
all_scores_deberta = all_scores