## Finetuning NLI-deberta-v3 model
- This is based on the Prof. Mihai Surdeanu's text book <a href="https://github.com/clulab/gentlenlp/blob/main/notebooks/chap13_classification_bert.ipynb">Gentle NLP Chapter 13 Classification using BERT model</a>
- Modified for NLI evaluation and analysis over SICCK dataset
- Author: Sushma Anand Akoju, Email: sushmaakoju@arizona.edu

In [97]:
!pip install datasets
!pip install transformers
!pip install sentencepiece
!pip install accelerate
!pip install 'transformers[torch]'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Text Classification Using Transformer Networks (Deberta and Roberta)

Some initialization:

In [2]:
import random
import torch
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# enable tqdm in pandas
tqdm.pandas()

# set to True to use the gpu (if there is one available)
use_gpu = True

# select device
device = torch.device('cuda' if use_gpu and torch.cuda.is_available() else 'cpu')
print(f'device: {device.type}')

# random seed
seed = 1234

# set random seed
if seed is not None:
    print(f'random seed: {seed}')
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

device: cuda
random seed: 1234


In [3]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


In [30]:
import os
import pandas as pd
path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/final-datasets/block-wise-data/blocks-dec26/data/15/"
assert os.path.exists(path), "Error"
temp = pd.read_csv(os.path.join(path,"SICCK-zero-shot-analysis-apr24.csv"))
df = temp[temp.columns[1:]]
df.head(), len(df)

(   SICK_id                                  Premise  \
 0       90      a man is jumping into an empty pool   
 1       90  every man is jumping into an empty pool   
 2       90      a man is jumping into an empty pool   
 3       90  every man is jumping into an empty pool   
 4       90   some man is jumping into an empty pool   
 
                               Hypothesis Modifier Premise/Hypothesis/Both  \
 0      a man is jumping into a full pool     NONE                    NONE   
 1      a man is jumping into a full pool    every                 Premise   
 2  every man is jumping into a full pool    every              Hypothesis   
 3  every man is jumping into a full pool    every                    Both   
 4      a man is jumping into a full pool     some                 Premise   
 
   Part of Premise/Hypothesis Modified Ground Truth           GT Modifier Type  \
 0                                NONE  Alternation  Alternation  No Modifiers   
 1                          

In [31]:
df['CompressedGT'].unique()

array(['Contradiction', 'Neutral', 'FE', 'RE'], dtype=object)

In [32]:
label2id = {
    'Contradiction': 0,
    'Neutral': 1, 
    'FE': 2, 
    'RE': 2, 
}
label2id_roberta = {
    'Contradiction': 2,
    'Neutral': 1, 
    'FE': 0, 
    'RE': 0, 
}
df['label'] = df['CompressedGT'].map(lambda x: label2id[x])
df['label4roberta'] = df['CompressedGT'].map(lambda x: label2id_roberta[x])

In [33]:
df['label'].unique(), df['CompressedGT'].unique(), df['label4roberta'].unique()

(array([0, 1, 2]),
 array(['Contradiction', 'Neutral', 'FE', 'RE'], dtype=object),
 array([2, 1, 0]))

Read the train/dev/test datasets and create a HuggingFace `Dataset` object:

## Rolling window (circular array style) splits for 5-fold Cross validation
- to account for the few extra indices that might run beyond the length of data for test or train sets

In [73]:
import math
d = list(range(len(df)))
n = len(d)
test_len = math.ceil(1304/5)
train_len = n-(2*test_len)
val_len = n-train_len - test_len
print(train_len, test_len, val_len)
math.ceil(1304/5), n-(2*261),n, n-test_len

782 261 261


(261, 782, 1304, 1043)

In [52]:
len(d)

1304

In [71]:
def circular_array(starting_index, ending_index, d):
  idx = d
  idx = np.roll(idx, -starting_index)[:(len(idx)-starting_index+ending_index)%len(idx)]

  return idx

In [76]:
len(circular_array(261+test_len,261+test_len+1043, d))

1043

In [78]:
counter = 1
folds = []
columns = ['Premise', 'Hypothesis', 'label','label4roberta', 'CompressedGT', 'Modifier Type', 
           'Modifier',	'Premise/Hypothesis/Both',	'Part of Premise/Hypothesis Modified']
# test_len = 261
for i in range(0,n, test_len):
  # print(i, len(d[i:i+test_len]), len(circular_array(i, i+test_len, d)))
  test = df.iloc[circular_array(i, i+test_len, d)][columns]
  train = df.iloc[circular_array(i+test_len, i+test_len+1043, d)][columns]
  print(len(test), len(train))
  # val = df.iloc[circular_array(i+test_len+1130, i+test_len+1130+test_len, d)][columns]
  counter += 1
  folds.append({"train":train, "test":test})

261 1043
261 1043
261 1043
261 1043
261 1043


In [80]:
len(folds[0]["train"]), len(folds[0]["test"])

(1043, 261)

### Save all data into one excel sheet

In [81]:
output_path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/june12"
for i,fold in enumerate(folds):
  with pd.ExcelWriter(os.path.join(output_path, "fold"+str(i)+".xlsx")) as writer:
    fold["train"].to_excel(writer, sheet_name="train", index=False )
    fold["test"].to_excel(writer, sheet_name="test", index=False )

In [82]:
output_path = "/content/drive/MyDrive/Colab Notebooks/natural-logic/june12"
with pd.ExcelWriter(os.path.join(output_path, "five_folds.xlsx")) as writer:

  for i,fold in enumerate(folds):
      # fold["train"].to_excel(writer, sheet_name="train", index=False )
    fold["test"].to_excel(writer, sheet_name="fold"+str(i), index=False )

### Modifier type distribution each of the test set

In [83]:
with pd.ExcelWriter(os.path.join(output_path, "fold_distribution.xlsx")) as writer:
  for i,fold in enumerate(folds):
    print(i,fold["test"].groupby(["Modifier Type"]).count().reset_index()[["Modifier Type","CompressedGT"]])
    fold["test"].groupby(["Modifier Type"]).count().reset_index()[["Modifier Type","CompressedGT"]].to_excel(writer, sheet_name="fold"+str(i), index=False)

0         Modifier Type  CompressedGT
0  Adjectives/Adverbs           123
1         Existential            57
2            Negation            33
3        No Modifiers             3
4           Universal            45
1         Modifier Type  CompressedGT
0  Adjectives/Adverbs           122
1         Existential            63
2            Negation            29
3        No Modifiers             3
4           Universal            44
2         Modifier Type  CompressedGT
0  Adjectives/Adverbs           117
1         Existential            60
2            Negation            36
3        No Modifiers             3
4           Universal            45
3         Modifier Type  CompressedGT
0  Adjectives/Adverbs           125
1         Existential            56
2            Negation            36
3        No Modifiers             3
4           Universal            41
4         Modifier Type  CompressedGT
0  Adjectives/Adverbs           115
1         Existential            67
2            Negat

### Label-wise distribution in test splits

In [84]:
with pd.ExcelWriter(os.path.join(output_path, "fold_label_distribution.xlsx")) as writer:
  for i,fold in enumerate(folds):
    print(i,fold["test"].groupby(["CompressedGT"]).count().reset_index()[["CompressedGT", "Modifier Type"]])
    fold["test"].groupby(["CompressedGT"]).count().reset_index()[["CompressedGT","Modifier Type"]].to_excel(writer, sheet_name="fold"+str(i), index=False)

0     CompressedGT  Modifier Type
0  Contradiction             76
1        Neutral            185
1     CompressedGT  Modifier Type
0  Contradiction            140
1             FE             60
2        Neutral             50
3             RE             11
2     CompressedGT  Modifier Type
0  Contradiction             90
1             FE             87
2        Neutral             74
3             RE             10
3     CompressedGT  Modifier Type
0  Contradiction            108
1             FE             68
2        Neutral             79
3             RE              6
4     CompressedGT  Modifier Type
0  Contradiction              9
1             FE             44
2        Neutral            205
3             RE              3


### Label-Modifier type count distribution for analysis : to verify the gap or imbalanced distribution

In [85]:
with pd.ExcelWriter(os.path.join(output_path, "fold_label_qtype_distribution.xlsx")) as writer:
  for i,fold in enumerate(folds):
    print(i,fold["test"].groupby(["CompressedGT", "Modifier Type"]).count().reset_index()[["CompressedGT", "Modifier Type", "Modifier"]])
    fold["test"].groupby(["CompressedGT", "Modifier Type"]).count().reset_index()[["CompressedGT","Modifier Type", "Modifier"]].to_excel(writer, sheet_name="fold"+str(i), index=False)

0     CompressedGT       Modifier Type  Modifier
0  Contradiction  Adjectives/Adverbs        42
1  Contradiction         Existential        15
2  Contradiction        No Modifiers         1
3  Contradiction           Universal        18
4        Neutral  Adjectives/Adverbs        81
5        Neutral         Existential        42
6        Neutral            Negation        33
7        Neutral        No Modifiers         2
8        Neutral           Universal        27
1      CompressedGT       Modifier Type  Modifier
0   Contradiction  Adjectives/Adverbs        77
1   Contradiction         Existential        27
2   Contradiction            Negation         8
3   Contradiction        No Modifiers         2
4   Contradiction           Universal        26
5              FE  Adjectives/Adverbs        28
6              FE         Existential        17
7              FE            Negation         5
8              FE        No Modifiers         1
9              FE           Universal         

In [86]:
len(folds)

5

In [87]:
for fold in folds:
  print(len(fold["train"]), len(fold["test"]))

1043 261
1043 261
1043 261
1043 261
1043 261


### Create dtaa splits with premise, hypothesis as well as hypothesis, premise for **Test** set predictions to label:
- Forward Entailment
- Reverse Entailment
- Neutral 

In [88]:
def read_data(data):
    # concatenate title and description, and remove backslashes
    data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
    data['text'] = data['text'].str.replace('\\', ' ', regex=False)
    return data

In [89]:
def read_data_reverse(data):
    # concatenate title and description, and remove backslashes
    data['text'] = data['Hypothesis'] + " [SEP] " + data['Premise']
    data['text'] = data['text'].str.replace('\\', ' ', regex=False)
    return data

### Compute metrics for validation and test

In [90]:
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

def compute_metrics(eval_pred):
    y_true = eval_pred.label_ids
    y_pred = np.argmax(eval_pred.predictions, axis=-1)
    return {'accuracy': accuracy_score(y_true, y_pred), 'recall': recall_score(y_true, y_pred, average='micro'), 
            'f1':f1_score(y_true, y_pred, average='micro'), 'precision':precision_score(y_true, y_pred, average='micro')}
def compute_test_metrics(y_true, y_pred):
    return {'accuracy': accuracy_score(y_true, y_pred), 'recall': recall_score(y_true, y_pred, average='micro'), 
            'f1':f1_score(y_true, y_pred, average='micro'), 'precision':precision_score(y_true, y_pred, average='micro')}

### To include FE, RE and Neutral label calculation and scores for **Test**

In [91]:
from sklearn.metrics import classification_report
def test_eval(trainer, ds, fold, model_name ):
  test_ds = ds['test'].map(
      tokenize,
      batched=True,
      remove_columns=['Premise', 'Hypothesis', 'text'],
  )
  rev_test_ds = ds['rev_test'].map(
      tokenize,
      batched=True,
      remove_columns=['Premise', 'Hypothesis', 'text'],
  )
  test_ds.to_pandas()
  output = trainer.predict(test_ds)
  rev_scores = trainer.predict(rev_test_ds)

  y_true = output.label_ids
  y_preds = np.argmax(output.predictions, axis=-1)
  y_rev_score_preds = np.argmax(rev_scores.predictions, axis=-1)
  labels = []

  if "roberta" in model_name:
    for i in range(len(y_preds)):
          if y_preds[i] == 0:
            labels.append("FE")
          elif y_preds[i] == 2:
            labels.append("Negation")
          else:
            if y_rev_score_preds[i] == 1:
              labels.append("RE")
            else:
              labels.append("Neutral")
    print(classification_report(y_true, y_preds, labels=[0, 1, 2]))
    
  else:
    for i in range(len(y_preds)):
      if y_preds[i] == 1:
        labels.append("FE")
      elif y_preds[i] == 0:
        labels.append("Negation")
      else:
        if y_rev_score_preds[i] == 1:
          labels.append("RE")
        else:
          labels.append("Neutral")
    print(classification_report(y_true, y_preds, labels=[0, 1, 2]))
  res = compute_test_metrics(y_true, y_preds)
  res['fold'] = fold
  res['model_name'] = model_name
  return y_true, y_preds, res, labels

In [None]:
# model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/nli-deberta-v3-base', num_labels=3)
# tokenizer = AutoTokenizer.from_pretrained('cross-encoder/nli-deberta-v3-base')

### Get this_train, this_validation & this_test set from a this_fold

In [92]:
def get_dataset(fold, model_name):
  labels_header_name = 'label'
  if "roberta" in model_name:
    labels_header_name = 'label4roberta'
    
  columns = ['Premise', 'Hypothesis', labels_header_name]

  train_df = read_data(fold["train"][columns])
  test_df = read_data(fold["test"][columns])
  rev_test_df = read_data_reverse(fold["test"][columns])
  print(test_df.columns)

  train_df, eval_df = train_test_split(train_df, train_size=0.9)
  train_df.reset_index(inplace=True, drop=True)
  eval_df.reset_index(inplace=True, drop=True)
  test_df.reset_index(inplace=True, drop=True)
  rev_test_df.reset_index(inplace=True, drop=True)

  print(f'train rows: {len(train_df.index):,}')
  print(f'eval rows: {len(eval_df.index):,}')
  print(f'test rows: {len(test_df.index):,}')
  print(f'test rows: {len(rev_test_df.index):,}')

  ds = DatasetDict()
  ds['train'] = Dataset.from_pandas(train_df)
  ds['validation'] = Dataset.from_pandas(eval_df)
  ds['test'] = Dataset.from_pandas(test_df)
  ds['rev_test'] = Dataset.from_pandas(rev_test_df)

  print(ds)
  return ds, test_df, rev_test_df

### CustomTrainer for CrossEntropyLoss but we train for both custom and default Trainer classes in HuggingFace
- Note: we did not see any difference between the two

In [93]:

import torch
from torch import nn
from transformers import Trainer
from accelerate import Accelerator

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
        loss_fct.to('cuda')
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

## Fine tune "cross-encoder/nli-deberta-v3-base" for 1304 examples for five folds.
- Use rolling window for train-test folds
- split validation from training (same size as test)
- test size: 261

### Tokenize & Train one model at a time for all folds

In [94]:
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer

model_name = ""
model_names =["cross-encoder/nli-deberta-v3-base",	"ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"]
# model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
# tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(examples):
    return tokenizer(examples['text'], truncation=True)

def train(model_name, this_path, folds):
  epochs = [4, 8]
  batch_sizes = [8,16,32]
  m = model_name.split("/")[1]
  all_scores = []
  # tokenizer = AutoTokenizer.from_pretrained(model_name)
  # model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
  # tokenizer = AutoTokenizer.from_pretrained(model_name)
  for num_epochs in epochs:
    for batch_size in batch_sizes:

      for i,fold in enumerate(folds):
          print("\n***********************************************************************************\n")
          print("\n**************** The number of epochs, batch_size and fold respectively are: ",num_epochs, batch_size, i,"************************\n")
          model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

          ds, test_df, rev_test_df = get_dataset(fold,model_name)
          train_ds = ds['train'].map(
            tokenize, batched=True,
            remove_columns=['Premise', 'Hypothesis', 'text'],
          )
          eval_ds = ds['validation'].map(
              tokenize,
              batched=True,
              remove_columns=['Premise', 'Hypothesis', 'text'],
          )

          weight_decay = 0.01
          tx_model_name = f'{model_name}-sequence-classification'

          training_args = TrainingArguments(
              output_dir=os.path.join(output_path,m+"_"+str(num_epochs)+str(batch_size)+"trainer"),
              log_level='error',
              num_train_epochs=num_epochs,
              per_device_train_batch_size=batch_size,
              per_device_eval_batch_size=batch_size,
              evaluation_strategy='epoch',
              weight_decay=weight_decay,
          )
          trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            compute_metrics=compute_metrics,
            tokenizer=tokenizer,
          )
          trainer.train()

          customTrainer  = CustomTrainer(
            model=model,
            args=training_args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            compute_metrics=compute_metrics,
            tokenizer=tokenizer,
          )

          customTrainer.train()
          y_true, y_pred, results, labels = test_eval(trainer, ds, i, model_name )
          y_true1, y_pred1, results1, labels1 = test_eval(customTrainer, ds, i, model_name )

          all_scores.append(results)
          fold["test"]["label"]= y_true
          fold["test"]["predictions"] = y_pred
          fold["test"]["predictions2"] = y_pred1
          fold["test"]["text"] = test_df['text']
          fold["test"]["pred_labels"] =  labels
          filename = "five_"+m+"_"+str(num_epochs)+"_"+str(batch_size)+"_"+str(i)+"_test.xlsx"
          fold["test"].to_csv(os.path.join(this_path, filename))
  return all_scores

In [95]:
torch.cuda.get_device_name(0)

'Tesla T4'

In [51]:
# if tokenizer:
#   del tokenizer
# if model:
#   del model

### "cross-encoder/nli-deberta-v3-base"

In [96]:
from accelerate import Accelerator
all_scores = []
predictions = []
# if tokenizer:
#   del tokenizer
# if model:
#   del model
#for model_name in model_names:
model_name = model_names[0]
m = model_name.split("/")[1]
this_path = os.path.join(output_path, m)
if not os.path.exists(this_path):
  os.mkdir(this_path)
assert os.path.exists(this_path), "%s Path does not exists!"%(this_path)

# model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_name)
all_scores.append(train(model_name, this_path, folds))




***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 261
    })
    rev_test: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 261
    })
})


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.5949541926383972, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.4037, 'eval_samples_per_second': 260.108, 'eval_steps_per_second': 34.681, 'epoch': 1.0}
{'eval_loss': 0.5080563426017761, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.3795, 'eval_samples_per_second': 276.688, 'eval_steps_per_second': 36.892, 'epoch': 2.0}
{'eval_loss': 0.3276503086090088, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.376, 'eval_samples_per_second': 279.222, 'eval_steps_per_second': 37.23, 'epoch': 3.0}
{'eval_loss': 0.33664670586586, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9



{'eval_loss': 0.6657495498657227, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.3784, 'eval_samples_per_second': 277.515, 'eval_steps_per_second': 37.002, 'epoch': 1.0}
{'eval_loss': 0.5043387413024902, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.3826, 'eval_samples_per_second': 274.427, 'eval_steps_per_second': 36.59, 'epoch': 2.0}
{'eval_loss': 0.4789453148841858, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3924, 'eval_samples_per_second': 267.59, 'eval_steps_per_second': 35.679, 'epoch': 3.0}
{'eval_loss': 0.5295740962028503, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.87      0.89        76
           1       0.94      0.87      0.90       185
           2       0.00      0.00      0.00         0

    accuracy                           0.87       261
   macro avg       0.62      0.58      0.60       261
weighted avg       0.93      0.87      0.90       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.92      0.87      0.89        76
           1       0.94      0.87      0.90       185
           2       0.00      0.00      0.00         0

    accuracy                           0.87       261
   macro avg       0.62      0.58      0.60       261
weighted avg       0.93      0.87      0.90       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.34073612093925476, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.4124, 'eval_samples_per_second': 254.631, 'eval_steps_per_second': 33.951, 'epoch': 1.0}
{'eval_loss': 0.30618688464164734, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3867, 'eval_samples_per_second': 271.504, 'eval_steps_per_second': 36.201, 'epoch': 2.0}
{'eval_loss': 0.2671581208705902, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3931, 'eval_samples_per_second': 267.118, 'eval_steps_per_second': 35.616, 'epoch': 3.0}
{'eval_loss': 0.23252074420452118, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precisio



{'eval_loss': 0.40750357508659363, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3825, 'eval_samples_per_second': 274.484, 'eval_steps_per_second': 36.598, 'epoch': 1.0}
{'eval_loss': 0.35137277841567993, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.3827, 'eval_samples_per_second': 274.341, 'eval_steps_per_second': 36.579, 'epoch': 2.0}
{'eval_loss': 0.429788738489151, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.4016, 'eval_samples_per_second': 261.424, 'eval_steps_per_second': 34.857, 'epoch': 3.0}
{'eval_loss': 0.4485253095626831, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.68      0.80       140
           1       0.39      0.90      0.55        50
           2       0.96      0.65      0.77        71

    accuracy                           0.71       261
   macro avg       0.77      0.74      0.71       261
weighted avg       0.86      0.71      0.74       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.68      0.80       140
           1       0.39      0.90      0.55        50
           2       0.96      0.65      0.77        71

    accuracy                           0.71       261
   macro avg       0.77      0.74      0.71       261
weighted avg       0.86      0.71      0.74       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.24841411411762238, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.378, 'eval_samples_per_second': 277.8, 'eval_steps_per_second': 37.04, 'epoch': 1.0}
{'eval_loss': 0.2938096225261688, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3776, 'eval_samples_per_second': 278.049, 'eval_steps_per_second': 37.073, 'epoch': 2.0}
{'eval_loss': 0.2654634416103363, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3793, 'eval_samples_per_second': 276.839, 'eval_steps_per_second': 36.912, 'epoch': 3.0}
{'eval_loss': 0.23400820791721344, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0



{'eval_loss': 0.534726619720459, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.372, 'eval_samples_per_second': 282.224, 'eval_steps_per_second': 37.63, 'epoch': 1.0}
{'eval_loss': 0.24757559597492218, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.39, 'eval_samples_per_second': 269.244, 'eval_steps_per_second': 35.899, 'epoch': 2.0}
{'eval_loss': 0.4493982195854187, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.377, 'eval_samples_per_second': 278.545, 'eval_steps_per_second': 37.139, 'epoch': 3.0}
{'eval_loss': 0.4034498333930969, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.95

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.93      0.77      0.84        90
           1       0.61      0.68      0.64        74
           2       0.81      0.88      0.84        97

    accuracy                           0.78       261
   macro avg       0.78      0.77      0.77       261
weighted avg       0.80      0.78      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.93      0.77      0.84        90
           1       0.61      0.68      0.64        74
           2       0.81      0.88      0.84        97

    accuracy                           0.78       261
   macro avg       0.78      0.77      0.77       261
weighted avg       0.80      0.78      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.2748808264732361, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3774, 'eval_samples_per_second': 278.231, 'eval_steps_per_second': 37.097, 'epoch': 1.0}
{'eval_loss': 0.56716388463974, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.3764, 'eval_samples_per_second': 278.951, 'eval_steps_per_second': 37.194, 'epoch': 2.0}
{'eval_loss': 0.5814638137817383, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3976, 'eval_samples_per_second': 264.073, 'eval_steps_per_second': 35.21, 'epoch': 3.0}
{'eval_loss': 0.6229392886161804, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.



{'eval_loss': 0.575042724609375, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.3838, 'eval_samples_per_second': 273.6, 'eval_steps_per_second': 36.48, 'epoch': 1.0}
{'eval_loss': 0.6113000512123108, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.3901, 'eval_samples_per_second': 269.155, 'eval_steps_per_second': 35.887, 'epoch': 2.0}
{'eval_loss': 0.8103451132774353, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3835, 'eval_samples_per_second': 273.761, 'eval_steps_per_second': 36.501, 'epoch': 3.0}
{'eval_loss': 0.7911688685417175, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.91      0.84       108
           1       0.66      0.57      0.61        79
           2       0.88      0.81      0.85        74

    accuracy                           0.78       261
   macro avg       0.78      0.76      0.77       261
weighted avg       0.77      0.78      0.77       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.91      0.84       108
           1       0.66      0.57      0.61        79
           2       0.88      0.81      0.85        74

    accuracy                           0.78       261
   macro avg       0.78      0.76      0.77       261
weighted avg       0.77      0.78      0.77       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 8 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3304544985294342, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3721, 'eval_samples_per_second': 282.163, 'eval_steps_per_second': 37.622, 'epoch': 1.0}
{'eval_loss': 0.15108774602413177, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.3744, 'eval_samples_per_second': 280.451, 'eval_steps_per_second': 37.393, 'epoch': 2.0}
{'eval_loss': 0.15854430198669434, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.3736, 'eval_samples_per_second': 281.045, 'eval_steps_per_second': 37.473, 'epoch': 3.0}
{'eval_loss': 0.16825932264328003, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precisio



{'eval_loss': 0.351960688829422, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3657, 'eval_samples_per_second': 287.145, 'eval_steps_per_second': 38.286, 'epoch': 1.0}
{'eval_loss': 0.28978243470191956, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3756, 'eval_samples_per_second': 279.55, 'eval_steps_per_second': 37.273, 'epoch': 2.0}
{'eval_loss': 0.14473414421081543, 'eval_accuracy': 0.9809523809523809, 'eval_recall': 0.9809523809523809, 'eval_f1': 0.9809523809523809, 'eval_precision': 0.9809523809523809, 'eval_runtime': 0.3672, 'eval_samples_per_second': 285.957, 'eval_steps_per_second': 38.128, 'epoch': 3.0}
{'eval_loss': 0.149297833442688, 'eval_accuracy': 0.9809523809523809, 'eval_recall': 0.9809523809523809, 'eval_f1': 0.9809523809523809, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.23      0.89      0.36         9
           1       0.79      0.80      0.80       205
           2       0.00      0.00      0.00        47

    accuracy                           0.66       261
   macro avg       0.34      0.56      0.39       261
weighted avg       0.63      0.66      0.64       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.23      0.89      0.36         9
           1       0.79      0.80      0.80       205
           2       0.00      0.00      0.00        47

    accuracy                           0.66       261
   macro avg       0.34      0.56      0.39       261
weighted avg       0.63      0.66      0.64       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.419146865606308, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.3088, 'eval_samples_per_second': 340.067, 'eval_steps_per_second': 22.671, 'epoch': 1.0}
{'eval_loss': 0.378111332654953, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3041, 'eval_samples_per_second': 345.234, 'eval_steps_per_second': 23.016, 'epoch': 2.0}
{'eval_loss': 0.3661981225013733, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3029, 'eval_samples_per_second': 346.626, 'eval_steps_per_second': 23.108, 'epoch': 3.0}
{'eval_loss': 0.42617568373680115, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 



{'eval_loss': 0.6636369824409485, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2998, 'eval_samples_per_second': 350.258, 'eval_steps_per_second': 23.351, 'epoch': 1.0}
{'eval_loss': 0.7228593826293945, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3014, 'eval_samples_per_second': 348.355, 'eval_steps_per_second': 23.224, 'epoch': 2.0}
{'eval_loss': 0.6002426147460938, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3013, 'eval_samples_per_second': 348.491, 'eval_steps_per_second': 23.233, 'epoch': 3.0}
{'eval_loss': 0.6399492025375366, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.72      0.81        76
           1       0.86      0.71      0.78       185
           2       0.00      0.00      0.00         0

    accuracy                           0.72       261
   macro avg       0.59      0.48      0.53       261
weighted avg       0.88      0.72      0.79       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.92      0.72      0.81        76
           1       0.86      0.71      0.78       185
           2       0.00      0.00      0.00         0

    accuracy                           0.72       261
   macro avg       0.59      0.48      0.53       261
weighted avg       0.88      0.72      0.79       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.2413586974143982, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2974, 'eval_samples_per_second': 353.101, 'eval_steps_per_second': 23.54, 'epoch': 1.0}
{'eval_loss': 0.21061715483665466, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2984, 'eval_samples_per_second': 351.932, 'eval_steps_per_second': 23.462, 'epoch': 2.0}
{'eval_loss': 0.23707613348960876, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2987, 'eval_samples_per_second': 351.468, 'eval_steps_per_second': 23.431, 'epoch': 3.0}
{'eval_loss': 0.2259693294763565, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision'



{'eval_loss': 0.20713800191879272, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2978, 'eval_samples_per_second': 352.573, 'eval_steps_per_second': 23.505, 'epoch': 1.0}
{'eval_loss': 0.5046153664588928, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2919, 'eval_samples_per_second': 359.765, 'eval_steps_per_second': 23.984, 'epoch': 2.0}
{'eval_loss': 0.4379531443119049, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2937, 'eval_samples_per_second': 357.541, 'eval_steps_per_second': 23.836, 'epoch': 3.0}
{'eval_loss': 0.4750248193740845, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.84      0.90       140
           1       0.34      0.88      0.49        50
           2       0.64      0.10      0.17        71

    accuracy                           0.64       261
   macro avg       0.65      0.60      0.52       261
weighted avg       0.76      0.64      0.62       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.84      0.90       140
           1       0.34      0.88      0.49        50
           2       0.64      0.10      0.17        71

    accuracy                           0.64       261
   macro avg       0.65      0.60      0.52       261
weighted avg       0.76      0.64      0.62       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.2985483705997467, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2968, 'eval_samples_per_second': 353.731, 'eval_steps_per_second': 23.582, 'epoch': 1.0}
{'eval_loss': 0.2992590367794037, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.296, 'eval_samples_per_second': 354.686, 'eval_steps_per_second': 23.646, 'epoch': 2.0}
{'eval_loss': 0.27127647399902344, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2945, 'eval_samples_per_second': 356.524, 'eval_steps_per_second': 23.768, 'epoch': 3.0}
{'eval_loss': 0.29621046781539917, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision'



{'eval_loss': 0.21379287540912628, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2937, 'eval_samples_per_second': 357.497, 'eval_steps_per_second': 23.833, 'epoch': 1.0}
{'eval_loss': 0.3284280598163605, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.299, 'eval_samples_per_second': 351.132, 'eval_steps_per_second': 23.409, 'epoch': 2.0}
{'eval_loss': 0.2553551197052002, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2994, 'eval_samples_per_second': 350.644, 'eval_steps_per_second': 23.376, 'epoch': 3.0}
{'eval_loss': 0.27682802081108093, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.91      0.86      0.88        90
           1       0.53      0.64      0.58        74
           2       0.75      0.67      0.71        97

    accuracy                           0.72       261
   macro avg       0.73      0.72      0.72       261
weighted avg       0.74      0.72      0.73       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.91      0.86      0.88        90
           1       0.53      0.64      0.58        74
           2       0.75      0.67      0.71        97

    accuracy                           0.72       261
   macro avg       0.73      0.72      0.72       261
weighted avg       0.74      0.72      0.73       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.42260387539863586, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2939, 'eval_samples_per_second': 357.222, 'eval_steps_per_second': 23.815, 'epoch': 1.0}
{'eval_loss': 0.3858968913555145, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.2936, 'eval_samples_per_second': 357.646, 'eval_steps_per_second': 23.843, 'epoch': 2.0}
{'eval_loss': 0.43622487783432007, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2898, 'eval_samples_per_second': 362.349, 'eval_steps_per_second': 24.157, 'epoch': 3.0}
{'eval_loss': 0.5062685608863831, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision



{'eval_loss': 0.5591791272163391, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.291, 'eval_samples_per_second': 360.878, 'eval_steps_per_second': 24.059, 'epoch': 1.0}
{'eval_loss': 0.690313994884491, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2907, 'eval_samples_per_second': 361.239, 'eval_steps_per_second': 24.083, 'epoch': 2.0}
{'eval_loss': 0.762001097202301, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.2964, 'eval_samples_per_second': 354.192, 'eval_steps_per_second': 23.613, 'epoch': 3.0}
{'eval_loss': 0.788240909576416, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.79      0.92      0.85       108
           1       0.68      0.59      0.64        79
           2       0.88      0.78      0.83        74

    accuracy                           0.78       261
   macro avg       0.78      0.77      0.77       261
weighted avg       0.78      0.78      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.79      0.92      0.85       108
           1       0.68      0.59      0.64        79
           2       0.88      0.78      0.83        74

    accuracy                           0.78       261
   macro avg       0.78      0.77      0.77       261
weighted avg       0.78      0.78      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 16 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.32730022072792053, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2621, 'eval_samples_per_second': 400.609, 'eval_steps_per_second': 26.707, 'epoch': 1.0}
{'eval_loss': 0.24275420606136322, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2725, 'eval_samples_per_second': 385.327, 'eval_steps_per_second': 25.688, 'epoch': 2.0}
{'eval_loss': 0.24764540791511536, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2647, 'eval_samples_per_second': 396.653, 'eval_steps_per_second': 26.444, 'epoch': 3.0}
{'eval_loss': 0.24954868853092194, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precisi



{'eval_loss': 0.1750779151916504, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2674, 'eval_samples_per_second': 392.653, 'eval_steps_per_second': 26.177, 'epoch': 1.0}
{'eval_loss': 0.32867151498794556, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2841, 'eval_samples_per_second': 369.626, 'eval_steps_per_second': 24.642, 'epoch': 2.0}
{'eval_loss': 0.2062104493379593, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2635, 'eval_samples_per_second': 398.458, 'eval_steps_per_second': 26.564, 'epoch': 3.0}
{'eval_loss': 0.24303212761878967, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.35      0.89      0.50         9
           1       0.78      0.79      0.79       205
           2       0.00      0.00      0.00        47

    accuracy                           0.65       261
   macro avg       0.38      0.56      0.43       261
weighted avg       0.63      0.65      0.63       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.35      0.89      0.50         9
           1       0.78      0.79      0.79       205
           2       0.00      0.00      0.00        47

    accuracy                           0.65       261
   macro avg       0.38      0.56      0.43       261
weighted avg       0.63      0.65      0.63       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4334985315799713, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047619047619, 'eval_runtime': 0.2864, 'eval_samples_per_second': 366.647, 'eval_steps_per_second': 13.968, 'epoch': 1.0}
{'eval_loss': 0.39518964290618896, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2919, 'eval_samples_per_second': 359.741, 'eval_steps_per_second': 13.704, 'epoch': 2.0}
{'eval_loss': 0.38969144225120544, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2949, 'eval_samples_per_second': 356.032, 'eval_steps_per_second': 13.563, 'epoch': 3.0}
{'eval_loss': 0.3968445062637329, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0



{'eval_loss': 0.5101266503334045, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2901, 'eval_samples_per_second': 361.926, 'eval_steps_per_second': 13.788, 'epoch': 1.0}
{'eval_loss': 0.4429044723510742, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2913, 'eval_samples_per_second': 360.504, 'eval_steps_per_second': 13.734, 'epoch': 2.0}
{'eval_loss': 0.4190234839916229, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2905, 'eval_samples_per_second': 361.464, 'eval_steps_per_second': 13.77, 'epoch': 3.0}
{'eval_loss': 0.4033786654472351, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.92      0.87      0.89        76
           1       0.93      0.71      0.80       185
           2       0.00      0.00      0.00         0

    accuracy                           0.75       261
   macro avg       0.62      0.53      0.57       261
weighted avg       0.93      0.75      0.83       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.92      0.87      0.89        76
           1       0.93      0.71      0.80       185
           2       0.00      0.00      0.00         0

    accuracy                           0.75       261
   macro avg       0.62      0.53      0.57       261
weighted avg       0.93      0.75      0.83       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.26808950304985046, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2816, 'eval_samples_per_second': 372.876, 'eval_steps_per_second': 14.205, 'epoch': 1.0}
{'eval_loss': 0.2623831331729889, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2836, 'eval_samples_per_second': 370.219, 'eval_steps_per_second': 14.104, 'epoch': 2.0}
{'eval_loss': 0.24851356446743011, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2868, 'eval_samples_per_second': 366.157, 'eval_steps_per_second': 13.949, 'epoch': 3.0}
{'eval_loss': 0.30841195583343506, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precisio



{'eval_loss': 0.20519423484802246, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.282, 'eval_samples_per_second': 372.402, 'eval_steps_per_second': 14.187, 'epoch': 1.0}
{'eval_loss': 0.26663830876350403, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2859, 'eval_samples_per_second': 367.258, 'eval_steps_per_second': 13.991, 'epoch': 2.0}
{'eval_loss': 0.35381680727005005, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2921, 'eval_samples_per_second': 359.502, 'eval_steps_per_second': 13.695, 'epoch': 3.0}
{'eval_loss': 0.30644142627716064, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precisio

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.80      0.88       140
           1       0.32      0.82      0.46        50
           2       0.62      0.14      0.23        71

    accuracy                           0.62       261
   macro avg       0.64      0.59      0.52       261
weighted avg       0.75      0.62      0.62       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.80      0.88       140
           1       0.32      0.82      0.46        50
           2       0.62      0.14      0.23        71

    accuracy                           0.62       261
   macro avg       0.64      0.59      0.52       261
weighted avg       0.75      0.62      0.62       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.45356670022010803, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0.8285714285714286, 'eval_runtime': 0.2795, 'eval_samples_per_second': 375.684, 'eval_steps_per_second': 14.312, 'epoch': 1.0}
{'eval_loss': 0.22519293427467346, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2761, 'eval_samples_per_second': 380.36, 'eval_steps_per_second': 14.49, 'epoch': 2.0}
{'eval_loss': 0.1658877730369568, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2857, 'eval_samples_per_second': 367.528, 'eval_steps_per_second': 14.001, 'epoch': 3.0}
{'eval_loss': 0.1648276150226593, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision':



{'eval_loss': 0.19693805277347565, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2794, 'eval_samples_per_second': 375.8, 'eval_steps_per_second': 14.316, 'epoch': 1.0}
{'eval_loss': 0.10408980399370193, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2816, 'eval_samples_per_second': 372.901, 'eval_steps_per_second': 14.206, 'epoch': 2.0}
{'eval_loss': 0.38488584756851196, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2865, 'eval_samples_per_second': 366.499, 'eval_steps_per_second': 13.962, 'epoch': 3.0}
{'eval_loss': 0.3564707040786743, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.99      0.78      0.87        90
           1       0.52      0.73      0.61        74
           2       0.77      0.69      0.73        97

    accuracy                           0.73       261
   macro avg       0.76      0.73      0.74       261
weighted avg       0.77      0.73      0.74       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.99      0.78      0.87        90
           1       0.52      0.73      0.61        74
           2       0.77      0.69      0.73        97

    accuracy                           0.73       261
   macro avg       0.76      0.73      0.74       261
weighted avg       0.77      0.73      0.74       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4659029245376587, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.2927, 'eval_samples_per_second': 358.772, 'eval_steps_per_second': 13.667, 'epoch': 1.0}
{'eval_loss': 0.28249603509902954, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2861, 'eval_samples_per_second': 367.003, 'eval_steps_per_second': 13.981, 'epoch': 2.0}
{'eval_loss': 0.4219551384449005, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2884, 'eval_samples_per_second': 364.056, 'eval_steps_per_second': 13.869, 'epoch': 3.0}
{'eval_loss': 0.4324091672897339, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision'



{'eval_loss': 0.5702294707298279, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 0.8666666666666667, 'eval_runtime': 0.2915, 'eval_samples_per_second': 360.197, 'eval_steps_per_second': 13.722, 'epoch': 1.0}
{'eval_loss': 0.5556296110153198, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2904, 'eval_samples_per_second': 361.552, 'eval_steps_per_second': 13.773, 'epoch': 2.0}
{'eval_loss': 0.5056846141815186, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2843, 'eval_samples_per_second': 369.363, 'eval_steps_per_second': 14.071, 'epoch': 3.0}
{'eval_loss': 0.5094501972198486, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.77      0.95      0.85       108
           1       0.73      0.54      0.62        79
           2       0.90      0.84      0.87        74

    accuracy                           0.80       261
   macro avg       0.80      0.78      0.78       261
weighted avg       0.80      0.80      0.79       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.77      0.95      0.85       108
           1       0.73      0.54      0.62        79
           2       0.90      0.84      0.87        74

    accuracy                           0.80       261
   macro avg       0.80      0.78      0.78       261
weighted avg       0.80      0.80      0.79       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  4 32 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3942722976207733, 'eval_accuracy': 0.8380952380952381, 'eval_recall': 0.8380952380952381, 'eval_f1': 0.8380952380952381, 'eval_precision': 0.8380952380952381, 'eval_runtime': 0.2564, 'eval_samples_per_second': 409.543, 'eval_steps_per_second': 15.602, 'epoch': 1.0}
{'eval_loss': 0.24950525164604187, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2513, 'eval_samples_per_second': 417.802, 'eval_steps_per_second': 15.916, 'epoch': 2.0}
{'eval_loss': 0.16022484004497528, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2504, 'eval_samples_per_second': 419.374, 'eval_steps_per_second': 15.976, 'epoch': 3.0}
{'eval_loss': 0.1769201159477234, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision



{'eval_loss': 0.16849933564662933, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2538, 'eval_samples_per_second': 413.728, 'eval_steps_per_second': 15.761, 'epoch': 1.0}
{'eval_loss': 0.14871931076049805, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2573, 'eval_samples_per_second': 408.077, 'eval_steps_per_second': 15.546, 'epoch': 2.0}
{'eval_loss': 0.13156946003437042, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2516, 'eval_samples_per_second': 417.382, 'eval_steps_per_second': 15.9, 'epoch': 3.0}
{'eval_loss': 0.12909843027591705, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.39      1.00      0.56         9
           1       0.91      0.90      0.90       205
           2       0.78      0.60      0.67        47

    accuracy                           0.85       261
   macro avg       0.69      0.83      0.71       261
weighted avg       0.87      0.85      0.85       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.39      1.00      0.56         9
           1       0.91      0.90      0.90       205
           2       0.78      0.60      0.67        47

    accuracy                           0.85       261
   macro avg       0.69      0.83      0.71       261
weighted avg       0.87      0.85      0.85       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.6307787299156189, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.3795, 'eval_samples_per_second': 276.715, 'eval_steps_per_second': 36.895, 'epoch': 1.0}
{'eval_loss': 0.5363133549690247, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3742, 'eval_samples_per_second': 280.604, 'eval_steps_per_second': 37.414, 'epoch': 2.0}
{'eval_loss': 0.35069307684898376, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.3797, 'eval_samples_per_second': 276.511, 'eval_steps_per_second': 36.868, 'epoch': 3.0}
{'eval_loss': 0.5682750940322876, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision'



{'eval_loss': 0.7251337170600891, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3934, 'eval_samples_per_second': 266.935, 'eval_steps_per_second': 35.591, 'epoch': 1.0}
{'eval_loss': 0.5589132905006409, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.3793, 'eval_samples_per_second': 276.853, 'eval_steps_per_second': 36.914, 'epoch': 2.0}
{'eval_loss': 0.5742582082748413, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3779, 'eval_samples_per_second': 277.822, 'eval_steps_per_second': 37.043, 'epoch': 3.0}
{'eval_loss': 0.476683109998703, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.91      0.88      0.89        76
           1       0.94      0.75      0.83       185
           2       0.00      0.00      0.00         0

    accuracy                           0.79       261
   macro avg       0.61      0.54      0.58       261
weighted avg       0.93      0.79      0.85       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.91      0.88      0.89        76
           1       0.94      0.75      0.83       185
           2       0.00      0.00      0.00         0

    accuracy                           0.79       261
   macro avg       0.61      0.54      0.58       261
weighted avg       0.93      0.79      0.85       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.314633309841156, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.4015, 'eval_samples_per_second': 261.528, 'eval_steps_per_second': 34.87, 'epoch': 1.0}
{'eval_loss': 0.3177555501461029, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.387, 'eval_samples_per_second': 271.3, 'eval_steps_per_second': 36.173, 'epoch': 2.0}
{'eval_loss': 0.29892992973327637, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3837, 'eval_samples_per_second': 273.652, 'eval_steps_per_second': 36.487, 'epoch': 3.0}
{'eval_loss': 0.34442368149757385, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.



{'eval_loss': 0.577813446521759, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.38, 'eval_samples_per_second': 276.328, 'eval_steps_per_second': 36.844, 'epoch': 1.0}
{'eval_loss': 0.2704566717147827, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3862, 'eval_samples_per_second': 271.872, 'eval_steps_per_second': 36.25, 'epoch': 2.0}
{'eval_loss': 0.5316863059997559, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.3884, 'eval_samples_per_second': 270.335, 'eval_steps_per_second': 36.045, 'epoch': 3.0}
{'eval_loss': 0.5027703046798706, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.90      0.94       140
           1       0.58      0.82      0.68        50
           2       0.89      0.76      0.82        71

    accuracy                           0.85       261
   macro avg       0.81      0.83      0.81       261
weighted avg       0.88      0.85      0.85       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.90      0.94       140
           1       0.58      0.82      0.68        50
           2       0.89      0.76      0.82        71

    accuracy                           0.85       261
   macro avg       0.81      0.83      0.81       261
weighted avg       0.88      0.85      0.85       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.3422563076019287, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.393, 'eval_samples_per_second': 267.16, 'eval_steps_per_second': 35.621, 'epoch': 1.0}
{'eval_loss': 0.2922799587249756, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3822, 'eval_samples_per_second': 274.733, 'eval_steps_per_second': 36.631, 'epoch': 2.0}
{'eval_loss': 0.24539609253406525, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.3773, 'eval_samples_per_second': 278.315, 'eval_steps_per_second': 37.109, 'epoch': 3.0}
{'eval_loss': 0.1959085315465927, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 



{'eval_loss': 0.33524057269096375, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.4075, 'eval_samples_per_second': 257.675, 'eval_steps_per_second': 34.357, 'epoch': 1.0}
{'eval_loss': 0.26512035727500916, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.3831, 'eval_samples_per_second': 274.081, 'eval_steps_per_second': 36.544, 'epoch': 2.0}
{'eval_loss': 0.29486995935440063, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3944, 'eval_samples_per_second': 266.255, 'eval_steps_per_second': 35.501, 'epoch': 3.0}
{'eval_loss': 0.4050534665584564, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precisio

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.94      0.87      0.90        90
           1       0.67      0.68      0.67        74
           2       0.81      0.86      0.83        97

    accuracy                           0.81       261
   macro avg       0.80      0.80      0.80       261
weighted avg       0.81      0.81      0.81       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.94      0.87      0.90        90
           1       0.67      0.68      0.67        74
           2       0.81      0.86      0.83        97

    accuracy                           0.81       261
   macro avg       0.80      0.80      0.80       261
weighted avg       0.81      0.81      0.81       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.26645708084106445, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3782, 'eval_samples_per_second': 277.645, 'eval_steps_per_second': 37.019, 'epoch': 1.0}
{'eval_loss': 0.39748939871788025, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.3763, 'eval_samples_per_second': 279.069, 'eval_steps_per_second': 37.209, 'epoch': 2.0}
{'eval_loss': 0.7260977625846863, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.4119, 'eval_samples_per_second': 254.927, 'eval_steps_per_second': 33.99, 'epoch': 3.0}
{'eval_loss': 0.71003258228302, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 



{'eval_loss': 0.727177619934082, 'eval_accuracy': 0.8571428571428571, 'eval_recall': 0.8571428571428571, 'eval_f1': 0.8571428571428571, 'eval_precision': 0.8571428571428571, 'eval_runtime': 0.3756, 'eval_samples_per_second': 279.534, 'eval_steps_per_second': 37.271, 'epoch': 1.0}
{'eval_loss': 0.7533581852912903, 'eval_accuracy': 0.8380952380952381, 'eval_recall': 0.8380952380952381, 'eval_f1': 0.8380952380952381, 'eval_precision': 0.8380952380952381, 'eval_runtime': 0.3733, 'eval_samples_per_second': 281.279, 'eval_steps_per_second': 37.504, 'epoch': 2.0}
{'eval_loss': 0.6778311729431152, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3794, 'eval_samples_per_second': 276.769, 'eval_steps_per_second': 36.903, 'epoch': 3.0}
{'eval_loss': 0.8253631591796875, 'eval_accuracy': 0.8666666666666667, 'eval_recall': 0.8666666666666667, 'eval_f1': 0.8666666666666667, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.94      0.85       108
           1       0.72      0.53      0.61        79
           2       0.86      0.86      0.86        74

    accuracy                           0.79       261
   macro avg       0.79      0.78      0.78       261
weighted avg       0.79      0.79      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.94      0.85       108
           1       0.72      0.53      0.61        79
           2       0.86      0.86      0.86        74

    accuracy                           0.79       261
   macro avg       0.79      0.78      0.78       261
weighted avg       0.79      0.79      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 8 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', 'H

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.253418505191803, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3817, 'eval_samples_per_second': 275.099, 'eval_steps_per_second': 36.68, 'epoch': 1.0}
{'eval_loss': 0.2151954621076584, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.39, 'eval_samples_per_second': 269.226, 'eval_steps_per_second': 35.897, 'epoch': 2.0}
{'eval_loss': 0.15374314785003662, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.3878, 'eval_samples_per_second': 270.755, 'eval_steps_per_second': 36.101, 'epoch': 3.0}
{'eval_loss': 0.1339833289384842, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.



{'eval_loss': 0.45487555861473083, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.3928, 'eval_samples_per_second': 267.29, 'eval_steps_per_second': 35.639, 'epoch': 1.0}
{'eval_loss': 0.5671329498291016, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.4012, 'eval_samples_per_second': 261.743, 'eval_steps_per_second': 34.899, 'epoch': 2.0}
{'eval_loss': 0.2537814676761627, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.3947, 'eval_samples_per_second': 266.055, 'eval_steps_per_second': 35.474, 'epoch': 3.0}
{'eval_loss': 0.1667623072862625, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.27      0.89      0.41         9
           1       0.84      0.83      0.84       205
           2       0.44      0.26      0.32        47

    accuracy                           0.73       261
   macro avg       0.52      0.66      0.52       261
weighted avg       0.75      0.73      0.73       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.27      0.89      0.41         9
           1       0.84      0.83      0.84       205
           2       0.44      0.26      0.32        47

    accuracy                           0.73       261
   macro avg       0.52      0.66      0.52       261
weighted avg       0.75      0.73      0.73       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.417160302400589, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.3041, 'eval_samples_per_second': 345.264, 'eval_steps_per_second': 23.018, 'epoch': 1.0}
{'eval_loss': 0.4101192355155945, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.3063, 'eval_samples_per_second': 342.842, 'eval_steps_per_second': 22.856, 'epoch': 2.0}
{'eval_loss': 0.41509440541267395, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3167, 'eval_samples_per_second': 331.542, 'eval_steps_per_second': 22.103, 'epoch': 3.0}
{'eval_loss': 0.47616976499557495, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision'



{'eval_loss': 0.588147759437561, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3053, 'eval_samples_per_second': 343.887, 'eval_steps_per_second': 22.926, 'epoch': 1.0}
{'eval_loss': 0.6246115565299988, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.3082, 'eval_samples_per_second': 340.725, 'eval_steps_per_second': 22.715, 'epoch': 2.0}
{'eval_loss': 0.5435302257537842, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.3043, 'eval_samples_per_second': 345.029, 'eval_steps_per_second': 23.002, 'epoch': 3.0}
{'eval_loss': 0.48911651968955994, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.82      0.85        76
           1       0.90      0.69      0.78       185
           2       0.00      0.00      0.00         0

    accuracy                           0.72       261
   macro avg       0.60      0.50      0.54       261
weighted avg       0.90      0.72      0.80       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.89      0.82      0.85        76
           1       0.90      0.69      0.78       185
           2       0.00      0.00      0.00         0

    accuracy                           0.72       261
   macro avg       0.60      0.50      0.54       261
weighted avg       0.90      0.72      0.80       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.24769343435764313, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.305, 'eval_samples_per_second': 344.254, 'eval_steps_per_second': 22.95, 'epoch': 1.0}
{'eval_loss': 0.2762012779712677, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2964, 'eval_samples_per_second': 354.193, 'eval_steps_per_second': 23.613, 'epoch': 2.0}
{'eval_loss': 0.28450390696525574, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3053, 'eval_samples_per_second': 343.904, 'eval_steps_per_second': 22.927, 'epoch': 3.0}
{'eval_loss': 0.32058900594711304, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision'



{'eval_loss': 0.4005185067653656, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3044, 'eval_samples_per_second': 344.989, 'eval_steps_per_second': 22.999, 'epoch': 1.0}
{'eval_loss': 0.5224425792694092, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.299, 'eval_samples_per_second': 351.151, 'eval_steps_per_second': 23.41, 'epoch': 2.0}
{'eval_loss': 0.31389501690864563, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.3011, 'eval_samples_per_second': 348.714, 'eval_steps_per_second': 23.248, 'epoch': 3.0}
{'eval_loss': 0.3326341509819031, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.81      0.89       140
           1       0.35      0.88      0.50        50
           2       0.82      0.20      0.32        71

    accuracy                           0.66       261
   macro avg       0.71      0.63      0.57       261
weighted avg       0.81      0.66      0.66       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.97      0.81      0.89       140
           1       0.35      0.88      0.50        50
           2       0.82      0.20      0.32        71

    accuracy                           0.66       261
   macro avg       0.71      0.63      0.57       261
weighted avg       0.81      0.66      0.66       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.28557178378105164, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2983, 'eval_samples_per_second': 352.038, 'eval_steps_per_second': 23.469, 'epoch': 1.0}
{'eval_loss': 0.2939890921115875, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2991, 'eval_samples_per_second': 351.007, 'eval_steps_per_second': 23.4, 'epoch': 2.0}
{'eval_loss': 0.32220378518104553, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2996, 'eval_samples_per_second': 350.459, 'eval_steps_per_second': 23.364, 'epoch': 3.0}
{'eval_loss': 0.2614929974079132, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision':



{'eval_loss': 0.2477284073829651, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2954, 'eval_samples_per_second': 355.505, 'eval_steps_per_second': 23.7, 'epoch': 1.0}
{'eval_loss': 0.3883025646209717, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.295, 'eval_samples_per_second': 355.925, 'eval_steps_per_second': 23.728, 'epoch': 2.0}
{'eval_loss': 0.4795609712600708, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2963, 'eval_samples_per_second': 354.427, 'eval_steps_per_second': 23.628, 'epoch': 3.0}
{'eval_loss': 0.54767245054245, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.94

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.94      0.89      0.91        90
           1       0.48      0.66      0.55        74
           2       0.73      0.55      0.62        97

    accuracy                           0.70       261
   macro avg       0.71      0.70      0.70       261
weighted avg       0.73      0.70      0.70       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.94      0.89      0.91        90
           1       0.48      0.66      0.55        74
           2       0.73      0.55      0.62        97

    accuracy                           0.70       261
   macro avg       0.71      0.70      0.70       261
weighted avg       0.73      0.70      0.70       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4070644676685333, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.3006, 'eval_samples_per_second': 349.35, 'eval_steps_per_second': 23.29, 'epoch': 1.0}
{'eval_loss': 0.38019904494285583, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2944, 'eval_samples_per_second': 356.652, 'eval_steps_per_second': 23.777, 'epoch': 2.0}
{'eval_loss': 0.5144875049591064, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2967, 'eval_samples_per_second': 353.92, 'eval_steps_per_second': 23.595, 'epoch': 3.0}
{'eval_loss': 0.6366152763366699, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0



{'eval_loss': 0.8818289041519165, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2938, 'eval_samples_per_second': 357.376, 'eval_steps_per_second': 23.825, 'epoch': 1.0}
{'eval_loss': 0.6890614032745361, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.3027, 'eval_samples_per_second': 346.856, 'eval_steps_per_second': 23.124, 'epoch': 2.0}
{'eval_loss': 1.0109528303146362, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.293, 'eval_samples_per_second': 358.344, 'eval_steps_per_second': 23.89, 'epoch': 3.0}
{'eval_loss': 0.8726605176925659, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.94      0.85       108
           1       0.74      0.53      0.62        79
           2       0.86      0.86      0.86        74

    accuracy                           0.79       261
   macro avg       0.79      0.78      0.78       261
weighted avg       0.79      0.79      0.78       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.78      0.94      0.85       108
           1       0.74      0.53      0.62        79
           2       0.86      0.86      0.86        74

    accuracy                           0.79       261
   macro avg       0.79      0.78      0.78       261
weighted avg       0.79      0.79      0.78       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 16 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.303975909948349, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2715, 'eval_samples_per_second': 386.693, 'eval_steps_per_second': 25.78, 'epoch': 1.0}
{'eval_loss': 0.12338095903396606, 'eval_accuracy': 0.9809523809523809, 'eval_recall': 0.9809523809523809, 'eval_f1': 0.9809523809523809, 'eval_precision': 0.9809523809523809, 'eval_runtime': 0.2778, 'eval_samples_per_second': 377.953, 'eval_steps_per_second': 25.197, 'epoch': 2.0}
{'eval_loss': 0.1712164431810379, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2718, 'eval_samples_per_second': 386.364, 'eval_steps_per_second': 25.758, 'epoch': 3.0}
{'eval_loss': 0.16073724627494812, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision':



{'eval_loss': 0.32808491587638855, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.277, 'eval_samples_per_second': 379.069, 'eval_steps_per_second': 25.271, 'epoch': 1.0}
{'eval_loss': 0.29222673177719116, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2735, 'eval_samples_per_second': 383.873, 'eval_steps_per_second': 25.592, 'epoch': 2.0}
{'eval_loss': 0.3979768753051758, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.278, 'eval_samples_per_second': 377.716, 'eval_steps_per_second': 25.181, 'epoch': 3.0}
{'eval_loss': 0.49057066440582275, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision'

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.40      0.89      0.55         9
           1       0.77      0.73      0.75       205
           2       0.00      0.00      0.00        47

    accuracy                           0.61       261
   macro avg       0.39      0.54      0.43       261
weighted avg       0.62      0.61      0.61       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.40      0.89      0.55         9
           1       0.77      0.73      0.75       205
           2       0.00      0.00      0.00        47

    accuracy                           0.61       261
   macro avg       0.39      0.54      0.43       261
weighted avg       0.62      0.61      0.61       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 32 0 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.436544805765152, 'eval_accuracy': 0.8095238095238095, 'eval_recall': 0.8095238095238095, 'eval_f1': 0.8095238095238095, 'eval_precision': 0.8095238095238095, 'eval_runtime': 0.2887, 'eval_samples_per_second': 363.706, 'eval_steps_per_second': 13.855, 'epoch': 1.0}
{'eval_loss': 0.44799888134002686, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2867, 'eval_samples_per_second': 366.243, 'eval_steps_per_second': 13.952, 'epoch': 2.0}
{'eval_loss': 0.3957487940788269, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2886, 'eval_samples_per_second': 363.785, 'eval_steps_per_second': 13.858, 'epoch': 3.0}
{'eval_loss': 0.453372061252594, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 



{'eval_loss': 0.539080023765564, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.289, 'eval_samples_per_second': 363.368, 'eval_steps_per_second': 13.843, 'epoch': 1.0}
{'eval_loss': 0.404153436422348, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.293, 'eval_samples_per_second': 358.309, 'eval_steps_per_second': 13.65, 'epoch': 2.0}
{'eval_loss': 0.6199252009391785, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.292, 'eval_samples_per_second': 359.645, 'eval_steps_per_second': 13.701, 'epoch': 3.0}
{'eval_loss': 0.6162952184677124, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.904

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.89      0.78      0.83        76
           1       0.88      0.65      0.75       185
           2       0.00      0.00      0.00         0

    accuracy                           0.69       261
   macro avg       0.59      0.47      0.53       261
weighted avg       0.88      0.69      0.77       261



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


              precision    recall  f1-score   support

           0       0.89      0.78      0.83        76
           1       0.88      0.65      0.75       185
           2       0.00      0.00      0.00         0

    accuracy                           0.69       261
   macro avg       0.59      0.47      0.53       261
weighted avg       0.88      0.69      0.77       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 32 1 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.24762573838233948, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2856, 'eval_samples_per_second': 367.641, 'eval_steps_per_second': 14.005, 'epoch': 1.0}
{'eval_loss': 0.26192963123321533, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2846, 'eval_samples_per_second': 368.959, 'eval_steps_per_second': 14.056, 'epoch': 2.0}
{'eval_loss': 0.2621002495288849, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2916, 'eval_samples_per_second': 360.114, 'eval_steps_per_second': 13.719, 'epoch': 3.0}
{'eval_loss': 0.2408839762210846, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision



{'eval_loss': 0.23304209113121033, 'eval_accuracy': 0.9238095238095239, 'eval_recall': 0.9238095238095239, 'eval_f1': 0.9238095238095239, 'eval_precision': 0.9238095238095239, 'eval_runtime': 0.2883, 'eval_samples_per_second': 364.197, 'eval_steps_per_second': 13.874, 'epoch': 1.0}
{'eval_loss': 0.43333473801612854, 'eval_accuracy': 0.9047619047619048, 'eval_recall': 0.9047619047619048, 'eval_f1': 0.9047619047619048, 'eval_precision': 0.9047619047619048, 'eval_runtime': 0.2897, 'eval_samples_per_second': 362.429, 'eval_steps_per_second': 13.807, 'epoch': 2.0}
{'eval_loss': 0.30443909764289856, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2876, 'eval_samples_per_second': 365.151, 'eval_steps_per_second': 13.911, 'epoch': 3.0}
{'eval_loss': 0.2685248553752899, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precisio

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.86      0.92       140
           1       0.46      0.86      0.60        50
           2       0.89      0.55      0.68        71

    accuracy                           0.78       261
   macro avg       0.78      0.76      0.73       261
weighted avg       0.86      0.78      0.79       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.98      0.86      0.92       140
           1       0.46      0.86      0.60        50
           2       0.89      0.55      0.68        71

    accuracy                           0.78       261
   macro avg       0.78      0.76      0.73       261
weighted avg       0.86      0.78      0.79       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 32 2 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4610474109649658, 'eval_accuracy': 0.8285714285714286, 'eval_recall': 0.8285714285714286, 'eval_f1': 0.8285714285714286, 'eval_precision': 0.8285714285714286, 'eval_runtime': 0.2808, 'eval_samples_per_second': 373.972, 'eval_steps_per_second': 14.247, 'epoch': 1.0}
{'eval_loss': 0.2062986046075821, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2854, 'eval_samples_per_second': 367.847, 'eval_steps_per_second': 14.013, 'epoch': 2.0}
{'eval_loss': 0.2006743997335434, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2882, 'eval_samples_per_second': 364.294, 'eval_steps_per_second': 13.878, 'epoch': 3.0}
{'eval_loss': 0.10665684938430786, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision'



{'eval_loss': 0.30329596996307373, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.2776, 'eval_samples_per_second': 378.258, 'eval_steps_per_second': 14.41, 'epoch': 1.0}
{'eval_loss': 0.23295170068740845, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision': 0.9619047619047619, 'eval_runtime': 0.2806, 'eval_samples_per_second': 374.135, 'eval_steps_per_second': 14.253, 'epoch': 2.0}
{'eval_loss': 0.3535411059856415, 'eval_accuracy': 0.9523809523809523, 'eval_recall': 0.9523809523809523, 'eval_f1': 0.9523809523809523, 'eval_precision': 0.9523809523809523, 'eval_runtime': 0.2792, 'eval_samples_per_second': 376.093, 'eval_steps_per_second': 14.327, 'epoch': 3.0}
{'eval_loss': 0.13431000709533691, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.93      0.83      0.88        90
           1       0.66      0.66      0.66        74
           2       0.81      0.89      0.85        97

    accuracy                           0.80       261
   macro avg       0.80      0.79      0.80       261
weighted avg       0.81      0.80      0.81       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.93      0.83      0.88        90
           1       0.66      0.66      0.66        74
           2       0.81      0.89      0.85        97

    accuracy                           0.80       261
   macro avg       0.80      0.79      0.80       261
weighted avg       0.81      0.80      0.81       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 32 3 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.4755781590938568, 'eval_accuracy': 0.8476190476190476, 'eval_recall': 0.8476190476190476, 'eval_f1': 0.8476190476190476, 'eval_precision': 0.8476190476190476, 'eval_runtime': 0.2834, 'eval_samples_per_second': 370.49, 'eval_steps_per_second': 14.114, 'epoch': 1.0}
{'eval_loss': 0.29808706045150757, 'eval_accuracy': 0.9142857142857143, 'eval_recall': 0.9142857142857143, 'eval_f1': 0.9142857142857143, 'eval_precision': 0.9142857142857143, 'eval_runtime': 0.2948, 'eval_samples_per_second': 356.147, 'eval_steps_per_second': 13.568, 'epoch': 2.0}
{'eval_loss': 0.5040914416313171, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision': 0.8857142857142857, 'eval_runtime': 0.2911, 'eval_samples_per_second': 360.649, 'eval_steps_per_second': 13.739, 'epoch': 3.0}
{'eval_loss': 0.4442352056503296, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision':



{'eval_loss': 0.5475656390190125, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.2883, 'eval_samples_per_second': 364.189, 'eval_steps_per_second': 13.874, 'epoch': 1.0}
{'eval_loss': 0.9440596103668213, 'eval_accuracy': 0.8761904761904762, 'eval_recall': 0.8761904761904762, 'eval_f1': 0.8761904761904762, 'eval_precision': 0.8761904761904762, 'eval_runtime': 0.2913, 'eval_samples_per_second': 360.515, 'eval_steps_per_second': 13.734, 'epoch': 2.0}
{'eval_loss': 0.5161827206611633, 'eval_accuracy': 0.8952380952380953, 'eval_recall': 0.8952380952380953, 'eval_f1': 0.8952380952380953, 'eval_precision': 0.8952380952380953, 'eval_runtime': 0.2885, 'eval_samples_per_second': 363.923, 'eval_steps_per_second': 13.864, 'epoch': 3.0}
{'eval_loss': 0.6852839589118958, 'eval_accuracy': 0.8857142857142857, 'eval_recall': 0.8857142857142857, 'eval_f1': 0.8857142857142857, 'eval_precision':

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.79      0.94      0.86       108
           1       0.76      0.57      0.65        79
           2       0.88      0.88      0.88        74

    accuracy                           0.81       261
   macro avg       0.81      0.79      0.80       261
weighted avg       0.81      0.81      0.80       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.79      0.94      0.86       108
           1       0.76      0.57      0.65        79
           2       0.88      0.88      0.88        74

    accuracy                           0.81       261
   macro avg       0.81      0.79      0.80       261
weighted avg       0.81      0.81      0.80       261


***********************************************************************************


**************** The number of epochs, batch_size and fold respectively are:  8 32 4 ************************

Index(['Premise', 'Hypothesis', 'label', 'text'], dtype='object')
train rows: 938
eval rows: 105
test rows: 261
test rows: 261
DatasetDict({
    train: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 938
    })
    validation: Dataset({
        features: ['Premise', 'Hypothesis', 'label', 'text'],
        num_rows: 105
    })
    test: Dataset({
        features: ['Premise', '

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['text'].str.replace('\\', ' ', regex=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['text'] = data['Premise'] + " [SEP] " + data['Hypothesis']
A value is tr

Map:   0%|          | 0/938 [00:00<?, ? examples/s]

Map:   0%|          | 0/105 [00:00<?, ? examples/s]



{'eval_loss': 0.38153305649757385, 'eval_accuracy': 0.819047619047619, 'eval_recall': 0.819047619047619, 'eval_f1': 0.819047619047619, 'eval_precision': 0.819047619047619, 'eval_runtime': 0.2516, 'eval_samples_per_second': 417.405, 'eval_steps_per_second': 15.901, 'epoch': 1.0}
{'eval_loss': 0.1755131334066391, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.251, 'eval_samples_per_second': 418.254, 'eval_steps_per_second': 15.933, 'epoch': 2.0}
{'eval_loss': 0.17747607827186584, 'eval_accuracy': 0.9428571428571428, 'eval_recall': 0.9428571428571428, 'eval_f1': 0.9428571428571428, 'eval_precision': 0.9428571428571428, 'eval_runtime': 0.2531, 'eval_samples_per_second': 414.797, 'eval_steps_per_second': 15.802, 'epoch': 3.0}
{'eval_loss': 0.22840788960456848, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0



{'eval_loss': 0.12901243567466736, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.9714285714285714, 'eval_runtime': 0.2535, 'eval_samples_per_second': 414.281, 'eval_steps_per_second': 15.782, 'epoch': 1.0}
{'eval_loss': 0.23533864319324493, 'eval_accuracy': 0.9333333333333333, 'eval_recall': 0.9333333333333333, 'eval_f1': 0.9333333333333333, 'eval_precision': 0.9333333333333333, 'eval_runtime': 0.252, 'eval_samples_per_second': 416.733, 'eval_steps_per_second': 15.876, 'epoch': 2.0}
{'eval_loss': 0.16531966626644135, 'eval_accuracy': 0.9714285714285714, 'eval_recall': 0.9714285714285714, 'eval_f1': 0.9714285714285714, 'eval_precision': 0.9714285714285714, 'eval_runtime': 0.2527, 'eval_samples_per_second': 415.502, 'eval_steps_per_second': 15.829, 'epoch': 3.0}
{'eval_loss': 0.2755979895591736, 'eval_accuracy': 0.9619047619047619, 'eval_recall': 0.9619047619047619, 'eval_f1': 0.9619047619047619, 'eval_precision

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.22      0.89      0.36         9
           1       0.83      0.78      0.80       205
           2       0.39      0.28      0.33        47

    accuracy                           0.69       261
   macro avg       0.48      0.65      0.49       261
weighted avg       0.73      0.69      0.70       261



Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Map:   0%|          | 0/261 [00:00<?, ? examples/s]

              precision    recall  f1-score   support

           0       0.22      0.89      0.36         9
           1       0.83      0.78      0.80       205
           2       0.39      0.28      0.33        47

    accuracy                           0.69       261
   macro avg       0.48      0.65      0.49       261
weighted avg       0.73      0.69      0.70       261



In [98]:
all_scores_deberta = all_scores

In [99]:
torch.cuda.empty_cache() 

In [100]:
all_scores

[[{'accuracy': 0.8697318007662835,
   'recall': 0.8697318007662835,
   'f1': 0.8697318007662835,
   'precision': 0.8697318007662835,
   'fold': 0,
   'model_name': 'cross-encoder/nli-deberta-v3-base'},
  {'accuracy': 0.7126436781609196,
   'recall': 0.7126436781609196,
   'f1': 0.7126436781609196,
   'precision': 0.7126436781609196,
   'fold': 1,
   'model_name': 'cross-encoder/nli-deberta-v3-base'},
  {'accuracy': 0.7816091954022989,
   'recall': 0.7816091954022989,
   'f1': 0.781609195402299,
   'precision': 0.7816091954022989,
   'fold': 2,
   'model_name': 'cross-encoder/nli-deberta-v3-base'},
  {'accuracy': 0.7777777777777778,
   'recall': 0.7777777777777778,
   'f1': 0.7777777777777778,
   'precision': 0.7777777777777778,
   'fold': 3,
   'model_name': 'cross-encoder/nli-deberta-v3-base'},
  {'accuracy': 0.6628352490421456,
   'recall': 0.6628352490421456,
   'f1': 0.6628352490421456,
   'precision': 0.6628352490421456,
   'fold': 4,
   'model_name': 'cross-encoder/nli-deberta-v3

### "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"

Tokenize the texts:

In [None]:
# from transformers.models.deberta.modeling_deberta import DebertaModel, DebertaPreTrainedModel

In [None]:
# from transformers import DebertaForSequenceClassification
# model = DebertaForSequenceClassification.from_pretrained(models[0])
# model.train()

Create the transformer model:

In [None]:
# from torch import nn
# from transformers.modeling_outputs import SequenceClassifierOutput

# from transformers.models.bert.modeling_bert import BertModel, BertPreTrainedModel

# #BERT, SentenceTransformer pretrained model
# # https://github.com/huggingface/transformers/blob/65659a29cf5a079842e61a63d57fa24474288998/src/transformers/models/bert/modeling_bert.py#L1486

# class BertForSequenceClassification(BertPreTrainedModel):
#     def __init__(self, config):
#         super().__init__(config)
#         self.num_labels = config.num_labels
#         self.bert = BertModel(config)
#         self.dropout = nn.Dropout(config.hidden_dropout_prob)
#         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
#         self.init_weights()
        
#     def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, labels=None, **kwargs):
#         outputs = self.bert(
#             input_ids,
#             attention_mask=attention_mask,
#             token_type_ids=token_type_ids,
#             **kwargs,
#         )
#         cls_outputs = outputs.last_hidden_state[:, 0, :]
#         cls_outputs = self.dropout(cls_outputs)
#         logits = self.classifier(cls_outputs)
#         loss = None
#         if labels is not None:
#             loss_fn = nn.CrossEntropyLoss()
#             loss = loss_fn(logits, labels)
#         return SequenceClassifierOutput(
#             loss=loss,
#             logits=logits,
#             hidden_states=outputs.hidden_states,
#             attentions=outputs.attentions,
#         )

In [None]:
# sliding window folds 
# 15 
# 10% 
# 130 val
# 130 test
# 1000 train

# 1304 indices
# window size 130

# for fold in folds:
#   train, test, val = fold


In [None]:
# from transformers import AutoConfig

# config = AutoConfig.from_pretrained(
#     transformer_name,
#     num_labels=len(labels),
# )

# model = (
#     BertForSequenceClassification
#     .from_pretrained(transformer_name, config=config)
# )

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

Create the trainer object and train:

In [None]:
# from sklearn.metrics import accuracy_score

# def compute_metrics(eval_pred):
#     y_true = eval_pred.label_ids
#     y_pred = np.argmax(eval_pred.predictions, axis=-1)
#     return {'accuracy': accuracy_score(y_true, y_pred)}