<a href="https://colab.research.google.com/github/bogus1aw/text-classification-benchmark/blob/main/M_herBERT_PolEmo2_0_raw.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HerBERT benchmark for PolEmo2.0 raw dataset 

In [None]:
# check available GPU
!nvidia-smi --query-gpu=gpu_name,driver_version,memory.total --format=csv

name, driver_version, memory.total [MiB]
Tesla T4, 460.32.03, 15109 MiB


In [None]:
%%capture
!pip install datasets transformers

In [None]:
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'
cuda.is_available()

True

In [None]:
from sklearn.model_selection import train_test_split
import pandas as pd
import datetime
import time
import math
import shutil

In [None]:
def load_corpora_to_dataframe(corpora):
  # Each non-empty line holds whitespace-separated tokens; the last token is the label.
  with open(corpora, encoding='utf-8') as f:
    data = f.read()
  labels, texts = [], []
  for line in data.split("\n"):
    content = line.split()
    if len(content) > 0:
      labels.append(content[-1])
      texts.append(" ".join(content[:-1]))
  # create a dataframe using texts and labels
  trainDF = pd.DataFrame()
  trainDF['texts'] = texts
  trainDF['labels'] = labels
  return trainDF


# 1. load corpora
# 2. create training, test fractions
# 3. create specific No. per class fractions
# 4. create train, val sets
# 5. create new fresh model (tokenizer can stay the same)
# 6. tokenize and encode train, test 
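
The loader above treats the last whitespace-separated token of each line as the label and everything before it as the text. A minimal sketch of that rule on a single line follows; the sample sentence and the label string are hypothetical and only illustrate the assumed layout of the corpus files.

```python
def parse_line(line):
    """Split one corpus line into (text, label); the label is the last token."""
    tokens = line.split()
    if not tokens:
        return None  # blank lines are skipped, as in the loader
    return " ".join(tokens[:-1]), tokens[-1]


# Hypothetical sample line; the label string is an assumed example.
sample = "Pokój był czysty , a obsługa bardzo miła . __label__meta_plus_m"
text, label = parse_line(sample)
print(label)
print(text)
```

Note that a line holding a single token would yield an empty text with that token as its label, so malformed lines are worth checking before training.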


In [None]:
import torch
# from transformers import AutoTokenizer, RobertaForSequenceClassification, EvalPrediction
from transformers import HerbertTokenizer, RobertaForSequenceClassification, EvalPrediction


# tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
# model = RobertaForSequenceClassification.from_pretrained("allegro/herbert-base-cased", num_labels=4)
tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaForSequenceClassification.from_pretrained("allegro/herbert-klej-cased-v1", num_labels=4)

Some weights of the model checkpoint at allegro/herbert-klej-cased-v1 were not used when initializing RobertaForSequenceClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at allegro/herbert-klej-cased-v1 and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.out_proj.weight', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream

In [None]:
from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()

def build_databases(train_data, dev_data, test_data):

  max_length = 200
  train_encodings = tokenizer(train_data['texts'].to_list(), truncation=True, padding=True, max_length=max_length)
  val_encodings = tokenizer(dev_data['texts'].to_list(), truncation=True, padding=True, max_length=max_length)
  test_encodings = tokenizer(test_data['texts'].to_list(), truncation=True, padding=True, max_length=max_length)

  # encode labels: fit the encoder once on the training labels, then reuse the same mapping
  train_labels = encoder.fit_transform(train_data['labels'].to_list())
  val_labels = encoder.transform(dev_data['labels'].to_list())
  test_labels = encoder.transform(test_data['labels'].to_list())

  # build pyTorch dataset
  import torch

  class wikiDataset(torch.utils.data.Dataset):
      def __init__(self, encodings, labels):
          self.encodings = encodings
          self.labels = labels

      def __getitem__(self, idx):
          item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
          item['labels'] = torch.tensor(self.labels[idx])
          return item

      def __len__(self):
          return len(self.labels)

  train_dataset = wikiDataset(train_encodings, train_labels)
  val_dataset = wikiDataset(val_encodings, val_labels)
  test_dataset = wikiDataset(test_encodings, test_labels)
  return train_dataset, val_dataset, test_dataset
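
`LabelEncoder` assigns integer ids by sorted class order, so a mapping learned on one split only matches another split when both contain exactly the same set of classes. The plain-Python sketch below mimics that behaviour to show why the mapping should be learned once and then reused; the helper names and the short label strings are hypothetical.

```python
def fit_label_mapping(train_labels):
    """Assign ids by sorted class name, mimicking LabelEncoder's ordering."""
    classes = sorted(set(train_labels))
    return {label: idx for idx, label in enumerate(classes)}


def encode(labels, mapping):
    """Apply an existing mapping; raises KeyError for a label never seen in training."""
    return [mapping[label] for label in labels]


train_labels = ["neg", "pos", "amb", "zero", "pos"]  # hypothetical label names
dev_labels = ["pos", "zero"]                         # a split missing two classes

mapping = fit_label_mapping(train_labels)
shared = encode(dev_labels, mapping)

# Refitting on the smaller split silently produces different ids.
refit = encode(dev_labels, fit_label_mapping(dev_labels))

print(mapping)
print(shared, refit)
```

When every split contains all classes, as with the per-class sampling used in this notebook, both approaches give the same ids; the difference only appears once a class is missing from a split.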


In [None]:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          
    evaluation_strategy = "epoch",
    num_train_epochs=4,              
    per_device_train_batch_size=16,  
    per_device_eval_batch_size=64,   
    warmup_steps=100,                
    weight_decay=0.01,               
    logging_dir='./logs',            
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)

from datasets import load_metric
import numpy as np
metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    # print(predictions[:10])
    # print(labels[:10])
    return metric.compute(predictions=predictions, references=labels)

def get_trainer(model, train_dataset, val_dataset):
  trainer = Trainer(
      model=model,                         # the instantiated 🤗 Transformers model to be trained
      args=training_args,                  # training arguments, defined above
      train_dataset=train_dataset,         # training dataset
      eval_dataset=val_dataset,             # evaluation dataset
      tokenizer=tokenizer,
      compute_metrics=compute_metrics
  )
  return trainer
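
`compute_metrics` turns the per-class scores returned by the model into predicted class ids with an argmax and compares them with the reference labels. A dependency-free sketch of the same computation is shown below; the logits and labels are made-up values for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up scores for four samples and four classes.
logits = [
    [0.1, 0.9, 0.0, 0.0],
    [2.0, 0.5, 0.1, 0.3],
    [0.2, 0.1, 0.3, 0.4],
    [0.0, 0.0, 1.5, 0.2],
]
labels = [1, 0, 2, 2]
result = accuracy_from_logits(logits, labels)
print(result)
```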


In [None]:
def write_to_logs(values):
  with open(metrice_path, 'a') as f:
    f.write(values)

In [None]:
timestamp = datetime.datetime.now().replace(microsecond=0).isoformat().replace(':', '-')
metrice_path = '/content/drive/MyDrive/metrics/hebert_PolEmo2.0_raw2.0' + timestamp + '.txt'
fig_path = '/content/drive/MyDrive/figures/'
dataset_path = '/content/drive/MyDrive/master_datasets/dataset_conll/'


no_samples_per_class = [1, 3, 5, 8, 10, 20, 30, 60, 100, 200]  # final list of sample counts per class
repetitions = 5

domains = [
           ('all', 'MDT-A'),
           ('hotels', 'SDT-H'),
           ('medicine', 'SDT-M'),
           ('products', 'SDT-P'),
           ('reviews', 'SDT-R')
           ]


In [None]:
def benchmark(train, dev, test, n_sample, domain):
  accuraccy, train_time, eval_time = [], [], []
  for repeat in range(repetitions):
    # print('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
    # print('training for n_sample = ', n_sample)
    # print('%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%')
    # model = RobertaForSequenceClassification.from_pretrained("allegro/herbert-base-cased", num_labels=4)
    # A fresh model is loaded for every repetition; set num_labels to the number of classes in the dataset.
    model = RobertaForSequenceClassification.from_pretrained("allegro/herbert-klej-cased-v1", num_labels=4)
    train_fraction = train.groupby(['labels']).sample(n=max(math.floor(n_sample * 0.9), 1), replace=True)
    dev_fraction = dev.groupby(['labels']).sample(n=max(math.floor(n_sample*0.1), 1), replace=True)
    train_dataset, val_dataset, test_dataset = build_databases(train_fraction, dev_fraction, test)
    trainer = get_trainer(model=model, train_dataset=train_dataset, val_dataset=val_dataset)
    
    train_time_start = time.time()
    trainer.train()
    train_elapsed_time = time.time() - train_time_start
    train_time.append(train_elapsed_time)
    # trainer.evaluate()
    metrics = trainer.predict(test_dataset)
    accuraccy.append(metrics.metrics['eval_accuracy']) 
    eval_time.append(metrics.metrics['eval_runtime']) 
    to_save = f'domiain {domain} n_samples_per_class={n_sample}, repeat={repeat}, time_elapsed={train_elapsed_time}, {metrics.metrics}\n' 
    print(to_save)
    write_to_logs(to_save)
    shutil.rmtree('./results') # delete checkpoint files

  return accuraccy, train_time, eval_time
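
The `benchmark` function splits each per-class budget roughly 90/10 between training and validation, with at least one sample per class on each side. The sketch below tabulates the resulting set sizes for the sample counts used here, assuming four classes (taken from `num_labels=4`); the helper name is hypothetical.

```python
import math


def fraction_sizes(n_sample, num_classes=4):
    """Total train and dev set sizes for a given per-class budget."""
    train_per_class = max(math.floor(n_sample * 0.9), 1)
    dev_per_class = max(math.floor(n_sample * 0.1), 1)
    return train_per_class * num_classes, dev_per_class * num_classes


for n in [1, 3, 5, 8, 10, 20, 30, 60, 100, 200]:
    train_size, dev_size = fraction_sizes(n)
    print(f"n_sample={n:>3}  train={train_size:>3}  dev={dev_size:>2}")
```

Under this assumption the validation set holds only four samples for every budget up to 10 per class, so validation accuracy can only take the values 0, 0.25, 0.5, 0.75 and 1.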

In [None]:
print(metrice_path)

results_a = pd.DataFrame()
results_t = pd.DataFrame()
results_e = pd.DataFrame()

for domain, ix_name in domains:
  write_to_logs('%%%%%%%%%%%%%%%% domain: ' + domain + '\n')
  CORPORA_TRAIN = dataset_path + domain + '.text.train.txt'
  CORPORA_DEV = dataset_path + domain + '.text.dev.txt'
  CORPORA_TEST = dataset_path + domain + '.text.test.txt'

  train = load_corpora_to_dataframe(CORPORA_TRAIN)
  dev = load_corpora_to_dataframe(CORPORA_DEV)
  test = load_corpora_to_dataframe(CORPORA_TEST)

  accu_list, train_t, eval_t = [], [], []

  for n_sample in no_samples_per_class:
    accuraccy, train_time, eval_time = benchmark(train, dev, test, n_sample, domain)
    accu_list.append(np.mean(accuraccy))
    train_t.append(np.mean(train_time))
    eval_t.append(np.mean(eval_time))
  
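  # DataFrame.append returns a new frame instead of modifying in place (and was removed in pandas 2.0),
  # so the result is reassigned via pd.concat
  results_a = pd.concat([results_a, pd.DataFrame(accu_list, index=no_samples_per_class, columns=[ix_name + '_R_']).T])
  results_t = pd.concat([results_t, pd.DataFrame(train_t, index=no_samples_per_class, columns=[ix_name + '_R_']).T])
  results_e = pd.concat([results_e, pd.DataFrame(eval_t, index=no_samples_per_class, columns=[ix_name + '_R_']).T])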
  
display(results_a)
display(results_t)
display(results_e)

 

/content/drive/MyDrive/metrics/hebert_PolEmo2.0_raw2.02021-02-17T21-24-28.txt


Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.47473,0.25,0.0594,67.329
2,No log,1.47397,0.25,0.0542,73.866
3,No log,1.472673,0.25,0.0447,89.478
4,No log,1.470563,0.25,0.0473,84.637
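
The "No log" entries and the almost flat validation loss are likely a consequence of how few optimizer steps such a tiny training set produces. The sketch below works through the arithmetic for the smallest setting, assuming four classes, a single device, no gradient accumulation, and a linear warmup as in the `Trainer` default schedule; the helper names are hypothetical.

```python
import math


def training_steps(n_train, batch_size=16, epochs=4):
    """Total optimizer steps: batches per epoch times number of epochs."""
    return math.ceil(n_train / batch_size) * epochs


def warmup_factor(step, warmup_steps=100):
    """Share of the peak learning rate reached at a given step during linear warmup."""
    return min(1.0, step / max(1, warmup_steps))


total = training_steps(n_train=4)        # one training sample per class, four classes assumed
reaches_logging = total >= 10            # logging_steps=10 in the training arguments
final_lr_share = warmup_factor(total)    # warmup_steps=100 in the training arguments

print(total, reaches_logging, final_lr_share)
```

With four steps in total, the training loss is never logged and the learning rate stays at a few percent of its peak, which would explain why the model barely moves from its initial state at the smallest sample sizes.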


domiain all n_samples_per_class=1, repeat=0, time_elapsed=116.9348087310791, {'eval_loss': 1.3309404850006104, 'eval_accuracy': 0.41097560975609754, 'eval_runtime': 9.6005, 'eval_samples_per_second': 85.412}
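
Each benchmark run appends one line of this shape to the metrics file, so the results can be recovered later without rerunning the training. A minimal parsing sketch follows; the regular expression and the function name are assumptions based on the line printed above, not part of the notebook.

```python
import ast
import re

# Matches the fields written by the benchmark loop; the leading word is ignored.
LOG_PATTERN = re.compile(
    r"(?P<domain>\S+) n_samples_per_class=(?P<n>\d+), repeat=(?P<repeat>\d+), "
    r"time_elapsed=(?P<elapsed>[0-9.]+), (?P<metrics>\{.*\})"
)


def parse_log_line(line):
    """Return a flat dict of run settings and metrics, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    metrics = ast.literal_eval(match.group("metrics"))  # the metrics dict is a Python literal
    return {
        "domain": match.group("domain"),
        "n_samples_per_class": int(match.group("n")),
        "repeat": int(match.group("repeat")),
        "time_elapsed": float(match.group("elapsed")),
        **metrics,
    }


line = (
    "domiain all n_samples_per_class=1, repeat=0, time_elapsed=116.9348087310791, "
    "{'eval_loss': 1.3309404850006104, 'eval_accuracy': 0.41097560975609754, "
    "'eval_runtime': 9.6005, 'eval_samples_per_second': 85.412}"
)
record = parse_log_line(line)
print(record["domain"], record["n_samples_per_class"], record["eval_accuracy"])
```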



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.384162,0.5,0.0485,82.477
2,No log,1.384037,0.5,0.0633,63.183
3,No log,1.38377,0.5,0.0526,76.022
4,No log,1.38335,0.5,0.0658,60.758


domiain all n_samples_per_class=1, repeat=1, time_elapsed=115.69878792762756, {'eval_loss': 1.3671376705169678, 'eval_accuracy': 0.2926829268292683, 'eval_runtime': 9.8962, 'eval_samples_per_second': 82.86}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.384162,0.5,0.0558,71.736
2,No log,1.384037,0.5,0.0462,86.63
3,No log,1.38377,0.5,0.0525,76.199
4,No log,1.38335,0.5,0.0469,85.209


domiain all n_samples_per_class=1, repeat=2, time_elapsed=116.88199162483215, {'eval_loss': 1.3671376705169678, 'eval_accuracy': 0.2926829268292683, 'eval_runtime': 10.149, 'eval_samples_per_second': 80.796}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.384162,0.5,0.0487,82.167
2,No log,1.384037,0.5,0.0495,80.831
3,No log,1.38377,0.5,0.0482,82.943
4,No log,1.38335,0.5,0.1612,24.818


domiain all n_samples_per_class=1, repeat=3, time_elapsed=117.6092882156372, {'eval_loss': 1.3671376705169678, 'eval_accuracy': 0.2926829268292683, 'eval_runtime': 10.449, 'eval_samples_per_second': 78.477}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.384162,0.5,0.0477,83.89
2,No log,1.384037,0.5,0.0606,66.057
3,No log,1.38377,0.5,0.0668,59.839
4,No log,1.38335,0.5,0.0476,84.07


domiain all n_samples_per_class=1, repeat=4, time_elapsed=116.94195866584778, {'eval_loss': 1.3671376705169678, 'eval_accuracy': 0.2926829268292683, 'eval_runtime': 10.6745, 'eval_samples_per_second': 76.818}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.4865,0.25,0.039,102.574
2,No log,1.485653,0.25,0.0423,94.451
3,No log,1.48389,0.25,0.0418,95.725
4,No log,1.481349,0.25,0.042,95.27


domiain all n_samples_per_class=3, repeat=0, time_elapsed=127.53033256530762, {'eval_loss': 1.3671376705169678, 'eval_accuracy': 0.2926829268292683, 'eval_runtime': 10.7749, 'eval_samples_per_second': 76.103}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.423654,0.0,0.0407,98.331
2,No log,1.422865,0.0,0.0418,95.631
3,No log,1.42119,0.0,0.0428,93.558
4,No log,1.418713,0.0,0.0418,95.766


domiain all n_samples_per_class=3, repeat=1, time_elapsed=125.64426946640015, {'eval_loss': 1.4157105684280396, 'eval_accuracy': 0.22682926829268293, 'eval_runtime': 10.6543, 'eval_samples_per_second': 76.964}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.423654,0.0,0.0401,99.847
2,No log,1.422865,0.0,0.0424,94.387
3,No log,1.42119,0.0,2.9697,1.347
4,No log,1.418713,0.0,0.0444,90.05


domiain all n_samples_per_class=3, repeat=2, time_elapsed=124.54135823249817, {'eval_loss': 1.4157105684280396, 'eval_accuracy': 0.22682926829268293, 'eval_runtime': 11.1752, 'eval_samples_per_second': 73.377}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.423654,0.0,0.0383,104.389
2,No log,1.422865,0.0,0.0413,96.859
3,No log,1.42119,0.0,0.0395,101.261
4,No log,1.418713,0.0,0.042,95.201


domiain all n_samples_per_class=3, repeat=3, time_elapsed=124.74802756309509, {'eval_loss': 1.4157105684280396, 'eval_accuracy': 0.22682926829268293, 'eval_runtime': 10.8488, 'eval_samples_per_second': 75.584}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.423654,0.0,0.0412,97.075
2,No log,1.422865,0.0,0.0417,96.023
3,No log,1.42119,0.0,0.0401,99.861
4,No log,1.418713,0.0,0.0424,94.383


domiain all n_samples_per_class=3, repeat=4, time_elapsed=124.43976855278015, {'eval_loss': 1.4157105684280396, 'eval_accuracy': 0.22682926829268293, 'eval_runtime': 11.007, 'eval_samples_per_second': 74.498}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.387928,0.0,0.0549,72.819
2,No log,1.386524,0.0,0.048,83.405
3,No log,1.383874,0.0,0.0475,84.139
4,No log,1.380041,0.0,0.0483,82.762


domiain all n_samples_per_class=5, repeat=0, time_elapsed=130.16784572601318, {'eval_loss': 1.4148428440093994, 'eval_accuracy': 0.2304878048780488, 'eval_runtime': 10.9463, 'eval_samples_per_second': 74.911}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.443239,0.25,0.0524,76.307
2,No log,1.44205,0.25,0.0511,78.204
3,No log,1.439694,0.25,0.0498,80.318
4,No log,1.436189,0.25,0.0472,84.805


domiain all n_samples_per_class=5, repeat=1, time_elapsed=129.67134523391724, {'eval_loss': 1.4143271446228027, 'eval_accuracy': 0.20243902439024392, 'eval_runtime': 10.9799, 'eval_samples_per_second': 74.682}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.443239,0.25,0.0512,78.157
2,No log,1.44205,0.25,0.0524,76.377
3,No log,1.439694,0.25,0.0486,82.335
4,No log,1.436189,0.25,0.0493,81.149


domiain all n_samples_per_class=5, repeat=2, time_elapsed=131.20811986923218, {'eval_loss': 1.4143271446228027, 'eval_accuracy': 0.20243902439024392, 'eval_runtime': 10.9785, 'eval_samples_per_second': 74.691}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.443239,0.25,0.0504,79.417
2,No log,1.44205,0.25,0.0499,80.227
3,No log,1.439694,0.25,0.0495,80.833
4,No log,1.436189,0.25,0.049,81.62


domiain all n_samples_per_class=5, repeat=3, time_elapsed=130.7179012298584, {'eval_loss': 1.4143271446228027, 'eval_accuracy': 0.20243902439024392, 'eval_runtime': 11.0017, 'eval_samples_per_second': 74.534}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.443239,0.25,0.0493,81.206
2,No log,1.44205,0.25,0.0489,81.822
3,No log,1.439694,0.25,0.0487,82.12
4,No log,1.436189,0.25,0.0483,82.8


domiain all n_samples_per_class=5, repeat=4, time_elapsed=131.22910952568054, {'eval_loss': 1.4143271446228027, 'eval_accuracy': 0.20243902439024392, 'eval_runtime': 10.9255, 'eval_samples_per_second': 75.054}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.349373,0.25,0.0513,77.928
2,No log,1.343441,0.25,0.0525,76.14
3,No log,1.332838,0.5,0.0501,79.89
4,No log,1.317436,0.5,0.0511,78.349


domiain all n_samples_per_class=8, repeat=0, time_elapsed=117.0272011756897, {'eval_loss': 1.4005522727966309, 'eval_accuracy': 0.22682926829268293, 'eval_runtime': 10.9438, 'eval_samples_per_second': 74.928}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.410356,0.25,0.0538,74.282
2,No log,1.403504,0.25,0.0508,78.763
3,No log,1.391431,0.25,0.0497,80.557
4,No log,1.373695,0.25,0.0507,78.969


domiain all n_samples_per_class=8, repeat=1, time_elapsed=142.75325441360474, {'eval_loss': 1.3643858432769775, 'eval_accuracy': 0.3951219512195122, 'eval_runtime': 11.0951, 'eval_samples_per_second': 73.906}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.410356,0.25,0.0499,80.093
2,No log,1.403504,0.25,0.0495,80.743
3,No log,1.391431,0.25,0.0471,84.898
4,No log,1.373695,0.25,0.0496,80.678


domiain all n_samples_per_class=8, repeat=2, time_elapsed=134.6869616508484, {'eval_loss': 1.3643858432769775, 'eval_accuracy': 0.3951219512195122, 'eval_runtime': 10.9606, 'eval_samples_per_second': 74.814}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.410356,0.25,0.0505,79.258
2,No log,1.403504,0.25,0.0484,82.664
3,No log,1.391431,0.25,0.0526,76.098
4,No log,1.373695,0.25,0.0484,82.711


domiain all n_samples_per_class=8, repeat=3, time_elapsed=135.09771084785461, {'eval_loss': 1.3643858432769775, 'eval_accuracy': 0.3951219512195122, 'eval_runtime': 11.0089, 'eval_samples_per_second': 74.485}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.410356,0.25,0.0514,77.824
2,No log,1.403504,0.25,0.0506,78.992
3,No log,1.391431,0.25,0.0523,76.48
4,No log,1.373695,0.25,0.0501,79.804


domiain all n_samples_per_class=8, repeat=4, time_elapsed=133.25821995735168, {'eval_loss': 1.3643858432769775, 'eval_accuracy': 0.3951219512195122, 'eval_runtime': 10.1525, 'eval_samples_per_second': 80.769}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.405934,0.25,0.0491,81.387
2,No log,1.398448,0.25,0.049,81.689
3,No log,1.385594,0.25,0.0473,84.604
4,1.405200,1.370515,0.25,0.0485,82.505


domiain all n_samples_per_class=10, repeat=0, time_elapsed=134.99726843833923, {'eval_loss': 1.3636269569396973, 'eval_accuracy': 0.39390243902439026, 'eval_runtime': 9.8131, 'eval_samples_per_second': 83.562}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457344,0.0,0.0459,87.138
2,No log,1.450601,0.0,0.0458,87.279
3,No log,1.43945,0.0,0.0455,87.907
4,1.396100,1.425192,0.0,0.0462,86.547


domiain all n_samples_per_class=10, repeat=1, time_elapsed=132.64716172218323, {'eval_loss': 1.3829638957977295, 'eval_accuracy': 0.2524390243902439, 'eval_runtime': 9.6914, 'eval_samples_per_second': 84.611}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457344,0.0,0.0474,84.41
2,No log,1.450601,0.0,0.046,86.986
3,No log,1.43945,0.0,0.0452,88.408
4,1.396100,1.425192,0.0,0.0459,87.086


domiain all n_samples_per_class=10, repeat=2, time_elapsed=134.94492530822754, {'eval_loss': 1.3829638957977295, 'eval_accuracy': 0.2524390243902439, 'eval_runtime': 9.5638, 'eval_samples_per_second': 85.74}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457344,0.0,0.0471,85.011
2,No log,1.450601,0.0,0.0462,86.655
3,No log,1.43945,0.0,0.0472,84.835
4,1.396100,1.425192,0.0,0.0678,58.983


domiain all n_samples_per_class=10, repeat=3, time_elapsed=134.75359058380127, {'eval_loss': 1.3829638957977295, 'eval_accuracy': 0.2524390243902439, 'eval_runtime': 9.6363, 'eval_samples_per_second': 85.095}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457344,0.0,0.0455,87.888
2,No log,1.450601,0.0,0.0457,87.503
3,No log,1.43945,0.0,0.0457,87.564
4,1.396100,1.425192,0.0,0.046,86.927


domiain all n_samples_per_class=10, repeat=4, time_elapsed=132.90381860733032, {'eval_loss': 1.3829638957977295, 'eval_accuracy': 0.2524390243902439, 'eval_runtime': 9.5835, 'eval_samples_per_second': 85.564}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.405628,0.25,0.0918,87.122
2,1.386200,1.383923,0.375,0.0825,97.025
3,1.386200,1.347862,0.25,0.0926,86.38
4,1.289600,1.303191,0.25,0.979,8.171


domiain all n_samples_per_class=20, repeat=0, time_elapsed=138.86830401420593, {'eval_loss': 1.3554974794387817, 'eval_accuracy': 0.27926829268292686, 'eval_runtime': 10.1675, 'eval_samples_per_second': 80.649}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.436522,0.25,0.0949,84.282
2,1.402600,1.396961,0.25,0.0952,84.007
3,1.402600,1.346703,0.375,1.0757,7.437
4,1.291000,1.294353,0.375,0.0999,80.115


domiain all n_samples_per_class=20, repeat=1, time_elapsed=124.25374436378479, {'eval_loss': 1.3473306894302368, 'eval_accuracy': 0.32439024390243903, 'eval_runtime': 10.8844, 'eval_samples_per_second': 75.337}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.436522,0.25,0.1048,76.319
2,1.402600,1.396961,0.25,0.0997,80.279
3,1.402600,1.346703,0.375,0.099,80.801
4,1.291000,1.294353,0.375,0.1024,78.114


domiain all n_samples_per_class=20, repeat=2, time_elapsed=130.2249252796173, {'eval_loss': 1.3473306894302368, 'eval_accuracy': 0.32439024390243903, 'eval_runtime': 11.0164, 'eval_samples_per_second': 74.435}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.436522,0.25,0.1043,76.735
2,1.402600,1.396961,0.25,0.0954,83.879
3,1.402600,1.346703,0.375,0.0981,81.561
4,1.291000,1.294353,0.375,0.1019,78.516


domiain all n_samples_per_class=20, repeat=3, time_elapsed=131.68166971206665, {'eval_loss': 1.3473306894302368, 'eval_accuracy': 0.32439024390243903, 'eval_runtime': 11.049, 'eval_samples_per_second': 74.215}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.436522,0.25,0.1047,76.432
2,1.402600,1.396961,0.25,0.0962,83.144
3,1.402600,1.346703,0.375,0.0977,81.924
4,1.291000,1.294353,0.375,0.0981,81.537


domiain all n_samples_per_class=20, repeat=4, time_elapsed=130.42649674415588, {'eval_loss': 1.3473306894302368, 'eval_accuracy': 0.32439024390243903, 'eval_runtime': 11.0504, 'eval_samples_per_second': 74.206}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.380577,0.25,0.1412,84.966
2,1.395000,1.304811,0.416667,0.1363,88.067
3,1.314100,1.213,0.583333,0.1307,91.827
4,1.314100,1.072612,0.666667,0.1428,84.06


domiain all n_samples_per_class=30, repeat=0, time_elapsed=130.37679409980774, {'eval_loss': 1.103852391242981, 'eval_accuracy': 0.6109756097560975, 'eval_runtime': 11.019, 'eval_samples_per_second': 74.417}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.398209,0.25,0.1399,85.8
2,1.413900,1.326153,0.5,0.1368,87.743
3,1.293500,1.230095,0.583333,0.1406,85.372
4,1.293500,1.082112,0.75,0.1401,85.669


domiain all n_samples_per_class=30, repeat=1, time_elapsed=131.9581458568573, {'eval_loss': 1.1626746654510498, 'eval_accuracy': 0.5390243902439025, 'eval_runtime': 10.8722, 'eval_samples_per_second': 75.422}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.398209,0.25,0.1425,84.228
2,1.413900,1.326153,0.5,0.1382,86.811
3,1.293500,1.230095,0.583333,0.1419,84.583
4,1.293500,1.082112,0.75,0.1402,85.589


domiain all n_samples_per_class=30, repeat=2, time_elapsed=131.45348262786865, {'eval_loss': 1.1626746654510498, 'eval_accuracy': 0.5390243902439025, 'eval_runtime': 10.8438, 'eval_samples_per_second': 75.619}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.398209,0.25,0.1438,83.47
2,1.413900,1.326153,0.5,0.1377,87.151
3,1.293500,1.230095,0.583333,0.1409,85.196
4,1.293500,1.082112,0.75,0.1398,85.82


domiain all n_samples_per_class=30, repeat=3, time_elapsed=131.24456405639648, {'eval_loss': 1.1626746654510498, 'eval_accuracy': 0.5390243902439025, 'eval_runtime': 10.8153, 'eval_samples_per_second': 75.819}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.398209,0.25,0.1393,86.149
2,1.413900,1.326153,0.5,0.1418,84.621
3,1.293500,1.230095,0.583333,0.133,90.225
4,1.293500,1.082112,0.75,0.1421,84.444


domiain all n_samples_per_class=30, repeat=4, time_elapsed=132.64521741867065, {'eval_loss': 1.1626746654510498, 'eval_accuracy': 0.5390243902439025, 'eval_runtime': 10.7513, 'eval_samples_per_second': 76.27}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4079,1.332958,0.5,0.3275,73.283
2,1.3188,1.073363,0.625,0.3117,77.002
3,0.9241,0.751946,0.75,0.3125,76.795
4,0.7057,0.426709,0.875,0.3184,75.372


domiain all n_samples_per_class=60, repeat=0, time_elapsed=132.53670239448547, {'eval_loss': 0.6195520162582397, 'eval_accuracy': 0.7707317073170732, 'eval_runtime': 10.6266, 'eval_samples_per_second': 77.165}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.417,1.307456,0.541667,0.3437,69.834
2,1.3575,1.072754,0.791667,0.3116,77.021
3,0.961,0.693968,0.875,0.3049,78.706
4,0.668,0.364095,0.958333,0.3186,75.324


domiain all n_samples_per_class=60, repeat=1, time_elapsed=132.62281489372253, {'eval_loss': 0.5339027047157288, 'eval_accuracy': 0.8146341463414634, 'eval_runtime': 10.6797, 'eval_samples_per_second': 76.781}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.417,1.307456,0.541667,0.3285,73.057
2,1.3575,1.072754,0.791667,0.3133,76.601
3,0.961,0.693968,0.875,0.3061,78.402
4,0.668,0.364095,0.958333,0.3204,74.896


domiain all n_samples_per_class=60, repeat=2, time_elapsed=132.72448253631592, {'eval_loss': 0.5339027047157288, 'eval_accuracy': 0.8146341463414634, 'eval_runtime': 10.6876, 'eval_samples_per_second': 76.724}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.417,1.307456,0.541667,0.321,74.756
2,1.3575,1.072754,0.791667,0.3117,77.004
3,0.961,0.693968,0.875,0.308,77.923
4,0.668,0.364095,0.958333,0.316,75.953


domiain all n_samples_per_class=60, repeat=3, time_elapsed=130.7470245361328, {'eval_loss': 0.5339027047157288, 'eval_accuracy': 0.8146341463414634, 'eval_runtime': 10.1257, 'eval_samples_per_second': 80.982}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.417,1.307456,0.541667,0.3107,77.255
2,1.3575,1.072754,0.791667,0.3105,77.288
3,0.961,0.693968,0.875,0.3128,76.715
4,0.668,0.364095,0.958333,0.3277,73.239


domiain all n_samples_per_class=60, repeat=4, time_elapsed=135.7773609161377, {'eval_loss': 0.5339027047157288, 'eval_accuracy': 0.8146341463414634, 'eval_runtime': 10.6056, 'eval_samples_per_second': 77.318}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3563,1.231084,0.6,0.5585,71.618
2,1.0252,0.761877,0.6,0.5187,77.118
3,0.5871,0.575584,0.75,0.5169,77.379
4,0.2642,1.044864,0.675,0.5554,72.023


domiain all n_samples_per_class=100, repeat=0, time_elapsed=137.27813577651978, {'eval_loss': 0.4746408462524414, 'eval_accuracy': 0.8256097560975609, 'eval_runtime': 10.4259, 'eval_samples_per_second': 78.65}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3559,1.208348,0.55,0.5521,72.444
2,0.9973,0.789701,0.575,0.5238,76.368
3,0.6232,0.615727,0.7,0.522,76.629
4,0.2252,1.077912,0.625,0.5594,71.501


domiain all n_samples_per_class=100, repeat=1, time_elapsed=138.12435388565063, {'eval_loss': 0.46763575077056885, 'eval_accuracy': 0.8182926829268292, 'eval_runtime': 10.4141, 'eval_samples_per_second': 78.739}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3559,1.208348,0.55,0.5589,71.568
2,0.9973,0.789701,0.575,0.5204,76.868
3,0.6232,0.615727,0.7,0.518,77.22
4,0.2252,1.077912,0.625,0.557,71.808


domiain all n_samples_per_class=100, repeat=2, time_elapsed=136.98505544662476, {'eval_loss': 0.46763575077056885, 'eval_accuracy': 0.8182926829268292, 'eval_runtime': 10.4144, 'eval_samples_per_second': 78.737}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3559,1.208348,0.55,0.5549,72.081
2,0.9973,0.789701,0.575,0.5259,76.058
3,0.6232,0.615727,0.7,0.5222,76.599
4,0.2252,1.077912,0.625,0.558,71.682


domiain all n_samples_per_class=100, repeat=3, time_elapsed=138.28917694091797, {'eval_loss': 0.46763575077056885, 'eval_accuracy': 0.8182926829268292, 'eval_runtime': 10.429, 'eval_samples_per_second': 78.627}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3559,1.208348,0.55,0.5614,71.251
2,0.9973,0.789701,0.575,0.525,76.186
3,0.6232,0.615727,0.7,0.5182,77.191
4,0.2252,1.077912,0.625,0.5572,71.782


domiain all n_samples_per_class=100, repeat=4, time_elapsed=137.41269397735596, {'eval_loss': 0.46763575077056885, 'eval_accuracy': 0.8182926829268292, 'eval_runtime': 10.39, 'eval_samples_per_second': 78.922}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0189,0.816688,0.6625,1.0767,74.3
2,0.4624,0.687399,0.7875,1.0426,76.734
3,0.3632,0.595227,0.825,1.0873,73.58
4,0.1075,0.518442,0.85,1.0627,75.28


domiain all n_samples_per_class=200, repeat=0, time_elapsed=154.21652603149414, {'eval_loss': 0.5273145437240601, 'eval_accuracy': 0.8268292682926829, 'eval_runtime': 10.4021, 'eval_samples_per_second': 78.83}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0195,0.799211,0.6625,1.0746,74.448
2,0.5326,0.680211,0.7625,1.044,76.631
3,0.3583,0.544035,0.8,1.0887,73.484
4,0.0923,0.484496,0.8375,1.0656,75.072


domiain all n_samples_per_class=200, repeat=1, time_elapsed=153.19752860069275, {'eval_loss': 0.5267388224601746, 'eval_accuracy': 0.8292682926829268, 'eval_runtime': 10.4103, 'eval_samples_per_second': 78.768}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0195,0.799211,0.6625,1.0746,74.444
2,0.5326,0.680211,0.7625,1.0568,75.698
3,0.3583,0.544035,0.8,1.077,74.28
4,0.0923,0.484496,0.8375,1.05,76.194


domiain all n_samples_per_class=200, repeat=2, time_elapsed=152.4750213623047, {'eval_loss': 0.5267388224601746, 'eval_accuracy': 0.8292682926829268, 'eval_runtime': 10.4243, 'eval_samples_per_second': 78.663}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0195,0.799211,0.6625,1.0661,75.039
2,0.5326,0.680211,0.7625,1.0676,74.933
3,0.3583,0.544035,0.8,1.0756,74.379
4,0.0923,0.484496,0.8375,1.0642,75.175


domiain all n_samples_per_class=200, repeat=3, time_elapsed=151.63527274131775, {'eval_loss': 0.5267388224601746, 'eval_accuracy': 0.8292682926829268, 'eval_runtime': 10.3599, 'eval_samples_per_second': 79.151}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0195,0.799211,0.6625,1.0822,73.923
2,0.5326,0.680211,0.7625,1.0504,76.164
3,0.3583,0.544035,0.8,1.0895,73.43
4,0.0923,0.484496,0.8375,1.0505,76.155


domiain all n_samples_per_class=200, repeat=4, time_elapsed=154.11087083816528, {'eval_loss': 0.5267388224601746, 'eval_accuracy': 0.8292682926829268, 'eval_runtime': 10.2925, 'eval_samples_per_second': 79.67}
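
The per-run summary lines above can be averaged over repeats for each domain and training-set size. The following is only a minimal sketch, assuming the captured output is available as a single string; `RESULT_RE` and `summarize` are hypothetical names that do not appear in the notebook, and the pattern matches the label exactly as it is printed in the log (spelled `domiain`).

```python
import ast
import re
from collections import defaultdict
from statistics import mean

# Matches the summary line printed after each run. The label is spelled
# "domiain" in the log, so the pattern uses that spelling on purpose.
RESULT_RE = re.compile(
    r"^domiain (?P<domain>\S+) n_samples_per_class=(?P<n>\d+), "
    r"repeat=(?P<repeat>\d+), time_elapsed=(?P<elapsed>[\d.]+), "
    r"(?P<metrics>\{.*\})$"
)


def summarize(log_text):
    """Return mean eval_accuracy keyed by (domain, n_samples_per_class)."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # skip warnings, epoch tables and blank lines
        # The metrics are printed as a Python dict literal.
        metrics = ast.literal_eval(match.group("metrics"))
        key = (match.group("domain"), int(match.group("n")))
        grouped[key].append(metrics["eval_accuracy"])
    return {key: mean(values) for key, values in grouped.items()}
```

Calling `summarize(captured_output)` on the text of this cell would return a dictionary such as `{("all", 100): ..., ("all", 200): ...}`, with one mean accuracy per configuration; `captured_output` is likewise an assumed variable holding the log text.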




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.406825,0.25,0.0468,85.484
2,No log,1.405892,0.25,0.0541,73.976
3,No log,1.403973,0.5,0.0665,60.172
4,No log,1.400992,0.5,0.0487,82.053


domiain hotels n_samples_per_class=1, repeat=0, time_elapsed=107.75221991539001, {'eval_loss': 1.470782995223999, 'eval_accuracy': 0.13670886075949368, 'eval_runtime': 5.2746, 'eval_samples_per_second': 74.887}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.523796,0.0,0.047,85.175
2,No log,1.523041,0.0,0.0491,81.434
3,No log,1.521493,0.0,0.0701,57.052
4,No log,1.51914,0.0,0.0508,78.798


domiain hotels n_samples_per_class=1, repeat=1, time_elapsed=133.2407832145691, {'eval_loss': 1.3659207820892334, 'eval_accuracy': 0.25063291139240507, 'eval_runtime': 5.3257, 'eval_samples_per_second': 74.169}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.523796,0.0,0.0596,67.11
2,No log,1.523041,0.0,0.0519,77.064
3,No log,1.521493,0.0,0.0491,81.462
4,No log,1.51914,0.0,0.049,81.674


domiain hotels n_samples_per_class=1, repeat=2, time_elapsed=115.53569054603577, {'eval_loss': 1.3659207820892334, 'eval_accuracy': 0.25063291139240507, 'eval_runtime': 5.2649, 'eval_samples_per_second': 75.025}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.523796,0.0,0.0493,81.088
2,No log,1.523041,0.0,0.0607,65.878
3,No log,1.521493,0.0,0.0517,77.297
4,No log,1.51914,0.0,0.0617,64.861


domiain hotels n_samples_per_class=1, repeat=3, time_elapsed=115.46952629089355, {'eval_loss': 1.3659207820892334, 'eval_accuracy': 0.25063291139240507, 'eval_runtime': 5.3253, 'eval_samples_per_second': 74.174}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.523796,0.0,0.0534,74.969
2,No log,1.523041,0.0,0.0495,80.8
3,No log,1.521493,0.0,0.0506,79.0
4,No log,1.51914,0.0,0.0601,66.611


domiain hotels n_samples_per_class=1, repeat=4, time_elapsed=116.64693999290466, {'eval_loss': 1.3659207820892334, 'eval_accuracy': 0.25063291139240507, 'eval_runtime': 5.2928, 'eval_samples_per_second': 74.63}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.448311,0.0,0.045,88.899
2,No log,1.44727,0.0,0.0466,85.923
3,No log,1.444998,0.0,2.7129,1.474
4,No log,1.441611,0.0,0.0453,88.365


domiain hotels n_samples_per_class=3, repeat=0, time_elapsed=126.42521405220032, {'eval_loss': 1.3659207820892334, 'eval_accuracy': 0.25063291139240507, 'eval_runtime': 5.3272, 'eval_samples_per_second': 74.147}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.386391,0.0,0.0437,91.48
2,No log,1.385286,0.0,0.0474,84.424
3,No log,1.38311,0.0,0.0616,64.986
4,No log,1.379992,0.0,0.0471,84.84


domiain hotels n_samples_per_class=3, repeat=1, time_elapsed=125.67099523544312, {'eval_loss': 1.3991011381149292, 'eval_accuracy': 0.24556962025316456, 'eval_runtime': 5.2176, 'eval_samples_per_second': 75.705}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.386391,0.0,0.0405,98.664
2,No log,1.385286,0.0,0.0441,90.67
3,No log,1.38311,0.0,0.0463,86.46
4,No log,1.379992,0.0,0.0461,86.761


domiain hotels n_samples_per_class=3, repeat=2, time_elapsed=124.24862241744995, {'eval_loss': 1.3991011381149292, 'eval_accuracy': 0.24556962025316456, 'eval_runtime': 5.3333, 'eval_samples_per_second': 74.063}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.386391,0.0,0.0456,87.751
2,No log,1.385286,0.0,0.0448,89.377
3,No log,1.38311,0.0,0.0465,85.995
4,No log,1.379992,0.0,0.0449,89.066


domiain hotels n_samples_per_class=3, repeat=3, time_elapsed=127.16549944877625, {'eval_loss': 1.3991011381149292, 'eval_accuracy': 0.24556962025316456, 'eval_runtime': 5.2507, 'eval_samples_per_second': 75.228}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.386391,0.0,0.0453,88.212
2,No log,1.385286,0.0,0.0471,84.863
3,No log,1.38311,0.0,0.0457,87.451
4,No log,1.379992,0.0,0.0463,86.41


domiain hotels n_samples_per_class=3, repeat=4, time_elapsed=125.28523230552673, {'eval_loss': 1.3991011381149292, 'eval_accuracy': 0.24556962025316456, 'eval_runtime': 5.2315, 'eval_samples_per_second': 75.504}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.487767,0.0,0.0466,85.779
2,No log,1.484172,0.0,0.0474,84.32
3,No log,1.477988,0.0,0.0438,91.388
4,No log,1.470653,0.0,0.0406,98.526


domiain hotels n_samples_per_class=5, repeat=0, time_elapsed=128.16864204406738, {'eval_loss': 1.3980635404586792, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.2529, 'eval_samples_per_second': 75.197}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.312669,0.5,0.0451,88.706
2,No log,1.311588,0.5,0.0496,80.611
3,No log,1.309279,0.5,0.0459,87.058
4,No log,1.305659,0.5,0.0448,89.236


domiain hotels n_samples_per_class=5, repeat=1, time_elapsed=128.47619199752808, {'eval_loss': 1.4197226762771606, 'eval_accuracy': 0.19240506329113924, 'eval_runtime': 5.3478, 'eval_samples_per_second': 73.862}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.312669,0.5,0.0462,86.544
2,No log,1.311588,0.5,0.046,86.875
3,No log,1.309279,0.5,0.0474,84.367
4,No log,1.305659,0.5,0.0451,88.594


domiain hotels n_samples_per_class=5, repeat=2, time_elapsed=128.0210316181183, {'eval_loss': 1.4197226762771606, 'eval_accuracy': 0.19240506329113924, 'eval_runtime': 5.272, 'eval_samples_per_second': 74.924}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.312669,0.5,0.0453,88.299
2,No log,1.311588,0.5,0.0477,83.844
3,No log,1.309279,0.5,0.0451,88.7
4,No log,1.305659,0.5,0.045,88.79


domiain hotels n_samples_per_class=5, repeat=3, time_elapsed=130.23974323272705, {'eval_loss': 1.4197226762771606, 'eval_accuracy': 0.19240506329113924, 'eval_runtime': 5.3268, 'eval_samples_per_second': 74.154}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.312669,0.5,0.0492,81.318
2,No log,1.311588,0.5,0.0478,83.72
3,No log,1.309279,0.5,0.0468,85.541
4,No log,1.305659,0.5,0.0434,92.21


domiain hotels n_samples_per_class=5, repeat=4, time_elapsed=128.91452145576477, {'eval_loss': 1.4197226762771606, 'eval_accuracy': 0.19240506329113924, 'eval_runtime': 5.2906, 'eval_samples_per_second': 74.661}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.35499,0.25,0.0503,79.489
2,No log,1.352122,0.25,0.0505,79.177
3,No log,1.347157,0.25,0.0489,81.844
4,No log,1.339219,0.25,0.0492,81.364


domiain hotels n_samples_per_class=8, repeat=0, time_elapsed=134.30914759635925, {'eval_loss': 1.4196823835372925, 'eval_accuracy': 0.1949367088607595, 'eval_runtime': 5.3466, 'eval_samples_per_second': 73.879}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.385012,0.25,0.0497,80.453
2,No log,1.381601,0.25,0.0537,74.537
3,No log,1.375169,0.25,0.0504,79.362
4,No log,1.365588,0.25,0.0479,83.525


domiain hotels n_samples_per_class=8, repeat=1, time_elapsed=132.95599746704102, {'eval_loss': 1.3600934743881226, 'eval_accuracy': 0.41265822784810124, 'eval_runtime': 5.307, 'eval_samples_per_second': 74.43}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.385012,0.25,0.0513,78.015
2,No log,1.381601,0.25,0.051,78.498
3,No log,1.375169,0.25,0.0492,81.276
4,No log,1.365588,0.25,0.0486,82.342


domiain hotels n_samples_per_class=8, repeat=2, time_elapsed=131.8357377052307, {'eval_loss': 1.3600934743881226, 'eval_accuracy': 0.41265822784810124, 'eval_runtime': 5.209, 'eval_samples_per_second': 75.83}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.385012,0.25,0.0491,81.456
2,No log,1.381601,0.25,0.0485,82.408
3,No log,1.375169,0.25,0.0491,81.42
4,No log,1.365588,0.25,0.0488,82.027


domiain hotels n_samples_per_class=8, repeat=3, time_elapsed=134.95044374465942, {'eval_loss': 1.3600934743881226, 'eval_accuracy': 0.41265822784810124, 'eval_runtime': 5.2926, 'eval_samples_per_second': 74.632}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.385012,0.25,0.051,78.472
2,No log,1.381601,0.25,0.0513,77.952
3,No log,1.375169,0.25,0.0479,83.502
4,No log,1.365588,0.25,0.0493,81.116


domiain hotels n_samples_per_class=8, repeat=4, time_elapsed=131.30047607421875, {'eval_loss': 1.3600934743881226, 'eval_accuracy': 0.41265822784810124, 'eval_runtime': 5.1662, 'eval_samples_per_second': 76.459}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.327527,0.25,0.0455,87.887
2,No log,1.32142,0.25,0.0456,87.81
3,No log,1.312093,0.5,0.0461,86.827
4,1.369400,1.30119,0.5,0.0455,87.974


domiain hotels n_samples_per_class=10, repeat=0, time_elapsed=120.00443458557129, {'eval_loss': 1.3431566953659058, 'eval_accuracy': 0.4050632911392405, 'eval_runtime': 5.17, 'eval_samples_per_second': 76.403}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.373614,0.5,0.0457,87.554
2,No log,1.368359,0.5,0.0448,89.319
3,No log,1.358795,0.5,0.0464,86.188
4,1.416600,1.347479,0.5,0.0452,88.573


domiain hotels n_samples_per_class=10, repeat=1, time_elapsed=149.24051785469055, {'eval_loss': 1.3752024173736572, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.2492, 'eval_samples_per_second': 75.25}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.373614,0.5,0.0468,85.519
2,No log,1.368359,0.5,0.0458,87.292
3,No log,1.358795,0.5,0.0459,87.134
4,1.416600,1.347479,0.5,0.0571,70.08


domiain hotels n_samples_per_class=10, repeat=2, time_elapsed=135.00540208816528, {'eval_loss': 1.3752024173736572, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.3488, 'eval_samples_per_second': 73.849}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.373614,0.5,0.0476,83.952
2,No log,1.368359,0.5,0.0454,88.198
3,No log,1.358795,0.5,0.0464,86.145
4,1.416600,1.347479,0.5,0.0452,88.503


domiain hotels n_samples_per_class=10, repeat=3, time_elapsed=133.7684769630432, {'eval_loss': 1.3752024173736572, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.3187, 'eval_samples_per_second': 74.266}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.373614,0.5,0.0452,88.427
2,No log,1.368359,0.5,0.0464,86.231
3,No log,1.358795,0.5,0.0448,89.214
4,1.416600,1.347479,0.5,0.0456,87.736


domiain hotels n_samples_per_class=10, repeat=4, time_elapsed=135.4343385696411, {'eval_loss': 1.3752024173736572, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.3335, 'eval_samples_per_second': 74.06}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.393609,0.375,0.1078,74.18
2,1.389900,1.361024,0.375,0.103,77.64
3,1.389900,1.307788,0.625,0.0961,83.256
4,1.285800,1.234982,0.75,0.0944,84.712


domiain hotels n_samples_per_class=20, repeat=0, time_elapsed=124.53935551643372, {'eval_loss': 1.2465801239013672, 'eval_accuracy': 0.5746835443037974, 'eval_runtime': 5.3036, 'eval_samples_per_second': 74.478}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.407997,0.375,0.1057,75.715
2,1.421000,1.359168,0.375,0.0934,85.641
3,1.421000,1.293571,0.5,0.0994,80.471
4,1.294500,1.21307,0.5,0.0997,80.205


domiain hotels n_samples_per_class=20, repeat=1, time_elapsed=136.51379132270813, {'eval_loss': 1.329270601272583, 'eval_accuracy': 0.3670886075949367, 'eval_runtime': 5.1865, 'eval_samples_per_second': 76.159}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.407997,0.375,0.0992,80.616
2,1.421000,1.359168,0.375,0.0958,83.523
3,1.421000,1.293571,0.5,0.0996,80.344
4,1.294500,1.21307,0.5,0.1014,78.883


domiain hotels n_samples_per_class=20, repeat=2, time_elapsed=136.35003399848938, {'eval_loss': 1.329270601272583, 'eval_accuracy': 0.3670886075949367, 'eval_runtime': 5.3002, 'eval_samples_per_second': 74.525}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.407997,0.375,0.102,78.421
2,1.421000,1.359168,0.375,0.0976,81.981
3,1.421000,1.293571,0.5,0.1008,79.394
4,1.294500,1.21307,0.5,0.1007,79.483


domiain hotels n_samples_per_class=20, repeat=3, time_elapsed=136.59494805335999, {'eval_loss': 1.329270601272583, 'eval_accuracy': 0.3670886075949367, 'eval_runtime': 5.2164, 'eval_samples_per_second': 75.723}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.407997,0.375,0.1035,77.288
2,1.421000,1.359168,0.375,0.0962,83.176
3,1.421000,1.293571,0.5,0.1015,78.84
4,1.294500,1.21307,0.5,0.0973,82.194


domiain hotels n_samples_per_class=20, repeat=4, time_elapsed=135.8537323474884, {'eval_loss': 1.329270601272583, 'eval_accuracy': 0.3670886075949367, 'eval_runtime': 5.2316, 'eval_samples_per_second': 75.503}
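
The per-run summary lines above can be collected into a small table instead of being read off the log by eye. The following is a minimal sketch, not part of the original notebook: the helper names `parse_summary_lines` and `mean_accuracy_by_size` are hypothetical, and the regular expression assumes the exact line layout printed in this log (including the leading word, which the log spells "domiain"). Note that several repeats in this log report identical metrics, so a mean over repeats may say less about variance than it appears to.

```python
import ast
import re
from collections import defaultdict
from statistics import mean

# Matches summary lines such as:
#   domiain hotels n_samples_per_class=20, repeat=4, time_elapsed=135.85, {...}
# The first word is matched loosely because the log prints it as "domiain".
SUMMARY_RE = re.compile(
    r"^\w+ (?P<domain>\w+) n_samples_per_class=(?P<n>\d+), "
    r"repeat=(?P<repeat>\d+), time_elapsed=(?P<elapsed>[\d.]+), "
    r"(?P<metrics>\{.*\})$"
)


def parse_summary_lines(log_text):
    """Return one dict per summary line; all other log lines are ignored."""
    rows = []
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue
        # The metrics are printed as a Python dict literal, so literal_eval
        # can read them without executing arbitrary code.
        metrics = ast.literal_eval(match["metrics"])
        rows.append(
            {
                "domain": match["domain"],
                "n_samples_per_class": int(match["n"]),
                "repeat": int(match["repeat"]),
                "time_elapsed": float(match["elapsed"]),
                "eval_accuracy": metrics["eval_accuracy"],
                "eval_loss": metrics["eval_loss"],
            }
        )
    return rows


def mean_accuracy_by_size(rows):
    """Average eval_accuracy over repeats for each training-set size."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["n_samples_per_class"]].append(row["eval_accuracy"])
    return {n: mean(accs) for n, accs in sorted(grouped.items())}


# Four summary lines copied from the log above.
sample_log = """
domiain hotels n_samples_per_class=10, repeat=0, time_elapsed=120.00443458557129, {'eval_loss': 1.3431566953659058, 'eval_accuracy': 0.4050632911392405, 'eval_runtime': 5.17, 'eval_samples_per_second': 76.403}
domiain hotels n_samples_per_class=10, repeat=1, time_elapsed=149.24051785469055, {'eval_loss': 1.3752024173736572, 'eval_accuracy': 0.2481012658227848, 'eval_runtime': 5.2492, 'eval_samples_per_second': 75.25}
domiain hotels n_samples_per_class=20, repeat=0, time_elapsed=124.53935551643372, {'eval_loss': 1.2465801239013672, 'eval_accuracy': 0.5746835443037974, 'eval_runtime': 5.3036, 'eval_samples_per_second': 74.478}
domiain hotels n_samples_per_class=20, repeat=1, time_elapsed=136.51379132270813, {'eval_loss': 1.329270601272583, 'eval_accuracy': 0.3670886075949367, 'eval_runtime': 5.1865, 'eval_samples_per_second': 76.159}
"""

rows = parse_summary_lines(sample_log)
for n, acc in mean_accuracy_by_size(rows).items():
    print(f"n_samples_per_class={n}: mean eval_accuracy={acc:.4f}")
```

The list returned by `parse_summary_lines` can also be passed directly to `pd.DataFrame(rows)` if a dataframe is preferred, since `pandas` is already imported in this notebook.
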



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.36089,0.416667,0.1686,71.194
2,1.391900,1.281228,0.583333,0.1613,74.4
3,1.310900,1.185818,0.583333,0.1641,73.129
4,1.310900,1.03946,0.666667,0.1657,72.411


domiain hotels n_samples_per_class=30, repeat=0, time_elapsed=137.20176148414612, {'eval_loss': 1.0706162452697754, 'eval_accuracy': 0.6987341772151898, 'eval_runtime': 5.1528, 'eval_samples_per_second': 76.658}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.376218,0.166667,0.1651,72.705
2,1.430500,1.302947,0.5,0.1678,71.519
3,1.316100,1.215865,0.416667,0.1616,74.273
4,1.316100,1.06912,0.583333,0.1714,70.028


domiain hotels n_samples_per_class=30, repeat=1, time_elapsed=138.5325710773468, {'eval_loss': 1.1166874170303345, 'eval_accuracy': 0.5620253164556962, 'eval_runtime': 5.1408, 'eval_samples_per_second': 76.836}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.376218,0.166667,0.1668,71.945
2,1.430500,1.302947,0.5,0.1635,73.38
3,1.316100,1.215865,0.416667,0.1632,73.53
4,1.316100,1.06912,0.583333,0.1634,73.438


domain hotels n_samples_per_class=30, repeat=2, time_elapsed=136.73936581611633, {'eval_loss': 1.1166874170303345, 'eval_accuracy': 0.5620253164556962, 'eval_runtime': 5.1087, 'eval_samples_per_second': 77.319}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.376218,0.166667,0.165,72.726
2,1.430500,1.302947,0.5,0.1655,72.495
3,1.316100,1.215865,0.416667,0.161,74.525
4,1.316100,1.06912,0.583333,0.1696,70.759


domain hotels n_samples_per_class=30, repeat=3, time_elapsed=138.88422060012817, {'eval_loss': 1.1166874170303345, 'eval_accuracy': 0.5620253164556962, 'eval_runtime': 5.1394, 'eval_samples_per_second': 76.857}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.376218,0.166667,0.1206,99.488
2,1.430500,1.302947,0.5,0.1638,73.242
3,1.316100,1.215865,0.416667,0.169,71.011
4,1.316100,1.06912,0.583333,0.1709,70.196


domain hotels n_samples_per_class=30, repeat=4, time_elapsed=137.23033475875854, {'eval_loss': 1.1166874170303345, 'eval_accuracy': 0.5620253164556962, 'eval_runtime': 5.1312, 'eval_samples_per_second': 76.98}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4105,1.337584,0.416667,0.3261,73.595
2,1.3249,1.091064,0.541667,0.3144,76.327
3,0.8398,0.789219,0.708333,0.313,76.672
4,0.6332,0.838216,0.666667,0.327,73.394


domain hotels n_samples_per_class=60, repeat=0, time_elapsed=138.6463165283203, {'eval_loss': 0.7749429941177368, 'eval_accuracy': 0.6658227848101266, 'eval_runtime': 5.0451, 'eval_samples_per_second': 78.294}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4134,1.332324,0.458333,0.3163,75.874
2,1.3491,1.093492,0.75,0.3218,74.59
3,0.8704,0.684389,0.708333,0.3059,78.466
4,0.5789,0.511802,0.791667,0.3276,73.265


domain hotels n_samples_per_class=60, repeat=1, time_elapsed=136.44257378578186, {'eval_loss': 0.4584990441799164, 'eval_accuracy': 0.8177215189873418, 'eval_runtime': 5.0331, 'eval_samples_per_second': 78.481}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4134,1.332324,0.458333,0.3144,76.335
2,1.3491,1.093492,0.75,0.3157,76.015
3,0.8704,0.684389,0.708333,0.3084,77.815
4,0.5789,0.511802,0.791667,0.3281,73.154


domain hotels n_samples_per_class=60, repeat=2, time_elapsed=139.38192582130432, {'eval_loss': 0.4584990441799164, 'eval_accuracy': 0.8177215189873418, 'eval_runtime': 5.1215, 'eval_samples_per_second': 77.126}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4134,1.332324,0.458333,0.3318,72.324
2,1.3491,1.093492,0.75,0.3183,75.398
3,0.8704,0.684389,0.708333,0.3119,76.957
4,0.5789,0.511802,0.791667,0.3156,76.042


domain hotels n_samples_per_class=60, repeat=3, time_elapsed=137.42465090751648, {'eval_loss': 0.4584990441799164, 'eval_accuracy': 0.8177215189873418, 'eval_runtime': 5.0295, 'eval_samples_per_second': 78.537}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4134,1.332324,0.458333,0.3218,74.573
2,1.3491,1.093492,0.75,0.3209,74.787
3,0.8704,0.684389,0.708333,0.3101,77.392
4,0.5789,0.511802,0.791667,0.3251,73.816


domain hotels n_samples_per_class=60, repeat=4, time_elapsed=139.90052771568298, {'eval_loss': 0.4584990441799164, 'eval_accuracy': 0.8177215189873418, 'eval_runtime': 5.0301, 'eval_samples_per_second': 78.527}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3348,1.193639,0.7,0.5538,72.224
2,0.9423,0.54811,0.825,0.5283,75.722
3,0.43,0.571217,0.775,0.5162,77.483
4,0.304,0.336106,0.875,0.5587,71.591


domain hotels n_samples_per_class=100, repeat=0, time_elapsed=137.87100958824158, {'eval_loss': 0.41386091709136963, 'eval_accuracy': 0.8126582278481013, 'eval_runtime': 4.9687, 'eval_samples_per_second': 79.498}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3155,1.153356,0.675,0.5508,72.615
2,0.8974,0.583601,0.8,0.533,75.049
3,0.4815,0.661029,0.75,0.5206,76.838
4,0.279,0.346447,0.9,0.5632,71.025


domain hotels n_samples_per_class=100, repeat=1, time_elapsed=138.6345887184143, {'eval_loss': 0.5521266460418701, 'eval_accuracy': 0.7924050632911392, 'eval_runtime': 4.9633, 'eval_samples_per_second': 79.584}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3155,1.153356,0.675,0.5346,74.819
2,0.8974,0.583601,0.8,0.5392,74.183
3,0.4815,0.661029,0.75,0.5193,77.031
4,0.279,0.346447,0.9,0.5675,70.484


domain hotels n_samples_per_class=100, repeat=2, time_elapsed=138.35020852088928, {'eval_loss': 0.5521266460418701, 'eval_accuracy': 0.7924050632911392, 'eval_runtime': 4.9813, 'eval_samples_per_second': 79.297}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3155,1.153356,0.675,0.5479,73.008
2,0.8974,0.583601,0.8,0.5387,74.25
3,0.4815,0.661029,0.75,0.5198,76.949
4,0.279,0.346447,0.9,0.5688,70.325


domain hotels n_samples_per_class=100, repeat=3, time_elapsed=139.40799760818481, {'eval_loss': 0.5521266460418701, 'eval_accuracy': 0.7924050632911392, 'eval_runtime': 4.9678, 'eval_samples_per_second': 79.513}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3155,1.153356,0.675,0.5494,72.813
2,0.8974,0.583601,0.8,0.5377,74.384
3,0.4815,0.661029,0.75,0.52,76.918
4,0.279,0.346447,0.9,0.5674,70.492


domain hotels n_samples_per_class=100, repeat=4, time_elapsed=139.26244354248047, {'eval_loss': 0.5521266460418701, 'eval_accuracy': 0.7924050632911392, 'eval_runtime': 4.9669, 'eval_samples_per_second': 79.526}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,0.9183,0.586043,0.775,1.101,72.659
2,0.3082,0.53055,0.825,1.0377,77.095
3,0.2178,0.547824,0.8375,1.0939,73.136
4,0.0556,0.651933,0.85,1.0534,75.945


domain hotels n_samples_per_class=200, repeat=0, time_elapsed=152.49638056755066, {'eval_loss': 0.6584933400154114, 'eval_accuracy': 0.8126582278481013, 'eval_runtime': 4.9758, 'eval_samples_per_second': 79.384}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,0.9066,0.569716,0.7875,1.1103,72.055
2,0.3303,0.516463,0.7875,1.0388,77.012
3,0.1366,0.580107,0.8125,1.0954,73.036
4,0.0191,0.646161,0.8375,1.0634,75.232


domain hotels n_samples_per_class=200, repeat=1, time_elapsed=152.23806381225586, {'eval_loss': 0.5491312742233276, 'eval_accuracy': 0.8405063291139241, 'eval_runtime': 4.9621, 'eval_samples_per_second': 79.603}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,0.9066,0.569716,0.7875,1.1095,72.104
2,0.3303,0.516463,0.7875,1.0383,77.047
3,0.1366,0.580107,0.8125,1.0851,73.727
4,0.0191,0.646161,0.8375,1.0608,75.417


domain hotels n_samples_per_class=200, repeat=2, time_elapsed=152.11292481422424, {'eval_loss': 0.5491312742233276, 'eval_accuracy': 0.8405063291139241, 'eval_runtime': 4.9647, 'eval_samples_per_second': 79.562}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,0.9066,0.569716,0.7875,1.1092,72.122
2,0.3303,0.516463,0.7875,1.034,77.37
3,0.1366,0.580107,0.8125,1.0924,73.233
4,0.0191,0.646161,0.8375,1.0561,75.754


domain hotels n_samples_per_class=200, repeat=3, time_elapsed=154.09865999221802, {'eval_loss': 0.5491312742233276, 'eval_accuracy': 0.8405063291139241, 'eval_runtime': 4.9266, 'eval_samples_per_second': 80.176}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,0.9066,0.569716,0.7875,1.1198,71.444
2,0.3303,0.516463,0.7875,1.0326,77.472
3,0.1366,0.580107,0.8125,1.0934,73.169
4,0.0191,0.646161,0.8375,1.049,76.264


domain hotels n_samples_per_class=200, repeat=4, time_elapsed=151.81238317489624, {'eval_loss': 0.5491312742233276, 'eval_accuracy': 0.8405063291139241, 'eval_runtime': 4.9372, 'eval_samples_per_second': 80.006}
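
The per-run summary lines above are easier to compare once they are parsed and averaged per setting. The following is a minimal sketch of one way to do that, not part of the original notebook: the names `SUMMARY_RE`, `parse_summary`, and `mean_accuracy` are hypothetical, and the pattern assumes the exact line layout printed in this log (a leading label word, the domain name, then the `key=value` fields and a Python dict literal).

```python
import ast
import re
from collections import defaultdict
from statistics import mean

# Hypothetical pattern for the summary lines printed in this log. The first
# word (the label) is matched but not inspected, so its spelling is irrelevant.
SUMMARY_RE = re.compile(
    r"^\w+ (?P<domain>\w+) n_samples_per_class=(?P<n>\d+), "
    r"repeat=(?P<repeat>\d+), time_elapsed=(?P<elapsed>[\d.]+), "
    r"(?P<metrics>\{.*\})\s*$"
)


def parse_summary(line):
    """Return a dict for one summary line, or None if the line does not match."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    # The trailing part is a plain dict literal, so literal_eval can read it safely.
    metrics = ast.literal_eval(m.group("metrics"))
    return {
        "domain": m.group("domain"),
        "n_samples_per_class": int(m.group("n")),
        "repeat": int(m.group("repeat")),
        "time_elapsed": float(m.group("elapsed")),
        **metrics,
    }


def mean_accuracy(lines):
    """Average eval_accuracy over repeats for each (domain, n_samples_per_class)."""
    groups = defaultdict(list)
    for line in lines:
        row = parse_summary(line)
        if row is not None:
            groups[(row["domain"], row["n_samples_per_class"])].append(row["eval_accuracy"])
    return {key: mean(values) for key, values in groups.items()}


# Two summary lines copied from the log above (hotels, 200 samples per class).
sample = [
    "domain hotels n_samples_per_class=200, repeat=0, time_elapsed=152.49638056755066, "
    "{'eval_loss': 0.6584933400154114, 'eval_accuracy': 0.8126582278481013, "
    "'eval_runtime': 4.9758, 'eval_samples_per_second': 79.384}",
    "domain hotels n_samples_per_class=200, repeat=1, time_elapsed=152.23806381225586, "
    "{'eval_loss': 0.5491312742233276, 'eval_accuracy': 0.8405063291139241, "
    "'eval_runtime': 4.9621, 'eval_samples_per_second': 79.603}",
]
averages = mean_accuracy(sample)
```

When averaging, note that in the hotels runs logged above, repeats 1 to 4 report identical evaluation metrics within each setting, so a mean over the five repeats effectively combines only two distinct results.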




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.42943,0.0,0.0639,62.559
2,No log,1.428205,0.0,0.0495,80.794
3,No log,1.425797,0.0,0.0484,82.574
4,No log,1.422297,0.0,0.077,51.968


domain medicine n_samples_per_class=1, repeat=0, time_elapsed=128.19208407402039, {'eval_loss': 1.470704436302185, 'eval_accuracy': 0.18654434250764526, 'eval_runtime': 4.239, 'eval_samples_per_second': 77.141}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.495823,0.0,0.0484,82.62
2,No log,1.494811,0.0,0.0743,53.861
3,No log,1.492713,0.0,0.0762,52.461
4,No log,1.489619,0.0,0.057,70.157


domain medicine n_samples_per_class=1, repeat=1, time_elapsed=116.99443364143372, {'eval_loss': 1.3705888986587524, 'eval_accuracy': 0.3547400611620795, 'eval_runtime': 4.396, 'eval_samples_per_second': 74.386}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.495823,0.0,0.0531,75.397
2,No log,1.494811,0.0,0.075,53.365
3,No log,1.492713,0.0,0.0628,63.652
4,No log,1.489619,0.0,0.0512,78.168


domain medicine n_samples_per_class=1, repeat=2, time_elapsed=116.92654705047607, {'eval_loss': 1.3705888986587524, 'eval_accuracy': 0.3547400611620795, 'eval_runtime': 4.3146, 'eval_samples_per_second': 75.789}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.495823,0.0,0.0484,82.584
2,No log,1.494811,0.0,0.0678,59.039
3,No log,1.492713,0.0,0.071,56.344
4,No log,1.489619,0.0,0.0674,59.373


domain medicine n_samples_per_class=1, repeat=3, time_elapsed=119.33610391616821, {'eval_loss': 1.3705888986587524, 'eval_accuracy': 0.3547400611620795, 'eval_runtime': 4.3844, 'eval_samples_per_second': 74.583}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.495823,0.0,0.0664,60.246
2,No log,1.494811,0.0,0.0552,72.529
3,No log,1.492713,0.0,0.0497,80.503
4,No log,1.489619,0.0,0.078,51.266


domiain medicine n_samples_per_class=1, repeat=4, time_elapsed=115.19455909729004, {'eval_loss': 1.3705888986587524, 'eval_accuracy': 0.3547400611620795, 'eval_runtime': 4.3114, 'eval_samples_per_second': 75.845}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.356806,0.25,0.0469,85.363
2,No log,1.356072,0.25,0.0489,81.842
3,No log,1.354476,0.25,0.0488,81.911
4,No log,1.352198,0.25,0.0494,81.022


domiain medicine n_samples_per_class=3, repeat=0, time_elapsed=124.64973664283752, {'eval_loss': 1.3705888986587524, 'eval_accuracy': 0.3547400611620795, 'eval_runtime': 4.3652, 'eval_samples_per_second': 74.91}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.45734,0.25,0.0457,87.44
2,No log,1.456583,0.25,0.051,78.463
3,No log,1.45514,0.25,0.0489,81.843
4,No log,1.452924,0.25,0.0484,82.709


domiain medicine n_samples_per_class=3, repeat=1, time_elapsed=125.62961435317993, {'eval_loss': 1.436017632484436, 'eval_accuracy': 0.19877675840978593, 'eval_runtime': 4.4175, 'eval_samples_per_second': 74.025}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.45734,0.25,0.0473,84.481
2,No log,1.456583,0.25,0.0495,80.791
3,No log,1.45514,0.25,0.0511,78.33
4,No log,1.452924,0.25,0.05,79.955


domiain medicine n_samples_per_class=3, repeat=2, time_elapsed=128.2307620048523, {'eval_loss': 1.436017632484436, 'eval_accuracy': 0.19877675840978593, 'eval_runtime': 4.3751, 'eval_samples_per_second': 74.74}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.45734,0.25,0.0463,86.343
2,No log,1.456583,0.25,0.0505,79.143
3,No log,1.45514,0.25,0.0471,84.914
4,No log,1.452924,0.25,0.0504,79.427


domiain medicine n_samples_per_class=3, repeat=3, time_elapsed=123.39036989212036, {'eval_loss': 1.436017632484436, 'eval_accuracy': 0.19877675840978593, 'eval_runtime': 4.3817, 'eval_samples_per_second': 74.629}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.45734,0.25,0.0456,87.795
2,No log,1.456583,0.25,0.0503,79.542
3,No log,1.45514,0.25,0.0491,81.429
4,No log,1.452924,0.25,0.0487,82.15


domiain medicine n_samples_per_class=3, repeat=4, time_elapsed=126.93490028381348, {'eval_loss': 1.436017632484436, 'eval_accuracy': 0.19877675840978593, 'eval_runtime': 4.4368, 'eval_samples_per_second': 73.702}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.444348,0.0,0.0511,78.352
2,No log,1.439344,0.0,0.0494,80.925
3,No log,1.430476,0.0,0.0471,84.855
4,No log,1.418584,0.0,0.0517,77.315


domiain medicine n_samples_per_class=5, repeat=0, time_elapsed=131.49458503723145, {'eval_loss': 1.4351778030395508, 'eval_accuracy': 0.2018348623853211, 'eval_runtime': 4.4098, 'eval_samples_per_second': 74.153}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.390309,0.5,0.0536,74.654
2,No log,1.38681,0.5,0.0513,78.036
3,No log,1.380687,0.5,0.0475,84.295
4,No log,1.371911,0.5,0.0494,81.048


domiain medicine n_samples_per_class=5, repeat=1, time_elapsed=129.92553210258484, {'eval_loss': 1.4151179790496826, 'eval_accuracy': 0.1834862385321101, 'eval_runtime': 4.4048, 'eval_samples_per_second': 74.237}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.390309,0.5,0.0515,77.688
2,No log,1.38681,0.5,0.0498,80.314
3,No log,1.380687,0.5,0.0487,82.18
4,No log,1.371911,0.5,0.0467,85.743


domiain medicine n_samples_per_class=5, repeat=2, time_elapsed=130.53016138076782, {'eval_loss': 1.4151179790496826, 'eval_accuracy': 0.1834862385321101, 'eval_runtime': 4.4588, 'eval_samples_per_second': 73.338}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.390309,0.5,0.05,80.027
2,No log,1.38681,0.5,0.0502,79.627
3,No log,1.380687,0.5,0.0495,80.769
4,No log,1.371911,0.5,0.0489,81.791


domiain medicine n_samples_per_class=5, repeat=3, time_elapsed=128.9613435268402, {'eval_loss': 1.4151179790496826, 'eval_accuracy': 0.1834862385321101, 'eval_runtime': 4.4062, 'eval_samples_per_second': 74.213}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.390309,0.5,0.0511,78.293
2,No log,1.38681,0.5,0.0513,78.027
3,No log,1.380687,0.5,0.0475,84.178
4,No log,1.371911,0.5,0.0475,84.146


domiain medicine n_samples_per_class=5, repeat=4, time_elapsed=129.13005471229553, {'eval_loss': 1.4151179790496826, 'eval_accuracy': 0.1834862385321101, 'eval_runtime': 4.405, 'eval_samples_per_second': 74.234}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.405459,0.25,0.0527,75.935
2,No log,1.400717,0.25,0.048,83.267
3,No log,1.392034,0.25,0.0492,81.261
4,No log,1.37973,0.25,0.049,81.579


domiain medicine n_samples_per_class=8, repeat=0, time_elapsed=133.4355664253235, {'eval_loss': 1.4144470691680908, 'eval_accuracy': 0.18654434250764526, 'eval_runtime': 4.482, 'eval_samples_per_second': 72.958}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.419398,0.25,0.0572,69.971
2,No log,1.413489,0.25,0.0513,77.963
3,No log,1.403285,0.25,0.0478,83.654
4,No log,1.388323,0.25,0.0486,82.317


domiain medicine n_samples_per_class=8, repeat=1, time_elapsed=132.94855093955994, {'eval_loss': 1.376428246498108, 'eval_accuracy': 0.37003058103975534, 'eval_runtime': 4.3984, 'eval_samples_per_second': 74.346}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.419398,0.25,0.0512,78.082
2,No log,1.413489,0.25,0.05,79.991
3,No log,1.403285,0.25,0.0487,82.168
4,No log,1.388323,0.25,0.0499,80.209


domiain medicine n_samples_per_class=8, repeat=2, time_elapsed=133.8161346912384, {'eval_loss': 1.376428246498108, 'eval_accuracy': 0.37003058103975534, 'eval_runtime': 4.4154, 'eval_samples_per_second': 74.059}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.419398,0.25,0.0501,79.792
2,No log,1.413489,0.25,0.0511,78.253
3,No log,1.403285,0.25,0.0488,81.956
4,No log,1.388323,0.25,0.0475,84.215


domiain medicine n_samples_per_class=8, repeat=3, time_elapsed=132.0053198337555, {'eval_loss': 1.376428246498108, 'eval_accuracy': 0.37003058103975534, 'eval_runtime': 4.369, 'eval_samples_per_second': 74.846}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.419398,0.25,0.0541,73.899
2,No log,1.413489,0.25,0.0522,76.63
3,No log,1.403285,0.25,0.0495,80.741
4,No log,1.388323,0.25,0.0478,83.607


domiain medicine n_samples_per_class=8, repeat=4, time_elapsed=136.54785799980164, {'eval_loss': 1.376428246498108, 'eval_accuracy': 0.37003058103975534, 'eval_runtime': 4.3926, 'eval_samples_per_second': 74.443}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.393415,0.25,0.0428,93.403
2,No log,1.382375,0.25,0.0405,98.871
3,No log,1.363729,0.5,0.0429,93.305
4,1.387300,1.340503,0.5,0.0405,98.752


domiain medicine n_samples_per_class=10, repeat=0, time_elapsed=120.49413824081421, {'eval_loss': 1.3603603839874268, 'eval_accuracy': 0.3058103975535168, 'eval_runtime': 4.4033, 'eval_samples_per_second': 74.263}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.379478,0.25,0.0414,96.6
2,No log,1.369163,0.5,0.0386,103.593
3,No log,1.351866,0.5,0.0433,92.284
4,1.406600,1.329386,0.5,0.051,78.467


domiain medicine n_samples_per_class=10, repeat=1, time_elapsed=148.1167778968811, {'eval_loss': 1.387322187423706, 'eval_accuracy': 0.24770642201834864, 'eval_runtime': 4.4499, 'eval_samples_per_second': 73.485}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.379478,0.25,0.0434,92.196
2,No log,1.369163,0.5,0.0414,96.633
3,No log,1.351866,0.5,0.0413,96.948
4,1.406600,1.329386,0.5,0.0406,98.511


domiain medicine n_samples_per_class=10, repeat=2, time_elapsed=134.01011109352112, {'eval_loss': 1.387322187423706, 'eval_accuracy': 0.24770642201834864, 'eval_runtime': 4.3895, 'eval_samples_per_second': 74.495}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.379478,0.25,0.0428,93.5
2,No log,1.369163,0.5,0.0423,94.621
3,No log,1.351866,0.5,0.0394,101.427
4,1.406600,1.329386,0.5,0.0417,95.837


domiain medicine n_samples_per_class=10, repeat=3, time_elapsed=132.71119213104248, {'eval_loss': 1.387322187423706, 'eval_accuracy': 0.24770642201834864, 'eval_runtime': 4.4098, 'eval_samples_per_second': 74.153}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.379478,0.25,0.0431,92.73
2,No log,1.369163,0.5,0.0445,89.92
3,No log,1.351866,0.5,0.0443,90.376
4,1.406600,1.329386,0.5,0.0412,97.132


domiain medicine n_samples_per_class=10, repeat=4, time_elapsed=133.2346465587616, {'eval_loss': 1.387322187423706, 'eval_accuracy': 0.24770642201834864, 'eval_runtime': 4.4615, 'eval_samples_per_second': 73.293}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357672,0.25,0.1044,76.593
2,1.375400,1.328873,0.375,0.1007,79.458
3,1.375400,1.27991,0.75,0.0982,81.485
4,1.267900,1.21756,0.75,0.1,79.999


domiain medicine n_samples_per_class=20, repeat=0, time_elapsed=122.83195757865906, {'eval_loss': 1.314335584640503, 'eval_accuracy': 0.4648318042813456, 'eval_runtime': 4.3883, 'eval_samples_per_second': 74.516}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.434089,0.25,0.1035,77.284
2,1.389400,1.385116,0.375,0.0965,82.917
3,1.389400,1.321414,0.25,0.0963,83.079
4,1.274000,1.250357,0.5,0.0972,82.273


domiain medicine n_samples_per_class=20, repeat=1, time_elapsed=137.56970858573914, {'eval_loss': 1.1961129903793335, 'eval_accuracy': 0.6636085626911316, 'eval_runtime': 4.3386, 'eval_samples_per_second': 75.37}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.434089,0.25,0.1051,76.109
2,1.389400,1.385116,0.375,0.0967,82.731
3,1.389400,1.321414,0.25,0.099,80.777
4,1.274000,1.250357,0.5,0.0977,81.872


domiain medicine n_samples_per_class=20, repeat=2, time_elapsed=137.35815334320068, {'eval_loss': 1.1961129903793335, 'eval_accuracy': 0.6636085626911316, 'eval_runtime': 4.3841, 'eval_samples_per_second': 74.587}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.434089,0.25,0.0997,80.209
2,1.389400,1.385116,0.375,0.097,82.437
3,1.389400,1.321414,0.25,0.0995,80.423
4,1.274000,1.250357,0.5,0.0995,80.375


domiain medicine n_samples_per_class=20, repeat=3, time_elapsed=135.08347821235657, {'eval_loss': 1.1961129903793335, 'eval_accuracy': 0.6636085626911316, 'eval_runtime': 4.2858, 'eval_samples_per_second': 76.298}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.434089,0.25,0.1022,78.267
2,1.389400,1.385116,0.375,0.096,83.329
3,1.389400,1.321414,0.25,0.1009,79.294
4,1.274000,1.250357,0.5,0.1004,79.709


domiain medicine n_samples_per_class=20, repeat=4, time_elapsed=138.34950470924377, {'eval_loss': 1.1961129903793335, 'eval_accuracy': 0.6636085626911316, 'eval_runtime': 4.2597, 'eval_samples_per_second': 76.766}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.378149,0.333333,0.1679,71.476
2,1.399600,1.289061,0.5,0.1643,73.018
3,1.318400,1.183137,0.666667,0.1633,73.464
4,1.318400,1.028352,0.666667,0.1654,72.553


domiain medicine n_samples_per_class=30, repeat=0, time_elapsed=136.35421872138977, {'eval_loss': 1.1922376155853271, 'eval_accuracy': 0.6972477064220184, 'eval_runtime': 4.2546, 'eval_samples_per_second': 76.859}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.402927,0.083333,0.1685,71.207
2,1.400100,1.323248,0.333333,0.1618,74.148
3,1.299800,1.219827,0.583333,0.1576,76.119
4,1.299800,1.065017,0.75,0.1683,71.32


domiain medicine n_samples_per_class=30, repeat=1, time_elapsed=139.39728045463562, {'eval_loss': 1.1350666284561157, 'eval_accuracy': 0.5474006116207951, 'eval_runtime': 4.2406, 'eval_samples_per_second': 77.112}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.402927,0.083333,0.1655,72.503
2,1.400100,1.323248,0.333333,0.1638,73.282
3,1.299800,1.219827,0.583333,0.1599,75.038
4,1.299800,1.065017,0.75,0.1625,73.838


domiain medicine n_samples_per_class=30, repeat=2, time_elapsed=138.79141545295715, {'eval_loss': 1.1350666284561157, 'eval_accuracy': 0.5474006116207951, 'eval_runtime': 4.2532, 'eval_samples_per_second': 76.883}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.402927,0.083333,0.1654,72.54
2,1.400100,1.323248,0.333333,0.1672,71.759
3,1.299800,1.219827,0.583333,0.1627,73.772
4,1.299800,1.065017,0.75,0.1696,70.752


domiain medicine n_samples_per_class=30, repeat=3, time_elapsed=136.57934951782227, {'eval_loss': 1.1350666284561157, 'eval_accuracy': 0.5474006116207951, 'eval_runtime': 4.2395, 'eval_samples_per_second': 77.131}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.402927,0.083333,0.1645,72.964
2,1.400100,1.323248,0.333333,0.1675,71.638
3,1.299800,1.219827,0.583333,0.1525,78.69
4,1.299800,1.065017,0.75,0.1654,72.536


domiain medicine n_samples_per_class=30, repeat=4, time_elapsed=140.99081659317017, {'eval_loss': 1.1350666284561157, 'eval_accuracy': 0.5474006116207951, 'eval_runtime': 4.3483, 'eval_samples_per_second': 75.201}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4099,1.337053,0.375,0.3272,73.345
2,1.3219,1.092196,0.666667,0.3092,77.608
3,0.9454,0.786871,0.75,0.3113,77.096
4,0.7943,0.698628,0.75,0.3245,73.963


domiain medicine n_samples_per_class=60, repeat=0, time_elapsed=135.8385510444641, {'eval_loss': 0.8146778345108032, 'eval_accuracy': 0.6666666666666666, 'eval_runtime': 4.2021, 'eval_samples_per_second': 77.819}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3874,1.312497,0.458333,0.324,74.065
2,1.3416,1.09338,0.583333,0.3211,74.735
3,0.9508,0.769849,0.75,0.3108,77.214
4,0.7453,0.62999,0.75,0.3222,74.477


domiain medicine n_samples_per_class=60, repeat=1, time_elapsed=138.96003007888794, {'eval_loss': 0.7968626618385315, 'eval_accuracy': 0.7339449541284404, 'eval_runtime': 4.175, 'eval_samples_per_second': 78.324}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3874,1.312497,0.458333,0.3175,75.589
2,1.3416,1.09338,0.583333,0.3177,75.539
3,0.9508,0.769849,0.75,0.3093,77.603
4,0.7453,0.62999,0.75,0.3287,73.012


domiain medicine n_samples_per_class=60, repeat=2, time_elapsed=139.4132330417633, {'eval_loss': 0.7968626618385315, 'eval_accuracy': 0.7339449541284404, 'eval_runtime': 4.1913, 'eval_samples_per_second': 78.019}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3874,1.312497,0.458333,0.3192,75.18
2,1.3416,1.09338,0.583333,0.3232,74.248
3,0.9508,0.769849,0.75,0.3132,76.635
4,0.7453,0.62999,0.75,0.3302,72.684


domiain medicine n_samples_per_class=60, repeat=3, time_elapsed=138.74025011062622, {'eval_loss': 0.7968626618385315, 'eval_accuracy': 0.7339449541284404, 'eval_runtime': 4.1762, 'eval_samples_per_second': 78.301}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3874,1.312497,0.458333,0.3236,74.169
2,1.3416,1.09338,0.583333,0.318,75.46
3,0.9508,0.769849,0.75,0.3103,77.333
4,0.7453,0.62999,0.75,0.3249,73.877


domiain medicine n_samples_per_class=60, repeat=4, time_elapsed=139.1879963874817, {'eval_loss': 0.7968626618385315, 'eval_accuracy': 0.7339449541284404, 'eval_runtime': 4.1604, 'eval_samples_per_second': 78.599}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3264,1.184859,0.6,0.5513,72.556
2,0.9871,0.662861,0.725,0.5312,75.299
3,0.5242,0.449265,0.775,0.5188,77.097
4,0.2795,0.954952,0.6,0.5611,71.283


domiain medicine n_samples_per_class=100, repeat=0, time_elapsed=139.34715342521667, {'eval_loss': 0.6134612560272217, 'eval_accuracy': 0.7339449541284404, 'eval_runtime': 4.1016, 'eval_samples_per_second': 79.725}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3322,1.163401,0.675,0.5439,73.538
2,0.9823,0.694052,0.825,0.5429,73.675
3,0.5791,0.511757,0.725,0.521,76.773
4,0.3238,0.760183,0.7,0.562,71.176


domiain medicine n_samples_per_class=100, repeat=1, time_elapsed=153.91272139549255, {'eval_loss': 0.7672492265701294, 'eval_accuracy': 0.6788990825688074, 'eval_runtime': 4.1367, 'eval_samples_per_second': 79.048}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3322,1.163401,0.675,0.5531,72.32
2,0.9823,0.694052,0.825,0.534,74.901
3,0.5791,0.511757,0.725,0.5172,77.335
4,0.3238,0.760183,0.7,0.5535,72.264


domiain medicine n_samples_per_class=100, repeat=2, time_elapsed=152.1360948085785, {'eval_loss': 0.7672492265701294, 'eval_accuracy': 0.6788990825688074, 'eval_runtime': 4.137, 'eval_samples_per_second': 79.043}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3322,1.163401,0.675,0.5504,72.678
2,0.9823,0.694052,0.825,0.5302,75.445
3,0.5791,0.511757,0.725,0.5202,76.899
4,0.3238,0.760183,0.7,0.5519,72.476


domiain medicine n_samples_per_class=100, repeat=3, time_elapsed=151.86953926086426, {'eval_loss': 0.7672492265701294, 'eval_accuracy': 0.6788990825688074, 'eval_runtime': 4.1314, 'eval_samples_per_second': 79.15}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3322,1.163401,0.675,0.5497,72.761
2,0.9823,0.694052,0.825,0.5313,75.289
3,0.5791,0.511757,0.725,0.5205,76.851
4,0.3238,0.760183,0.7,0.5604,71.381


domiain medicine n_samples_per_class=100, repeat=4, time_elapsed=152.63334774971008, {'eval_loss': 0.7672492265701294, 'eval_accuracy': 0.6788990825688074, 'eval_runtime': 4.1359, 'eval_samples_per_second': 79.064}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.037,0.724621,0.7125,1.0991,72.784
2,0.5229,0.437205,0.825,1.0439,76.634
3,0.2728,0.593906,0.825,1.07,74.767
4,0.0639,0.426724,0.9,1.0605,75.438


domiain medicine n_samples_per_class=200, repeat=0, time_elapsed=153.5443890094757, {'eval_loss': 0.5052164793014526, 'eval_accuracy': 0.8776758409785933, 'eval_runtime': 4.1355, 'eval_samples_per_second': 79.071}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0422,0.713299,0.7375,1.1096,72.095
2,0.5415,0.452645,0.825,1.0353,77.274
3,0.3001,0.47013,0.8625,1.094,73.127
4,0.0849,0.510258,0.7875,1.0539,75.91


domiain medicine n_samples_per_class=200, repeat=1, time_elapsed=153.66070365905762, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.0786, 'eval_samples_per_second': 80.174}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0422,0.713299,0.7375,1.1247,71.133
2,0.5415,0.452645,0.825,1.047,76.406
3,0.3001,0.47013,0.8625,1.0665,75.011
4,0.0849,0.510258,0.7875,1.0729,74.562


domiain medicine n_samples_per_class=200, repeat=2, time_elapsed=157.05446529388428, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.1268, 'eval_samples_per_second': 79.238}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0422,0.713299,0.7375,1.1022,72.582
2,0.5415,0.452645,0.825,1.0407,76.87
3,0.3001,0.47013,0.8625,1.0891,73.454
4,0.0849,0.510258,0.7875,1.0616,75.356


domiain medicine n_samples_per_class=200, repeat=3, time_elapsed=152.50956463813782, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.0941, 'eval_samples_per_second': 79.872}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.0422,0.713299,0.7375,1.1088,72.147
2,0.5415,0.452645,0.825,1.0442,76.61
3,0.3001,0.47013,0.8625,1.0886,73.489
4,0.0849,0.510258,0.7875,1.0604,75.441


domiain medicine n_samples_per_class=200, repeat=4, time_elapsed=153.91765666007996, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.155, 'eval_samples_per_second': 78.7}
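
The per-run summary lines in this output are easier to compare once they are collected into records. The following is a minimal sketch, not part of the original notebook: it assumes the cell output has been saved as plain text, and the names `SUMMARY_RE`, `parse_summary_lines`, `mean_accuracy` and `sample_log` are hypothetical helpers introduced only for illustration. The sample text reuses two summary lines printed above.

```python
import ast
import re
from collections import defaultdict
from statistics import mean

# Matches the per-run summary lines printed by the benchmark loop. The first
# word (the label before the domain name) is skipped, so its spelling does
# not matter. This pattern is an assumption based on the lines shown above.
SUMMARY_RE = re.compile(
    r"^\S+ (?P<domain>\w+) n_samples_per_class=(?P<n>\d+), "
    r"repeat=(?P<repeat>\d+), time_elapsed=(?P<elapsed>[\d.]+), "
    r"(?P<metrics>\{.*\})\s*$"
)


def parse_summary_lines(log_text):
    """Return one dict per summary line found in log_text."""
    runs = []
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue  # skip warnings, per-epoch tables and blank lines
        # The metrics part is a printed Python dict, so literal_eval can read it.
        metrics = ast.literal_eval(match["metrics"])
        runs.append({
            "domain": match["domain"],
            "n_samples_per_class": int(match["n"]),
            "repeat": int(match["repeat"]),
            "time_elapsed": float(match["elapsed"]),
            **metrics,
        })
    return runs


def mean_accuracy(runs):
    """Average eval_accuracy over repeats for each (domain, n_samples_per_class)."""
    grouped = defaultdict(list)
    for run in runs:
        key = (run["domain"], run["n_samples_per_class"])
        grouped[key].append(run["eval_accuracy"])
    return {key: mean(values) for key, values in grouped.items()}


# Two summary lines copied from the output above.
sample_log = """\
domiain medicine n_samples_per_class=200, repeat=2, time_elapsed=157.05446529388428, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.1268, 'eval_samples_per_second': 79.238}
domiain medicine n_samples_per_class=200, repeat=3, time_elapsed=152.50956463813782, {'eval_loss': 0.40470170974731445, 'eval_accuracy': 0.8685015290519877, 'eval_runtime': 4.0941, 'eval_samples_per_second': 79.872}
"""

runs = parse_summary_lines(sample_log)
print(mean_accuracy(runs))
```

Grouping by domain and `n_samples_per_class` also makes it easy to spot repeats that report identical metrics, which may be worth checking against how the random seed is set in the training loop.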



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.443964,0.0,0.0653,61.3
2,No log,1.443179,0.0,0.06,66.669
3,No log,1.441593,0.0,0.0583,68.631
4,No log,1.439421,0.0,0.0619,64.592


domiain products n_samples_per_class=1, repeat=0, time_elapsed=130.9419629573822, {'eval_loss': 1.5651975870132446, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6495, 'eval_samples_per_second': 73.9}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.429944,0.25,0.0551,72.538
2,No log,1.42855,0.25,0.0622,64.281
3,No log,1.425946,0.25,0.0715,55.982
4,No log,1.422269,0.25,0.0647,61.859


domiain products n_samples_per_class=1, repeat=1, time_elapsed=122.14682602882385, {'eval_loss': 1.4811333417892456, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6399, 'eval_samples_per_second': 75.012}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.429944,0.25,0.0572,69.912
2,No log,1.42855,0.25,0.0591,67.631
3,No log,1.425946,0.25,0.0782,51.126
4,No log,1.422269,0.25,0.0554,72.252


domiain products n_samples_per_class=1, repeat=2, time_elapsed=120.16921973228455, {'eval_loss': 1.4811333417892456, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6508, 'eval_samples_per_second': 73.754}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.429944,0.25,0.0586,68.316
2,No log,1.42855,0.25,0.0497,80.433
3,No log,1.425946,0.25,0.0623,64.186
4,No log,1.422269,0.25,0.0572,69.875


domiain products n_samples_per_class=1, repeat=3, time_elapsed=121.48539662361145, {'eval_loss': 1.4811333417892456, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6713, 'eval_samples_per_second': 71.501}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.429944,0.25,0.0513,77.965
2,No log,1.42855,0.25,0.0786,50.918
3,No log,1.425946,0.25,0.057,70.134
4,No log,1.422269,0.25,0.0608,65.802


domiain products n_samples_per_class=1, repeat=4, time_elapsed=119.58675742149353, {'eval_loss': 1.4811333417892456, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6667, 'eval_samples_per_second': 71.993}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.375997,0.25,0.0524,76.365
2,No log,1.37485,0.25,0.05,79.923
3,No log,1.372638,0.25,0.0505,79.241
4,No log,1.369575,0.25,0.0498,80.307


domiain products n_samples_per_class=3, repeat=0, time_elapsed=127.80231666564941, {'eval_loss': 1.4811333417892456, 'eval_accuracy': 0.041666666666666664, 'eval_runtime': 0.6642, 'eval_samples_per_second': 72.269}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.365625,0.5,0.0464,86.159
2,No log,1.365936,0.5,0.0518,77.161
3,No log,1.366613,0.5,0.05,79.983
4,No log,1.367679,0.5,0.0485,82.547


domiain products n_samples_per_class=3, repeat=1, time_elapsed=129.08791589736938, {'eval_loss': 1.2629450559616089, 'eval_accuracy': 0.375, 'eval_runtime': 0.6609, 'eval_samples_per_second': 72.631}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.365625,0.5,0.0463,86.319
2,No log,1.365936,0.5,0.0539,74.247
3,No log,1.366613,0.5,0.051,78.401
4,No log,1.367679,0.5,0.0518,77.266


domiain products n_samples_per_class=3, repeat=2, time_elapsed=129.99838781356812, {'eval_loss': 1.2629450559616089, 'eval_accuracy': 0.375, 'eval_runtime': 0.6623, 'eval_samples_per_second': 72.47}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.365625,0.5,0.0483,82.806
2,No log,1.365936,0.5,0.0503,79.515
3,No log,1.366613,0.5,0.0509,78.556
4,No log,1.367679,0.5,1.968,2.033


domiain products n_samples_per_class=3, repeat=3, time_elapsed=130.94184708595276, {'eval_loss': 1.2629450559616089, 'eval_accuracy': 0.375, 'eval_runtime': 0.6596, 'eval_samples_per_second': 72.772}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.365625,0.5,0.048,83.347
2,No log,1.365936,0.5,0.0512,78.049
3,No log,1.366613,0.5,0.0485,82.522
4,No log,1.367679,0.5,0.0512,78.114


domiain products n_samples_per_class=3, repeat=4, time_elapsed=129.91928887367249, {'eval_loss': 1.2629450559616089, 'eval_accuracy': 0.375, 'eval_runtime': 0.6509, 'eval_samples_per_second': 73.743}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.334357,0.5,0.0431,92.891
2,No log,1.334128,0.5,0.0501,79.865
3,No log,1.333748,0.5,0.0491,81.451
4,No log,1.333186,0.5,0.0494,80.899


domiain products n_samples_per_class=5, repeat=0, time_elapsed=130.26274991035461, {'eval_loss': 1.2629450559616089, 'eval_accuracy': 0.375, 'eval_runtime': 0.6482, 'eval_samples_per_second': 74.047}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.349897,0.25,0.0491,81.485
2,No log,1.349461,0.25,0.0513,78.008
3,No log,1.348601,0.25,0.0487,82.137
4,No log,1.347558,0.25,0.0508,78.81


domiain products n_samples_per_class=5, repeat=1, time_elapsed=132.35598993301392, {'eval_loss': 1.1907671689987183, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6553, 'eval_samples_per_second': 73.247}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.349897,0.25,0.0474,84.377
2,No log,1.349461,0.25,0.0504,79.319
3,No log,1.348601,0.25,0.0572,69.967
4,No log,1.347558,0.25,0.0502,79.734


domiain products n_samples_per_class=5, repeat=2, time_elapsed=128.1763575077057, {'eval_loss': 1.1907671689987183, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6625, 'eval_samples_per_second': 72.448}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.349897,0.25,0.0482,83.048
2,No log,1.349461,0.25,0.051,78.465
3,No log,1.348601,0.25,0.0505,79.275
4,No log,1.347558,0.25,0.0507,78.948


domiain products n_samples_per_class=5, repeat=3, time_elapsed=131.0585114955902, {'eval_loss': 1.1907671689987183, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6598, 'eval_samples_per_second': 72.751}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.349897,0.25,0.0459,87.118
2,No log,1.349461,0.25,0.0548,73.017
3,No log,1.348601,0.25,0.0494,81.01
4,No log,1.347558,0.25,0.0515,77.655


domiain products n_samples_per_class=5, repeat=4, time_elapsed=131.3702998161316, {'eval_loss': 1.1907671689987183, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6643, 'eval_samples_per_second': 72.261}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.268026,0.25,0.0533,75.111
2,No log,1.267238,0.25,0.0528,75.706
3,No log,1.266666,0.25,0.0497,80.505
4,No log,1.269456,0.25,0.0513,77.941


domiain products n_samples_per_class=8, repeat=0, time_elapsed=135.06623721122742, {'eval_loss': 1.189273476600647, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6661, 'eval_samples_per_second': 72.066}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.445021,0.25,0.0514,77.77
2,No log,1.44312,0.25,0.052,76.966
3,No log,1.441493,0.25,0.0501,79.878
4,No log,1.441233,0.25,0.0517,77.35


domiain products n_samples_per_class=8, repeat=1, time_elapsed=136.9790599346161, {'eval_loss': 1.5029598474502563, 'eval_accuracy': 0.020833333333333332, 'eval_runtime': 0.6534, 'eval_samples_per_second': 73.46}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.445021,0.25,0.0535,74.784
2,No log,1.44312,0.25,0.0508,78.783
3,No log,1.441493,0.25,0.0501,79.85
4,No log,1.441233,0.25,0.0538,74.386


domiain products n_samples_per_class=8, repeat=2, time_elapsed=134.75473713874817, {'eval_loss': 1.5029598474502563, 'eval_accuracy': 0.020833333333333332, 'eval_runtime': 0.6571, 'eval_samples_per_second': 73.051}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.445021,0.25,0.0515,77.611
2,No log,1.44312,0.25,0.0519,77.098
3,No log,1.441493,0.25,0.051,78.385
4,No log,1.441233,0.25,0.0505,79.175


domiain products n_samples_per_class=8, repeat=3, time_elapsed=135.14386558532715, {'eval_loss': 1.5029598474502563, 'eval_accuracy': 0.020833333333333332, 'eval_runtime': 0.6509, 'eval_samples_per_second': 73.741}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.445021,0.25,0.0552,72.462
2,No log,1.44312,0.25,0.0505,79.284
3,No log,1.441493,0.25,0.0506,78.994
4,No log,1.441233,0.25,0.0498,80.4


domiain products n_samples_per_class=8, repeat=4, time_elapsed=132.49697184562683, {'eval_loss': 1.5029598474502563, 'eval_accuracy': 0.020833333333333332, 'eval_runtime': 0.6475, 'eval_samples_per_second': 74.133}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.46666,0.25,0.0401,99.836
2,No log,1.46464,0.25,0.039,102.671
3,No log,1.462338,0.25,0.038,105.252
4,No log,1.461767,0.25,0.0425,94.143


domiain products n_samples_per_class=10, repeat=0, time_elapsed=137.10663747787476, {'eval_loss': 1.5036996603012085, 'eval_accuracy': 0.020833333333333332, 'eval_runtime': 0.6688, 'eval_samples_per_second': 71.765}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.408318,0.5,0.0392,102.09
2,No log,1.40565,0.5,0.04,100.042
3,No log,1.40156,0.5,0.0385,103.775
4,No log,1.39771,0.5,0.0408,98.149


domiain products n_samples_per_class=10, repeat=1, time_elapsed=135.8829483985901, {'eval_loss': 1.1966705322265625, 'eval_accuracy': 0.5625, 'eval_runtime': 0.6568, 'eval_samples_per_second': 73.083}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.408318,0.5,0.0404,99.112
2,No log,1.40565,0.5,0.0402,99.445
3,No log,1.40156,0.5,0.0451,88.709
4,No log,1.39771,0.5,0.056,71.395


domiain products n_samples_per_class=10, repeat=2, time_elapsed=137.24582815170288, {'eval_loss': 1.1966705322265625, 'eval_accuracy': 0.5625, 'eval_runtime': 0.6352, 'eval_samples_per_second': 75.562}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.408318,0.5,0.0378,105.711
2,No log,1.40565,0.5,0.0395,101.338
3,No log,1.40156,0.5,0.0432,92.669
4,No log,1.39771,0.5,0.0452,88.543


domiain products n_samples_per_class=10, repeat=3, time_elapsed=135.97528648376465, {'eval_loss': 1.1966705322265625, 'eval_accuracy': 0.5625, 'eval_runtime': 0.6476, 'eval_samples_per_second': 74.126}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.408318,0.5,0.0396,101.002
2,No log,1.40565,0.5,0.0404,98.972
3,No log,1.40156,0.5,0.0438,91.328
4,No log,1.39771,0.5,0.0399,100.197


domiain products n_samples_per_class=10, repeat=4, time_elapsed=133.78014135360718, {'eval_loss': 1.1966705322265625, 'eval_accuracy': 0.5625, 'eval_runtime': 0.6504, 'eval_samples_per_second': 73.803}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.36458,0.375,0.0773,103.458
2,No log,1.358174,0.375,0.0791,101.135
3,1.341500,1.361621,0.375,0.0785,101.925
4,1.341500,1.387848,0.25,0.0744,107.525


domiain products n_samples_per_class=20, repeat=0, time_elapsed=141.77592062950134, {'eval_loss': 1.1876739263534546, 'eval_accuracy': 0.5625, 'eval_runtime': 0.6345, 'eval_samples_per_second': 75.651}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.409647,0.25,0.077,103.831
2,No log,1.399986,0.375,0.0761,105.123
3,1.445300,1.406078,0.375,0.0798,100.248
4,1.445300,1.444364,0.375,0.0759,105.366


domiain products n_samples_per_class=20, repeat=1, time_elapsed=140.91437697410583, {'eval_loss': 1.3859496116638184, 'eval_accuracy': 0.10416666666666667, 'eval_runtime': 0.6442, 'eval_samples_per_second': 74.508}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.409647,0.25,0.0763,104.784
2,No log,1.399986,0.375,0.0811,98.649
3,1.445300,1.406078,0.375,0.0762,105.018
4,1.445300,1.444364,0.375,0.0757,105.612


domiain products n_samples_per_class=20, repeat=2, time_elapsed=137.16486501693726, {'eval_loss': 1.3859496116638184, 'eval_accuracy': 0.10416666666666667, 'eval_runtime': 0.6401, 'eval_samples_per_second': 74.989}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.409647,0.25,0.0771,103.762
2,No log,1.399986,0.375,0.0779,102.757
3,1.445300,1.406078,0.375,0.0757,105.7
4,1.445300,1.444364,0.375,0.0765,104.509


domiain products n_samples_per_class=20, repeat=3, time_elapsed=140.59423661231995, {'eval_loss': 1.3859496116638184, 'eval_accuracy': 0.10416666666666667, 'eval_runtime': 0.6458, 'eval_samples_per_second': 74.321}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.409647,0.25,0.0763,104.85
2,No log,1.399986,0.375,0.0722,110.817
3,1.445300,1.406078,0.375,0.0806,99.208
4,1.445300,1.444364,0.375,0.0765,104.597


domiain products n_samples_per_class=20, repeat=4, time_elapsed=139.07681465148926, {'eval_loss': 1.3859496116638184, 'eval_accuracy': 0.10416666666666667, 'eval_runtime': 0.6431, 'eval_samples_per_second': 74.642}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.439589,0.25,0.1562,76.83
2,1.370200,1.439453,0.25,0.1663,72.139
3,1.370200,1.493102,0.166667,0.1529,78.508
4,1.224200,1.594452,0.416667,0.1615,74.324


domiain products n_samples_per_class=30, repeat=0, time_elapsed=127.36998558044434, {'eval_loss': 1.195607304573059, 'eval_accuracy': 0.25, 'eval_runtime': 0.6709, 'eval_samples_per_second': 71.543}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.46292,0.333333,0.1615,74.326
2,1.419300,1.440886,0.416667,0.1645,72.94
3,1.419300,1.482632,0.5,0.1653,72.608
4,1.265600,1.582213,0.25,0.1616,74.245


domiain products n_samples_per_class=30, repeat=1, time_elapsed=145.5337324142456, {'eval_loss': 1.1878706216812134, 'eval_accuracy': 0.3958333333333333, 'eval_runtime': 0.6586, 'eval_samples_per_second': 72.881}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.46292,0.333333,0.1628,73.727
2,1.419300,1.440886,0.416667,0.1709,70.237
3,1.419300,1.482632,0.5,0.1662,72.22
4,1.265600,1.582213,0.25,0.1675,71.622


domiain products n_samples_per_class=30, repeat=2, time_elapsed=140.83086681365967, {'eval_loss': 1.1878706216812134, 'eval_accuracy': 0.3958333333333333, 'eval_runtime': 0.6424, 'eval_samples_per_second': 74.715}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.46292,0.333333,0.1591,75.419
2,1.419300,1.440886,0.416667,0.1611,74.477
3,1.419300,1.482632,0.5,0.158,75.947
4,1.265600,1.582213,0.25,0.1622,73.985


domiain products n_samples_per_class=30, repeat=3, time_elapsed=141.2836811542511, {'eval_loss': 1.1878706216812134, 'eval_accuracy': 0.3958333333333333, 'eval_runtime': 0.6597, 'eval_samples_per_second': 72.764}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.46292,0.333333,0.1617,74.194
2,1.419300,1.440886,0.416667,0.1632,73.544
3,1.419300,1.482632,0.5,0.1589,75.542
4,1.265600,1.582213,0.25,0.1609,74.601


domiain products n_samples_per_class=30, repeat=4, time_elapsed=142.2856285572052, {'eval_loss': 1.1878706216812134, 'eval_accuracy': 0.3958333333333333, 'eval_runtime': 0.6448, 'eval_samples_per_second': 74.447}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4318,1.444815,0.291667,0.3126,76.777
2,1.2009,1.581801,0.375,0.3117,77.003
3,1.0872,1.771036,0.458333,0.305,78.695
4,0.9594,1.779419,0.458333,0.3184,75.369


domiain products n_samples_per_class=60, repeat=0, time_elapsed=141.4438397884369, {'eval_loss': 1.0509508848190308, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.7099, 'eval_samples_per_second': 67.614}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3512,1.424335,0.291667,0.3095,77.534
2,1.2017,1.587291,0.208333,0.3163,75.889
3,1.0722,1.760105,0.375,0.3071,78.156
4,0.9388,1.743829,0.416667,0.3289,72.98


domiain products n_samples_per_class=60, repeat=1, time_elapsed=141.0459086894989, {'eval_loss': 0.8975055813789368, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.635, 'eval_samples_per_second': 75.592}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3512,1.424335,0.291667,0.3148,76.228
2,1.2017,1.587291,0.208333,0.3225,74.43
3,1.0722,1.760105,0.375,0.3011,79.706
4,0.9388,1.743829,0.416667,0.3365,71.327


domiain products n_samples_per_class=60, repeat=2, time_elapsed=144.36833238601685, {'eval_loss': 0.8975055813789368, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6524, 'eval_samples_per_second': 73.578}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3512,1.424335,0.291667,0.3101,77.402
2,1.2017,1.587291,0.208333,0.3145,76.312
3,1.0722,1.760105,0.375,0.3086,77.77
4,0.9388,1.743829,0.416667,0.3179,75.488


domiain products n_samples_per_class=60, repeat=3, time_elapsed=141.31033039093018, {'eval_loss': 0.8975055813789368, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6344, 'eval_samples_per_second': 75.66}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3512,1.424335,0.291667,0.3095,77.538
2,1.2017,1.587291,0.208333,0.3081,77.903
3,1.0722,1.760105,0.375,0.3048,78.731
4,0.9388,1.743829,0.416667,0.3161,75.924


domiain products n_samples_per_class=60, repeat=4, time_elapsed=142.75502562522888, {'eval_loss': 0.8975055813789368, 'eval_accuracy': 0.7291666666666666, 'eval_runtime': 0.6341, 'eval_samples_per_second': 75.702}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3603,1.514546,0.2,0.542,73.804
2,1.0795,1.790059,0.3,0.5427,73.708
3,0.7522,1.598663,0.35,0.5284,75.696
4,0.4689,1.496853,0.625,0.5531,72.326


domiain products n_samples_per_class=100, repeat=0, time_elapsed=139.7206733226776, {'eval_loss': 0.6606875061988831, 'eval_accuracy': 0.7708333333333334, 'eval_runtime': 0.6666, 'eval_samples_per_second': 72.006}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3702,1.460563,0.4,0.5358,74.655
2,1.0737,1.769721,0.3,0.5466,73.184
3,0.7727,1.640933,0.4,0.5269,75.917
4,0.5108,1.740799,0.475,0.5526,72.387


domiain products n_samples_per_class=100, repeat=1, time_elapsed=144.74015545845032, {'eval_loss': 0.5077115893363953, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.624, 'eval_samples_per_second': 76.92}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3702,1.460563,0.4,0.5357,74.675
2,1.0737,1.769721,0.3,0.5454,73.342
3,0.7727,1.640933,0.4,0.532,75.194
4,0.5108,1.740799,0.475,0.5576,71.732


domiain products n_samples_per_class=100, repeat=2, time_elapsed=139.99888467788696, {'eval_loss': 0.5077115893363953, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.625, 'eval_samples_per_second': 76.799}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3702,1.460563,0.4,0.5454,73.345
2,1.0737,1.769721,0.3,0.5445,73.458
3,0.7727,1.640933,0.4,0.5279,75.772
4,0.5108,1.740799,0.475,0.5521,72.454


domain products n_samples_per_class=100, repeat=3, time_elapsed=144.1022539138794, {'eval_loss': 0.5077115893363953, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.6209, 'eval_samples_per_second': 77.313}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3702,1.460563,0.4,0.5338,74.929
2,1.0737,1.769721,0.3,0.5542,72.17
3,0.7727,1.640933,0.4,0.5286,75.673
4,0.5108,1.740799,0.475,0.5644,70.87


domain products n_samples_per_class=100, repeat=4, time_elapsed=142.91394329071045, {'eval_loss': 0.5077115893363953, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.6368, 'eval_samples_per_second': 75.374}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1053,1.743729,0.4125,1.1298,70.81
2,0.5612,1.225292,0.625,1.0539,75.906
3,0.1452,1.798836,0.5125,1.086,73.666
4,0.0515,1.767499,0.5875,1.0909,73.333


domain products n_samples_per_class=200, repeat=0, time_elapsed=160.48030996322632, {'eval_loss': 0.5667858123779297, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.622, 'eval_samples_per_second': 77.176}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1191,1.796985,0.35,1.0938,73.143
2,0.6232,1.229657,0.65,1.0699,74.775
3,0.292,2.157255,0.4875,1.0813,73.987
4,0.0502,1.94227,0.5625,1.1018,72.609


domain products n_samples_per_class=200, repeat=1, time_elapsed=159.8447916507721, {'eval_loss': 0.5802013874053955, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.621, 'eval_samples_per_second': 77.293}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1191,1.796985,0.35,1.0978,72.872
2,0.6232,1.229657,0.65,1.0613,75.38
3,0.292,2.157255,0.4875,1.0747,74.44
4,0.0502,1.94227,0.5625,1.1272,70.973


domain products n_samples_per_class=200, repeat=2, time_elapsed=160.12633776664734, {'eval_loss': 0.5802013874053955, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.6082, 'eval_samples_per_second': 78.918}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1191,1.796985,0.35,1.1066,72.291
2,0.6232,1.229657,0.65,1.0646,75.148
3,0.292,2.157255,0.4875,1.0801,74.064
4,0.0502,1.94227,0.5625,1.1053,72.376


domain products n_samples_per_class=200, repeat=3, time_elapsed=160.8173108100891, {'eval_loss': 0.5802013874053955, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.6108, 'eval_samples_per_second': 78.589}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1191,1.796985,0.35,1.1048,72.413
2,0.6232,1.229657,0.65,1.0681,74.9
3,0.292,2.157255,0.4875,1.0783,74.188
4,0.0502,1.94227,0.5625,1.113,71.881


domain products n_samples_per_class=200, repeat=4, time_elapsed=160.23822402954102, {'eval_loss': 0.5802013874053955, 'eval_accuracy': 0.8541666666666666, 'eval_runtime': 0.605, 'eval_samples_per_second': 79.345}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.42427,0.333333,0.0401,74.746
2,No log,1.422208,0.333333,0.0429,69.864
3,No log,1.418241,0.333333,0.0402,74.671
4,No log,1.412444,0.333333,0.0392,76.504


domain reviews n_samples_per_class=1, repeat=0, time_elapsed=116.56334090232849, {'eval_loss': 1.3604148626327515, 'eval_accuracy': 0.36, 'eval_runtime': 0.6767, 'eval_samples_per_second': 73.889}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.582493,0.0,0.034,88.289
2,No log,1.579385,0.0,0.0394,76.189
3,No log,1.57327,0.0,0.0388,77.41
4,No log,1.564095,0.0,0.0313,95.939


domain reviews n_samples_per_class=1, repeat=1, time_elapsed=118.14661478996277, {'eval_loss': 1.571073293685913, 'eval_accuracy': 0.0, 'eval_runtime': 0.6605, 'eval_samples_per_second': 75.7}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.582493,0.0,0.0376,79.831
2,No log,1.579385,0.0,0.0403,74.375
3,No log,1.57327,0.0,0.0451,66.48
4,No log,1.564095,0.0,0.0354,84.762


domain reviews n_samples_per_class=1, repeat=2, time_elapsed=120.03650522232056, {'eval_loss': 1.571073293685913, 'eval_accuracy': 0.0, 'eval_runtime': 0.6868, 'eval_samples_per_second': 72.798}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.582493,0.0,0.035,85.743
2,No log,1.579385,0.0,0.0619,48.456
3,No log,1.57327,0.0,0.038,78.844
4,No log,1.564095,0.0,0.0296,101.343


domain reviews n_samples_per_class=1, repeat=3, time_elapsed=117.55797052383423, {'eval_loss': 1.571073293685913, 'eval_accuracy': 0.0, 'eval_runtime': 0.6904, 'eval_samples_per_second': 72.423}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.582493,0.0,0.0389,77.073
2,No log,1.579385,0.0,0.0357,83.99
3,No log,1.57327,0.0,0.0357,83.981
4,No log,1.564095,0.0,0.0299,100.297


domain reviews n_samples_per_class=1, repeat=4, time_elapsed=117.0456919670105, {'eval_loss': 1.571073293685913, 'eval_accuracy': 0.0, 'eval_runtime': 0.6739, 'eval_samples_per_second': 74.19}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.627674,0.0,0.0317,94.716
2,No log,1.622479,0.0,0.0319,94.069
3,No log,1.612103,0.0,0.0323,92.971
4,No log,1.597023,0.0,0.0383,78.357


domain reviews n_samples_per_class=3, repeat=0, time_elapsed=123.95493936538696, {'eval_loss': 1.571073293685913, 'eval_accuracy': 0.0, 'eval_runtime': 0.6573, 'eval_samples_per_second': 76.07}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.236536,0.666667,0.0277,108.409
2,No log,1.233887,0.666667,0.0353,85.052
3,No log,1.228736,0.666667,0.0439,68.4
4,No log,1.221236,0.666667,0.0812,36.963


domain reviews n_samples_per_class=3, repeat=1, time_elapsed=125.17446231842041, {'eval_loss': 1.377296805381775, 'eval_accuracy': 0.32, 'eval_runtime': 0.6567, 'eval_samples_per_second': 76.142}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.236536,0.666667,0.0319,94.112
2,No log,1.233887,0.666667,0.0346,86.7
3,No log,1.228736,0.666667,0.0335,89.623
4,No log,1.221236,0.666667,0.0353,84.906


domain reviews n_samples_per_class=3, repeat=2, time_elapsed=122.67944717407227, {'eval_loss': 1.377296805381775, 'eval_accuracy': 0.32, 'eval_runtime': 0.659, 'eval_samples_per_second': 75.868}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.236536,0.666667,0.031,96.814
2,No log,1.233887,0.666667,0.0345,86.874
3,No log,1.228736,0.666667,0.032,93.659
4,No log,1.221236,0.666667,0.0281,106.666


domain reviews n_samples_per_class=3, repeat=3, time_elapsed=124.06697702407837, {'eval_loss': 1.377296805381775, 'eval_accuracy': 0.32, 'eval_runtime': 0.6881, 'eval_samples_per_second': 72.66}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.236536,0.666667,0.0333,90.03
2,No log,1.233887,0.666667,0.0312,96.066
3,No log,1.228736,0.666667,0.0358,83.717
4,No log,1.221236,0.666667,0.0327,91.725


domain reviews n_samples_per_class=3, repeat=4, time_elapsed=126.54019927978516, {'eval_loss': 1.377296805381775, 'eval_accuracy': 0.32, 'eval_runtime': 0.6819, 'eval_samples_per_second': 73.328}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.388705,0.333333,0.0386,77.643
2,No log,1.385273,0.333333,0.0415,72.372
3,No log,1.378381,0.333333,0.0385,77.946
4,No log,1.368522,0.333333,0.04,74.951


domain reviews n_samples_per_class=5, repeat=0, time_elapsed=125.41866636276245, {'eval_loss': 1.377296805381775, 'eval_accuracy': 0.32, 'eval_runtime': 0.6936, 'eval_samples_per_second': 72.09}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.360052,0.333333,0.0416,72.151
2,No log,1.356477,0.333333,0.0392,76.482
3,No log,1.349425,0.333333,0.0391,76.801
4,No log,1.339202,0.333333,0.0384,78.111


domain reviews n_samples_per_class=5, repeat=1, time_elapsed=124.67261672019958, {'eval_loss': 1.4594182968139648, 'eval_accuracy': 0.06, 'eval_runtime': 0.673, 'eval_samples_per_second': 74.298}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.360052,0.333333,0.0356,84.154
2,No log,1.356477,0.333333,0.0379,79.246
3,No log,1.349425,0.333333,0.0389,77.11
4,No log,1.339202,0.333333,0.037,81.123


domain reviews n_samples_per_class=5, repeat=2, time_elapsed=127.44051170349121, {'eval_loss': 1.4594182968139648, 'eval_accuracy': 0.06, 'eval_runtime': 0.6642, 'eval_samples_per_second': 75.283}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.360052,0.333333,0.0382,78.462
2,No log,1.356477,0.333333,0.0375,79.906
3,No log,1.349425,0.333333,0.0401,74.771
4,No log,1.339202,0.333333,0.0374,80.219


domain reviews n_samples_per_class=5, repeat=3, time_elapsed=125.97445797920227, {'eval_loss': 1.4594182968139648, 'eval_accuracy': 0.06, 'eval_runtime': 0.6864, 'eval_samples_per_second': 72.844}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.360052,0.333333,0.0378,79.45
2,No log,1.356477,0.333333,0.0383,78.254
3,No log,1.349425,0.333333,0.0362,82.898
4,No log,1.339202,0.333333,0.0403,74.431


domain reviews n_samples_per_class=5, repeat=4, time_elapsed=127.90510439872742, {'eval_loss': 1.4594182968139648, 'eval_accuracy': 0.06, 'eval_runtime': 0.6763, 'eval_samples_per_second': 73.929}



Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.35688,0.333333,0.0346,86.632
2,No log,1.343771,0.333333,0.0352,85.226
3,No log,1.323033,0.333333,0.0326,91.999
4,No log,1.297732,0.333333,0.0333,90.152


domain reviews n_samples_per_class=8, repeat=0, time_elapsed=128.4852294921875, {'eval_loss': 1.4561767578125, 'eval_accuracy': 0.06, 'eval_runtime': 0.684, 'eval_samples_per_second': 73.096}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.369868,0.333333,0.0343,87.357
2,No log,1.357421,0.333333,0.0341,87.872
3,No log,1.337541,0.333333,0.0323,92.901
4,No log,1.313052,0.333333,0.0432,69.445


domain reviews n_samples_per_class=8, repeat=1, time_elapsed=131.60769748687744, {'eval_loss': 1.264765977859497, 'eval_accuracy': 0.52, 'eval_runtime': 0.6696, 'eval_samples_per_second': 74.672}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.369868,0.333333,0.0344,87.28
2,No log,1.357421,0.333333,0.0397,75.513
3,No log,1.337541,0.333333,0.0375,80.103
4,No log,1.313052,0.333333,0.0345,86.852


domain reviews n_samples_per_class=8, repeat=2, time_elapsed=127.87924098968506, {'eval_loss': 1.264765977859497, 'eval_accuracy': 0.52, 'eval_runtime': 0.6854, 'eval_samples_per_second': 72.947}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.369868,0.333333,0.0401,74.741
2,No log,1.357421,0.333333,0.0327,91.817
3,No log,1.337541,0.333333,0.0327,91.837
4,No log,1.313052,0.333333,0.0345,86.959


domain reviews n_samples_per_class=8, repeat=3, time_elapsed=131.17228174209595, {'eval_loss': 1.264765977859497, 'eval_accuracy': 0.52, 'eval_runtime': 0.6956, 'eval_samples_per_second': 71.878}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.369868,0.333333,0.0346,86.797
2,No log,1.357421,0.333333,0.0361,83.11
3,No log,1.337541,0.333333,0.0337,89.027
4,No log,1.313052,0.333333,0.0337,88.97


domain reviews n_samples_per_class=8, repeat=4, time_elapsed=128.5118510723114, {'eval_loss': 1.264765977859497, 'eval_accuracy': 0.52, 'eval_runtime': 0.6825, 'eval_samples_per_second': 73.262}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357052,0.333333,0.0324,92.472
2,No log,1.342563,0.333333,0.0341,87.849
3,No log,1.318151,0.333333,0.3352,8.949
4,No log,1.286598,0.333333,0.0345,87.08


domain reviews n_samples_per_class=10, repeat=0, time_elapsed=133.71426796913147, {'eval_loss': 1.2644548416137695, 'eval_accuracy': 0.52, 'eval_runtime': 0.6713, 'eval_samples_per_second': 74.479}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357892,0.333333,0.0353,84.877
2,No log,1.345521,0.333333,0.0346,86.597
3,No log,1.323069,0.333333,0.0338,88.778
4,No log,1.292978,0.333333,0.0328,91.383


domain reviews n_samples_per_class=10, repeat=1, time_elapsed=129.42196226119995, {'eval_loss': 1.420040249824524, 'eval_accuracy': 0.4, 'eval_runtime': 0.6918, 'eval_samples_per_second': 72.28}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357892,0.333333,0.0334,89.922
2,No log,1.345521,0.333333,0.0331,90.754
3,No log,1.323069,0.333333,0.0326,91.943
4,No log,1.292978,0.333333,0.0373,80.426


domain reviews n_samples_per_class=10, repeat=2, time_elapsed=132.50163221359253, {'eval_loss': 1.420040249824524, 'eval_accuracy': 0.4, 'eval_runtime': 0.6736, 'eval_samples_per_second': 74.225}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357892,0.333333,0.0431,69.645
2,No log,1.345521,0.333333,0.0339,88.508
3,No log,1.323069,0.333333,0.0381,78.731
4,No log,1.292978,0.333333,0.0337,88.977


domain reviews n_samples_per_class=10, repeat=3, time_elapsed=129.0859296321869, {'eval_loss': 1.420040249824524, 'eval_accuracy': 0.4, 'eval_runtime': 0.6909, 'eval_samples_per_second': 72.364}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.357892,0.333333,0.0328,91.446
2,No log,1.345521,0.333333,0.0394,76.16
3,No log,1.323069,0.333333,0.0336,89.402
4,No log,1.292978,0.333333,0.0361,83.158


domain reviews n_samples_per_class=10, repeat=4, time_elapsed=132.35012078285217, {'eval_loss': 1.420040249824524, 'eval_accuracy': 0.4, 'eval_runtime': 0.677, 'eval_samples_per_second': 73.853}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.324803,0.166667,0.0788,76.146
2,No log,1.278116,0.166667,0.0815,73.603
3,1.318700,1.215451,0.333333,0.0726,82.6
4,1.318700,1.154832,0.333333,0.0758,79.136


domain reviews n_samples_per_class=20, repeat=0, time_elapsed=129.0963933467865, {'eval_loss': 1.2557663917541504, 'eval_accuracy': 0.38, 'eval_runtime': 0.6819, 'eval_samples_per_second': 73.324}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.411051,0.166667,0.0767,78.193
2,No log,1.352393,0.166667,0.0758,79.2
3,1.402400,1.278108,0.166667,0.0755,79.429
4,1.402400,1.207905,0.333333,0.0759,79.056


domain reviews n_samples_per_class=20, repeat=1, time_elapsed=128.88323092460632, {'eval_loss': 1.1148871183395386, 'eval_accuracy': 0.54, 'eval_runtime': 0.653, 'eval_samples_per_second': 76.571}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.411051,0.166667,0.075,80.006
2,No log,1.352393,0.166667,0.0762,78.729
3,1.402400,1.278108,0.166667,0.0796,75.352
4,1.402400,1.207905,0.333333,0.0768,78.124


domain reviews n_samples_per_class=20, repeat=2, time_elapsed=135.5227518081665, {'eval_loss': 1.1148871183395386, 'eval_accuracy': 0.54, 'eval_runtime': 0.6682, 'eval_samples_per_second': 74.833}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.411051,0.166667,0.0762,78.775
2,No log,1.352393,0.166667,0.0766,78.37
3,1.402400,1.278108,0.166667,0.0834,71.957
4,1.402400,1.207905,0.333333,0.0786,76.38


domain reviews n_samples_per_class=20, repeat=3, time_elapsed=137.46962070465088, {'eval_loss': 1.1148871183395386, 'eval_accuracy': 0.54, 'eval_runtime': 0.6717, 'eval_samples_per_second': 74.433}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.411051,0.166667,0.0785,76.467
2,No log,1.352393,0.166667,0.0724,82.819
3,1.402400,1.278108,0.166667,0.0746,80.474
4,1.402400,1.207905,0.333333,0.0767,78.206


domain reviews n_samples_per_class=20, repeat=4, time_elapsed=135.23779678344727, {'eval_loss': 1.1148871183395386, 'eval_accuracy': 0.54, 'eval_runtime': 0.6836, 'eval_samples_per_second': 73.14}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.377288,0.333333,0.0918,98.071
2,1.328200,1.277479,0.333333,0.0924,97.418
3,1.328200,1.167426,0.333333,0.0908,99.108
4,1.163800,1.086265,0.333333,0.0945,95.207


domain reviews n_samples_per_class=30, repeat=0, time_elapsed=152.67887043952942, {'eval_loss': 1.3282934427261353, 'eval_accuracy': 0.48, 'eval_runtime': 0.6786, 'eval_samples_per_second': 73.678}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457674,0.111111,0.0926,97.237
2,1.453300,1.315058,0.444444,0.0944,95.332
3,1.453300,1.180547,0.555556,0.0901,99.943
4,1.274200,1.105051,0.555556,0.0918,98.041


domain reviews n_samples_per_class=30, repeat=1, time_elapsed=136.77154111862183, {'eval_loss': 1.139980435371399, 'eval_accuracy': 0.58, 'eval_runtime': 0.6668, 'eval_samples_per_second': 74.987}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457674,0.111111,0.0922,97.584
2,1.453300,1.315058,0.444444,0.0939,95.814
3,1.453300,1.180547,0.555556,0.0948,94.985
4,1.274200,1.105051,0.555556,0.0932,96.541


domain reviews n_samples_per_class=30, repeat=2, time_elapsed=128.5251452922821, {'eval_loss': 1.139980435371399, 'eval_accuracy': 0.58, 'eval_runtime': 0.6611, 'eval_samples_per_second': 75.632}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457674,0.111111,0.0941,95.652
2,1.453300,1.315058,0.444444,0.0927,97.048
3,1.453300,1.180547,0.555556,0.0932,96.611
4,1.274200,1.105051,0.555556,0.0925,97.298


domain reviews n_samples_per_class=30, repeat=3, time_elapsed=137.31349420547485, {'eval_loss': 1.139980435371399, 'eval_accuracy': 0.58, 'eval_runtime': 0.6764, 'eval_samples_per_second': 73.922}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,No log,1.457674,0.111111,0.0921,97.747
2,1.453300,1.315058,0.444444,0.096,93.721
3,1.453300,1.180547,0.555556,0.0919,97.953
4,1.274200,1.105051,0.555556,0.0918,97.999


domain reviews n_samples_per_class=30, repeat=4, time_elapsed=137.22133255004883, {'eval_loss': 1.139980435371399, 'eval_accuracy': 0.58, 'eval_runtime': 0.6771, 'eval_samples_per_second': 73.849}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.4599,1.309835,0.555556,0.2287,78.713
2,1.2187,1.099038,0.611111,0.2224,80.92
3,1.0929,1.019083,0.5,0.2209,81.486
4,0.9745,0.913331,0.5,0.2312,77.845


domain reviews n_samples_per_class=60, repeat=0, time_elapsed=150.3482530117035, {'eval_loss': 1.1157448291778564, 'eval_accuracy': 0.52, 'eval_runtime': 0.6744, 'eval_samples_per_second': 74.137}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.345,1.236779,0.5,0.2252,79.933
2,1.2044,1.091841,0.5,0.2259,79.69
3,1.0514,1.011402,0.5,0.2188,82.283
4,0.9543,0.893252,0.5,0.2268,79.375


domain reviews n_samples_per_class=60, repeat=1, time_elapsed=143.17618322372437, {'eval_loss': 1.2228506803512573, 'eval_accuracy': 0.5, 'eval_runtime': 0.6747, 'eval_samples_per_second': 74.103}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.345,1.236779,0.5,0.2213,81.321
2,1.2044,1.091841,0.5,0.2285,78.764
3,1.0514,1.011402,0.5,0.227,79.297
4,0.9543,0.893252,0.5,0.2283,78.856


domain reviews n_samples_per_class=60, repeat=2, time_elapsed=140.60077905654907, {'eval_loss': 1.2228506803512573, 'eval_accuracy': 0.5, 'eval_runtime': 0.6757, 'eval_samples_per_second': 73.998}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.345,1.236779,0.5,0.2254,79.866
2,1.2044,1.091841,0.5,0.2359,76.314
3,1.0514,1.011402,0.5,0.2218,81.147
4,0.9543,0.893252,0.5,0.2273,79.185


domiain reviews n_samples_per_class=60, repeat=3, time_elapsed=140.61745071411133, {'eval_loss': 1.2228506803512573, 'eval_accuracy': 0.5, 'eval_runtime': 0.7394, 'eval_samples_per_second': 67.623}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.345,1.236779,0.5,0.2235,80.524
2,1.2044,1.091841,0.5,0.2339,76.943
3,1.0514,1.011402,0.5,0.201,89.536
4,0.9543,0.893252,0.5,0.2286,78.728


domiain reviews n_samples_per_class=60, repeat=4, time_elapsed=143.79582905769348, {'eval_loss': 1.2228506803512573, 'eval_accuracy': 0.5, 'eval_runtime': 0.6519, 'eval_samples_per_second': 76.7}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.3482,1.140528,0.5,0.4115,72.906
2,1.0725,1.028953,0.366667,0.4041,74.241
3,0.7694,0.676516,0.866667,0.4038,74.301
4,0.4593,0.400806,0.866667,0.4027,74.498


domiain reviews n_samples_per_class=100, repeat=0, time_elapsed=142.5912914276123, {'eval_loss': 0.6051061749458313, 'eval_accuracy': 0.84, 'eval_runtime': 0.6419, 'eval_samples_per_second': 77.896}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.367,1.191251,0.266667,0.3922,76.492
2,1.0885,1.089029,0.466667,0.3914,76.642
3,0.8336,0.885897,0.6,0.3885,77.226
4,0.5446,0.47846,0.833333,0.3898,76.955


domiain reviews n_samples_per_class=100, repeat=1, time_elapsed=131.44003200531006, {'eval_loss': 0.46796292066574097, 'eval_accuracy': 0.84, 'eval_runtime': 0.7078, 'eval_samples_per_second': 70.639}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.367,1.191251,0.266667,0.3891,77.091
2,1.0885,1.089029,0.466667,0.4128,72.677
3,0.8336,0.885897,0.6,0.401,74.805
4,0.5446,0.47846,0.833333,0.4088,73.384


domiain reviews n_samples_per_class=100, repeat=2, time_elapsed=140.98705053329468, {'eval_loss': 0.46796292066574097, 'eval_accuracy': 0.84, 'eval_runtime': 0.6517, 'eval_samples_per_second': 76.721}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.367,1.191251,0.266667,0.4081,73.508
2,1.0885,1.089029,0.466667,0.4064,73.82
3,0.8336,0.885897,0.6,0.3991,75.16
4,0.5446,0.47846,0.833333,0.4162,72.08


domiain reviews n_samples_per_class=100, repeat=3, time_elapsed=141.0066990852356, {'eval_loss': 0.46796292066574097, 'eval_accuracy': 0.84, 'eval_runtime': 0.6686, 'eval_samples_per_second': 74.786}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.367,1.191251,0.266667,0.3989,75.207
2,1.0885,1.089029,0.466667,0.4125,72.725
3,0.8336,0.885897,0.6,0.4011,74.786
4,0.5446,0.47846,0.833333,0.4127,72.7


domiain reviews n_samples_per_class=100, repeat=4, time_elapsed=140.28080129623413, {'eval_loss': 0.46796292066574097, 'eval_accuracy': 0.84, 'eval_runtime': 0.7148, 'eval_samples_per_second': 69.953}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1087,1.033062,0.433333,0.9142,65.631
2,0.5669,0.469685,0.816667,0.7814,76.788
3,0.265,0.738197,0.766667,0.8005,74.954
4,0.0753,0.609273,0.766667,0.826,72.643


domiain reviews n_samples_per_class=200, repeat=0, time_elapsed=159.67815709114075, {'eval_loss': 0.557488203048706, 'eval_accuracy': 0.8, 'eval_runtime': 0.6354, 'eval_samples_per_second': 78.692}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1273,1.043343,0.433333,0.8189,73.266
2,0.5887,0.454654,0.716667,0.8013,74.878
3,0.2409,1.053047,0.766667,0.7976,75.226
4,0.0615,0.273802,0.95,0.8232,72.883


domiain reviews n_samples_per_class=200, repeat=1, time_elapsed=143.25465083122253, {'eval_loss': 0.8234228491783142, 'eval_accuracy': 0.76, 'eval_runtime': 0.637, 'eval_samples_per_second': 78.499}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1273,1.043343,0.433333,0.8175,73.392
2,0.5887,0.454654,0.716667,0.8008,74.925
3,0.2409,1.053047,0.766667,0.7948,75.487
4,0.0615,0.273802,0.95,0.8298,72.305


domiain reviews n_samples_per_class=200, repeat=2, time_elapsed=144.02732610702515, {'eval_loss': 0.8234228491783142, 'eval_accuracy': 0.76, 'eval_runtime': 0.7338, 'eval_samples_per_second': 68.136}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1273,1.043343,0.433333,0.8146,73.66
2,0.5887,0.454654,0.716667,0.8029,74.725
3,0.2409,1.053047,0.766667,0.8003,74.973
4,0.0615,0.273802,0.95,0.8275,72.505


domiain reviews n_samples_per_class=200, repeat=3, time_elapsed=142.8533284664154, {'eval_loss': 0.8234228491783142, 'eval_accuracy': 0.76, 'eval_runtime': 0.6454, 'eval_samples_per_second': 77.466}




Epoch,Training Loss,Validation Loss,Accuracy,Runtime,Samples Per Second
1,1.1273,1.043343,0.433333,0.8158,73.545
2,0.5887,0.454654,0.716667,0.7913,75.822
3,0.2409,1.053047,0.766667,0.8023,74.785
4,0.0615,0.273802,0.95,0.8217,73.023


domiain reviews n_samples_per_class=200, repeat=4, time_elapsed=142.56699228286743, {'eval_loss': 0.8234228491783142, 'eval_accuracy': 0.76, 'eval_runtime': 0.6641, 'eval_samples_per_second': 75.286}

