# Hyperparameter search for the BERT TINY model on the TREC (fine) dataset

This notebook searches for optimal hyperparameters for the BERT TINY model on the TREC (fine) dataset. Hyperparameters are searched for both the original and the augmented dataset, and for both normal training and distillation.

The search uses the Optuna library with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each variant.

## Library imports and basic setup

In [1]:
from transformers import Trainer, BertTokenizer, BertForSequenceClassification
from datasets import load_from_disk
import optuna
import torch
import math
import base

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


Reset the random seed so that the results are reproducible.

In [None]:
base.reset_seed()
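
`base.reset_seed` is a project-local helper whose source is not shown in this notebook. The following is a minimal sketch of what such a helper commonly does. The name `reset_seed_sketch`, the default seed of 42, and the optional NumPy and PyTorch seeding are all assumptions, not the actual implementation.

```python
import os
import random


def reset_seed_sketch(seed=42):
    """Seed the common random number generators (hypothetical stand-in for base.reset_seed)."""
    random.seed(seed)
    # Only affects hash randomization in Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch.
    try:
        import torch
        torch.manual_seed(seed)
        # Safe to call without CUDA; it is ignored in that case.
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch.


reset_seed_sketch(0)
first = random.random()
reset_seed_sketch(0)
second = random.random()
print(first == second)
```

Calling the helper again before each search, as this notebook does, restarts every variant from the same random state.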

Check GPU availability.

In [2]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Load the dataset and apply basic preprocessing (tokenization with the teacher's tokenizer).

In [3]:
DATASET = "trec"

In [4]:
train_data = load_from_disk(f"~/data/{DATASET}/train-logits_fine")
eval_data = load_from_disk(f"~/data/{DATASET}/eval-logits_fine")
test_data = load_from_disk(f"~/data/{DATASET}/test-logits_fine")

all_train_data = load_from_disk(f"~/data/{DATASET}/train-logits-augmented_fine")
tokenizer = BertTokenizer.from_pretrained("ndavid/autotrain-trec-fine-bert-739422530")

In [5]:
def tokenize(batch):
    # Tokenize with the teacher's tokenizer; pad or truncate every sentence to 300 tokens.
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", return_tensors="pt", max_length=300)

train = train_data.map(tokenize, batched=True, desc="Tokenizing the train dataset")
eval = eval_data.map(tokenize, batched=True, desc="Tokenizing the eval dataset")
test = test_data.map(tokenize, batched=True, desc="Tokenizing the test dataset")

train_aug = all_train_data.map(tokenize, batched=True, desc="Tokenizing the augmented dataset")

Basic training configuration used during the search. Optuna works with steps rather than epochs, so the conversion is done below.

The minimum training length is five epochs and the maximum is 15 epochs. The upper bound on the number of warm-up steps is set to 10% of the first epoch.

In [6]:
num_epochs = 15
batch_size = 128

In [7]:
data_length = len(train_data)
min_r = math.ceil(data_length/batch_size)*5
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
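
As a worked example of the conversion above, the sketch below applies the same formulas to a hypothetical training set of 4,000 examples. The real size comes from `len(train_data)` and is not shown in this notebook; the function name `epoch_budget` and its return names are illustrative only.

```python
import math


def epoch_budget(data_length, batch_size, min_epochs=5, max_epochs=15):
    """Convert an epoch budget into optimizer steps, mirroring the cell above."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    min_steps = steps_per_epoch * min_epochs
    max_steps = steps_per_epoch * max_epochs
    # Upper bound for warm-up: 10% of one epoch, rounded up.
    warmup_cap = math.ceil(data_length / batch_size / 10)
    return min_steps, max_steps, warmup_cap


# Hypothetical dataset size, chosen only for illustration.
print(epoch_budget(4000, 128))  # (160, 480, 4)
```

With 4,000 examples and a batch size of 128, one epoch is 32 steps, so the pruner would see a budget of 160 to 480 steps and warm-up would be capped at 4 steps.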

## Search with normal training on the original dataset
Definition of the searched hyperparameters and their ranges.

In [8]:
def hp_space(trial):
    params =  {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps" : trial.suggest_int("warmup_steps", 0, warm_up)
    }   
    print(f"Trial {trial.number} with params: {params}")
    return params
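
To make the ranges concrete: `log=True` draws the learning rate uniformly in log space, so small and large orders of magnitude between 1e-5 and 5e-4 are covered evenly, and `step=1e-3` restricts weight decay to a grid of 11 values. The sketch below imitates the three draws with the standard library only. It illustrates the distributions and is not Optuna's sampler (TPE adapts its proposals to earlier trials); `sample_like_hp_space` and `max_warmup` are hypothetical names.

```python
import math
import random


def sample_like_hp_space(rng, max_warmup=4):
    """Draw one configuration from distributions shaped like hp_space (illustration only)."""
    # Log-uniform: sample the exponent uniformly, then exponentiate.
    log_lr = rng.uniform(math.log(1e-5), math.log(5e-4))
    return {
        "learning_rate": math.exp(log_lr),
        # Grid of 11 values: 0, 0.001, ..., 0.01.
        "weight_decay": rng.randint(0, 10) * 1e-3,
        # Integer in [0, max_warmup], both ends included.
        "warmup_steps": rng.randint(0, max_warmup),
    }


rng = random.Random(0)
for _ in range(3):
    print(sample_like_hp_space(rng))
```

Floating-point rounding on such a grid is the likely reason values like `0.009000000000000001` show up in the trial logs.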

Optuna configuration.

In [9]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
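
For intuition about this pruner configuration: according to the Optuna documentation, Hyperband runs `floor(log_eta(max_resource / min_resource)) + 1` successive-halving brackets, where `eta` is `reduction_factor`. With a 5-to-15-epoch budget and `reduction_factor=2`, that gives two brackets. The sketch below works this out and lists the steps at which pruning decisions would fall. It assumes that rungs sit at `min_resource * eta**(bracket + rung)`, as described for `SuccessiveHalvingPruner`, and a hypothetical 32 steps per epoch. `hyperband_rungs` is an illustrative helper, not part of Optuna.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [rung steps]} for a Hyperband-style schedule."""
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    schedule = {}
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            step = min_resource * reduction_factor ** (bracket + rung)
            if step > max_resource:
                break
            rungs.append(step)
            rung += 1
        schedule[bracket] = rungs
    return schedule


steps_per_epoch = 32  # hypothetical value, for illustration only
schedule = hyperband_rungs(5 * steps_per_epoch, 15 * steps_per_epoch, 2)
for bracket, rungs in schedule.items():
    # Express each rung in epochs for readability.
    print(bracket, [s // steps_per_epoch for s in rungs])
```

Under these assumptions, pruning decisions fall after epochs 5 and 10, which is consistent with the pruned trials in the log stopping after 5 or 10 epochs.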



Load the pretrained model.

In [10]:
def get_Bert():
    return BertForSequenceClassification.from_pretrained("google/bert_uncased_L-2_H-128_A-2", num_labels=50)

In [11]:
base.reset_seed()

Configuration of the individual training runs.

In [12]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/bert-base_fine_hp-search", logging_dir=f"~/logs/{DATASET}/bert-base_fine_hp-search", epochs=num_epochs, batch_size=batch_size)

Trainer configuration for the individual training runs.

In [13]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=get_Bert,  # a fresh model is created for every trial
)
  

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Search setup.

In [14]:
best_trial_normal = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Test-base",
    n_trials=150
)

[I 2025-03-26 08:36:11,640] A new study created in memory with name: Test-base


Trial 0 with params: {'learning_rate': 4.3284502212938785e-05, 'weight_decay': 0.01, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.805,3.692317,0.183318,0.011379,0.021918,0.00918
2,3.6394,3.555955,0.180568,0.023551,0.021096,0.008109
3,3.5221,3.433531,0.188818,0.043581,0.02349,0.012242
4,3.404,3.322597,0.302475,0.074402,0.057608,0.051521
5,3.3203,3.225941,0.372136,0.069201,0.078112,0.06418
6,3.2222,3.145883,0.396884,0.079409,0.085359,0.067539
7,3.1416,3.074114,0.409716,0.095976,0.089143,0.069005
8,3.0839,3.01259,0.421632,0.093745,0.094901,0.075415
9,3.0242,2.960886,0.428048,0.089753,0.09902,0.07948
10,2.9806,2.919415,0.442713,0.088418,0.105445,0.084439


[I 2025-03-26 08:36:59,929] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010401663679887307, 'weight_decay': 0.001, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.697,3.504216,0.176902,0.003538,0.02,0.006012
2,3.39,3.226867,0.340055,0.070118,0.068843,0.059172
3,3.1517,3.000536,0.411549,0.052382,0.088745,0.063755
4,2.9292,2.80228,0.44088,0.086683,0.103688,0.078788
5,2.7692,2.629229,0.464711,0.104123,0.119232,0.09462


[I 2025-03-26 08:37:24,246] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.2551115172973821e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8632,3.816773,0.062328,0.006793,0.027464,0.006354
2,3.7949,3.7577,0.175985,0.028882,0.021404,0.009876
3,3.7493,3.709666,0.188818,0.015876,0.023822,0.011622
4,3.7044,3.670985,0.186984,0.015193,0.023014,0.010905
5,3.678,3.6365,0.185151,0.015594,0.022466,0.010184
6,3.6402,3.606443,0.182401,0.02071,0.021644,0.009055
7,3.6122,3.579803,0.180568,0.019561,0.021096,0.008097
8,3.592,3.556639,0.180568,0.023558,0.021096,0.008119
9,3.5696,3.536597,0.180568,0.023558,0.021096,0.008119
10,3.5543,3.520509,0.180568,0.023558,0.021096,0.008119


[I 2025-03-26 08:38:13,454] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00015958573588141273, 'weight_decay': 0.0, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6605,3.397073,0.176902,0.003538,0.02,0.006012
2,3.2335,3.021934,0.394134,0.096931,0.085111,0.064537
3,2.9033,2.704884,0.454629,0.10289,0.113201,0.089629
4,2.6035,2.448021,0.508708,0.132532,0.146713,0.12268
5,2.3829,2.238276,0.55912,0.230503,0.188159,0.173665


[I 2025-03-26 08:38:36,872] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.00025959425503112657, 'weight_decay': 0.002, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5129,3.156248,0.316224,0.069182,0.061776,0.049671
2,2.9322,2.65396,0.444546,0.062332,0.107967,0.077256
3,2.4867,2.264197,0.534372,0.192541,0.166706,0.15133
4,2.1136,1.976207,0.614115,0.248543,0.239376,0.221089
5,1.8409,1.759768,0.683776,0.317086,0.295897,0.280459
6,1.5929,1.593756,0.703941,0.34748,0.328189,0.313501
7,1.4026,1.480919,0.714024,0.345746,0.334254,0.317279
8,1.276,1.403965,0.724106,0.354945,0.363755,0.3399
9,1.1618,1.346276,0.72594,0.338757,0.362909,0.34074
10,1.0626,1.304305,0.745188,0.413861,0.399242,0.380763


[I 2025-03-26 08:39:49,541] Trial 4 finished with value: 0.39062566689300937 and parameters: {'learning_rate': 0.00025959425503112657, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 4 with value: 0.39062566689300937.


Trial 5 with params: {'learning_rate': 2.049268011541735e-05, 'weight_decay': 0.003, 'warmup_steps': 2}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8453,3.777779,0.160403,0.009358,0.019167,0.008802
2,3.746,3.694876,0.183318,0.011392,0.021918,0.009199
3,3.6828,3.630813,0.180568,0.019561,0.021096,0.008097
4,3.6196,3.575531,0.185151,0.021577,0.022466,0.010407
5,3.5783,3.519766,0.180568,0.023564,0.021096,0.008128


[I 2025-03-26 08:40:12,936] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 5.4182823195332406e-05, 'weight_decay': 0.003, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7876,3.65878,0.180568,0.019554,0.021096,0.008087
2,3.5945,3.494749,0.179652,0.023548,0.020822,0.007605
3,3.4517,3.346353,0.254812,0.073961,0.042836,0.037371
4,3.3098,3.215431,0.373969,0.069892,0.078417,0.064052
5,3.2095,3.104284,0.402383,0.076904,0.086331,0.066267
6,3.0933,3.007497,0.417965,0.093672,0.093164,0.073219
7,2.9988,2.925731,0.43538,0.089568,0.101749,0.081346
8,2.932,2.855522,0.448213,0.086628,0.107758,0.085167
9,2.8618,2.796388,0.450962,0.104554,0.109483,0.084949
10,2.8104,2.747743,0.463795,0.10385,0.1176,0.093526


[I 2025-03-26 08:41:24,347] Trial 6 finished with value: 0.10158803772931421 and parameters: {'learning_rate': 5.4182823195332406e-05, 'weight_decay': 0.003, 'warmup_steps': 3}. Best is trial 4 with value: 0.39062566689300937.


Trial 7 with params: {'learning_rate': 1.7258215396625005e-05, 'weight_decay': 0.003, 'warmup_steps': 1}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8509,3.792056,0.127406,0.009348,0.035343,0.008686
2,3.7637,3.71716,0.185151,0.017794,0.022906,0.010869
3,3.7074,3.660883,0.183318,0.01436,0.021918,0.009344
4,3.6524,3.612828,0.186068,0.018972,0.02274,0.010735
5,3.6175,3.565846,0.180568,0.019567,0.021096,0.008106


[I 2025-03-26 08:41:47,816] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 5.954553793888986e-05, 'weight_decay': 0.008, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7656,3.631785,0.177819,0.023541,0.020274,0.006558
2,3.5628,3.456809,0.178735,0.023545,0.020548,0.007089
3,3.4105,3.29886,0.313474,0.072143,0.060396,0.053038
4,3.2594,3.161469,0.395967,0.078367,0.085144,0.067003
5,3.1531,3.043002,0.411549,0.095497,0.089913,0.069143


[I 2025-03-26 08:42:11,791] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 7.475992999956501e-05, 'weight_decay': 0.006, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7405,3.586979,0.176902,0.003538,0.02,0.006012
2,3.5019,3.376416,0.1989,0.058621,0.026665,0.017604
3,3.3175,3.189541,0.373969,0.081765,0.078543,0.062526
4,3.1393,3.029068,0.415215,0.09496,0.091252,0.070746
5,3.012,2.890028,0.439963,0.087973,0.105263,0.084788
6,2.8682,2.771655,0.453712,0.084862,0.111823,0.086796
7,2.7533,2.67362,0.472044,0.105033,0.12191,0.098896
8,2.6736,2.588367,0.48396,0.103412,0.129572,0.103953
9,2.5842,2.51591,0.494959,0.145224,0.135276,0.110973
10,2.5168,2.459497,0.511457,0.161099,0.146057,0.12207


[I 2025-03-26 08:43:22,770] Trial 9 finished with value: 0.16718522010599593 and parameters: {'learning_rate': 7.475992999956501e-05, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.39062566689300937.


Trial 10 with params: {'learning_rate': 0.0004587604755149822, 'weight_decay': 0.002, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3349,2.821038,0.428048,0.069634,0.101353,0.078215
2,2.5098,2.169802,0.555454,0.190087,0.192818,0.174815
3,1.9234,1.725812,0.656279,0.263669,0.27957,0.258465
4,1.5014,1.474951,0.700275,0.321681,0.32126,0.299878
5,1.2106,1.332409,0.730522,0.347057,0.371468,0.34301
6,1.0017,1.205365,0.747021,0.418712,0.389768,0.378645
7,0.828,1.154799,0.752521,0.463703,0.421152,0.413087
8,0.7189,1.116093,0.764436,0.489412,0.453165,0.446063
9,0.6185,1.082686,0.767186,0.486941,0.469112,0.463983
10,0.5437,1.058945,0.775435,0.491266,0.479622,0.47502


[I 2025-03-26 08:44:09,774] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.00023012528778943483, 'weight_decay': 0.006, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5439,3.215007,0.270394,0.072445,0.047599,0.040656
2,3.0081,2.751146,0.433547,0.063026,0.103347,0.07561
3,2.5972,2.381151,0.495875,0.162543,0.133952,0.112757
4,2.2449,2.097574,0.582951,0.218672,0.203234,0.187191
5,1.9857,1.881629,0.64528,0.274573,0.263001,0.247162
6,1.7388,1.712232,0.687443,0.326792,0.298558,0.283174
7,1.5496,1.589851,0.698442,0.332245,0.307171,0.292481
8,1.4226,1.503094,0.710357,0.346433,0.339708,0.318357
9,1.3028,1.437114,0.714024,0.336704,0.344451,0.326372
10,1.2024,1.388275,0.725023,0.365034,0.365892,0.349671


[I 2025-03-26 08:45:21,547] Trial 11 finished with value: 0.38455838326622244 and parameters: {'learning_rate': 0.00023012528778943483, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.39062566689300937.


Trial 12 with params: {'learning_rate': 0.00035174585398257074, 'weight_decay': 0.007, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4225,2.986763,0.399633,0.05582,0.085718,0.063883
2,2.7162,2.399333,0.497709,0.165023,0.138835,0.117694
3,2.1902,1.967116,0.603116,0.27258,0.23914,0.230799
4,1.7815,1.696654,0.687443,0.328724,0.316743,0.295154
5,1.4931,1.499748,0.719523,0.352712,0.356342,0.336956
6,1.2602,1.358392,0.72319,0.364575,0.361462,0.342009
7,1.0773,1.279357,0.729606,0.394259,0.37053,0.355592
8,0.958,1.234847,0.751604,0.394353,0.409671,0.38609
9,0.8548,1.180493,0.753437,0.414718,0.421411,0.406487
10,0.7627,1.151671,0.754354,0.461944,0.414382,0.409505


[I 2025-03-26 08:46:08,796] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.00021976631986270965, 'weight_decay': 0.005, 'warmup_steps': 2}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5839,3.259404,0.209899,0.037364,0.029397,0.020848
2,3.0554,2.802318,0.428048,0.062495,0.100202,0.073642
3,2.6502,2.430311,0.494042,0.125047,0.136272,0.113491
4,2.3008,2.145451,0.572869,0.205328,0.195434,0.178796
5,2.042,1.926934,0.629698,0.284142,0.245843,0.232361
6,1.7939,1.753069,0.681943,0.344908,0.298265,0.290378
7,1.6038,1.630626,0.696609,0.365033,0.314898,0.306895
8,1.4769,1.540251,0.707608,0.340133,0.332964,0.314228
9,1.3532,1.469929,0.714024,0.345485,0.342928,0.325118
10,1.2527,1.419049,0.721357,0.342666,0.356833,0.337043


[I 2025-03-26 08:47:20,275] Trial 13 finished with value: 0.36319399646392175 and parameters: {'learning_rate': 0.00021976631986270965, 'weight_decay': 0.005, 'warmup_steps': 2}. Best is trial 4 with value: 0.39062566689300937.


Trial 14 with params: {'learning_rate': 0.00031411022790590827, 'weight_decay': 0.01, 'warmup_steps': 2}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4975,3.092148,0.345555,0.067749,0.071066,0.054419
2,2.829,2.520367,0.489459,0.107759,0.134841,0.110561
3,2.3195,2.087965,0.586618,0.279339,0.221655,0.213006
4,1.9124,1.801026,0.666361,0.32248,0.291496,0.276884
5,1.6196,1.592015,0.707608,0.369763,0.342785,0.326873
6,1.3735,1.430977,0.72044,0.346953,0.353026,0.334875
7,1.1838,1.340407,0.725023,0.34871,0.358275,0.338836
8,1.0617,1.292037,0.744271,0.400734,0.398372,0.381591
9,0.9546,1.235971,0.745188,0.42731,0.409346,0.399514
10,0.8591,1.207509,0.753437,0.44166,0.425075,0.4151


[I 2025-03-26 08:48:08,484] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.00016615243906338922, 'weight_decay': 0.0, 'warmup_steps': 2}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6399,3.36884,0.176902,0.003538,0.02,0.006012
2,3.2022,2.98619,0.406966,0.094691,0.088316,0.065965
3,2.8659,2.665046,0.451879,0.102819,0.11154,0.084569
4,2.5614,2.404691,0.51604,0.148468,0.15019,0.128091
5,2.3381,2.195745,0.572869,0.252454,0.194917,0.1815
6,2.1122,2.031802,0.600367,0.248059,0.222327,0.2081
7,1.9387,1.905526,0.644363,0.338859,0.266221,0.259333
8,1.8171,1.805561,0.673694,0.331242,0.291218,0.278835
9,1.692,1.718866,0.68561,0.337082,0.298281,0.287507
10,1.5931,1.656978,0.695692,0.373749,0.323883,0.314621


[I 2025-03-26 08:49:21,926] Trial 15 finished with value: 0.325457686379596 and parameters: {'learning_rate': 0.00016615243906338922, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 4 with value: 0.39062566689300937.


Trial 16 with params: {'learning_rate': 0.00017787180744793134, 'weight_decay': 0.004, 'warmup_steps': 1}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6195,3.337117,0.176902,0.003538,0.02,0.006012
2,3.1634,2.941094,0.409716,0.072295,0.090012,0.067169
3,2.8132,2.607853,0.461045,0.103236,0.117867,0.091296
4,2.4981,2.341437,0.537122,0.164627,0.167441,0.150337
5,2.2677,2.130326,0.583868,0.255458,0.206034,0.192716
6,2.0362,1.964589,0.615949,0.259729,0.241063,0.22839
7,1.8588,1.838026,0.655362,0.34269,0.273758,0.265795
8,1.7351,1.738304,0.679193,0.345037,0.296771,0.281808
9,1.6089,1.654349,0.688359,0.354263,0.314361,0.305677
10,1.5093,1.594637,0.704858,0.357112,0.328516,0.313995


[I 2025-03-26 08:50:09,394] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00023041229790746586, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5915,3.256339,0.203483,0.038389,0.027443,0.018431
2,3.0442,2.780378,0.43538,0.10165,0.105751,0.080121
3,2.6203,2.396228,0.502291,0.142005,0.142039,0.119971
4,2.2591,2.102841,0.579285,0.244787,0.205743,0.1931
5,1.9907,1.878407,0.64528,0.322377,0.27161,0.260729
6,1.7377,1.701202,0.688359,0.382197,0.310049,0.304831
7,1.5433,1.57867,0.699358,0.35666,0.318256,0.309572
8,1.4161,1.492959,0.719523,0.362282,0.352316,0.332748
9,1.2926,1.42542,0.718607,0.360454,0.355252,0.33983
10,1.1909,1.377159,0.729606,0.385118,0.374261,0.360199


[I 2025-03-26 08:51:23,068] Trial 17 finished with value: 0.38638693150760745 and parameters: {'learning_rate': 0.00023041229790746586, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 4 with value: 0.39062566689300937.


Trial 18 with params: {'learning_rate': 0.00018354219503651724, 'weight_decay': 0.007, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6357,3.344867,0.176902,0.003538,0.02,0.006012
2,3.1642,2.933316,0.411549,0.093006,0.091519,0.07024
3,2.7998,2.587908,0.469294,0.103965,0.121768,0.097516
4,2.4749,2.315372,0.544455,0.206883,0.172583,0.157218
5,2.2365,2.099953,0.593034,0.29598,0.217237,0.206868
6,1.9995,1.928801,0.622365,0.261186,0.247875,0.233943
7,1.8175,1.800434,0.660862,0.36498,0.279537,0.273468
8,1.6909,1.700317,0.691109,0.374723,0.322083,0.313388
9,1.5622,1.616367,0.692026,0.357864,0.319996,0.311361
10,1.4617,1.557776,0.707608,0.360214,0.334644,0.319948


[I 2025-03-26 08:52:34,191] Trial 18 finished with value: 0.3455949344657317 and parameters: {'learning_rate': 0.00018354219503651724, 'weight_decay': 0.007, 'warmup_steps': 4}. Best is trial 4 with value: 0.39062566689300937.


Trial 19 with params: {'learning_rate': 4.803338746667814e-05, 'weight_decay': 0.006, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7999,3.679646,0.182401,0.014483,0.021644,0.008922
2,3.6216,3.531225,0.179652,0.023548,0.020822,0.007605
3,3.4927,3.396883,0.20165,0.063631,0.027272,0.01813
4,3.3637,3.275743,0.333639,0.070204,0.066654,0.057369
5,3.2719,3.172733,0.392301,0.079549,0.083838,0.066762


[I 2025-03-26 08:52:58,266] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0003701999625244894, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4757,3.024232,0.371219,0.062391,0.078619,0.059372
2,2.7288,2.395754,0.502291,0.14511,0.147056,0.126295
3,2.1693,1.933851,0.617782,0.288525,0.245669,0.234517
4,1.7414,1.659887,0.695692,0.308797,0.314316,0.296645
5,1.4406,1.466697,0.726856,0.367286,0.364935,0.343963
6,1.2027,1.323581,0.733272,0.39356,0.376397,0.359972
7,1.0188,1.25093,0.731439,0.392582,0.378996,0.367354
8,0.9021,1.211282,0.759853,0.473224,0.435392,0.426793
9,0.7997,1.157365,0.75802,0.465545,0.441152,0.436297
10,0.7111,1.140239,0.76352,0.493457,0.458345,0.458914


[I 2025-03-26 08:54:09,772] Trial 20 finished with value: 0.48649600533906434 and parameters: {'learning_rate': 0.0003701999625244894, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 20 with value: 0.48649600533906434.


Trial 21 with params: {'learning_rate': 0.0004794538719449015, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4017,2.873354,0.416132,0.073138,0.09507,0.07387
2,2.5402,2.183897,0.557287,0.204342,0.201261,0.189003
3,1.9186,1.71781,0.661778,0.323261,0.295682,0.278947
4,1.4741,1.454987,0.703941,0.320664,0.327614,0.309269
5,1.1788,1.326247,0.736939,0.422215,0.394329,0.374891
6,0.9588,1.210208,0.745188,0.423347,0.403472,0.390249
7,0.7923,1.168362,0.749771,0.471726,0.432778,0.427525
8,0.6877,1.129672,0.761687,0.485004,0.465576,0.455702
9,0.5848,1.085122,0.761687,0.49154,0.470577,0.466198
10,0.5089,1.08115,0.766269,0.494347,0.479605,0.473905


[I 2025-03-26 08:55:21,870] Trial 21 finished with value: 0.4941557351757956 and parameters: {'learning_rate': 0.0004794538719449015, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 22 with params: {'learning_rate': 0.0004866184455144315, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3971,2.864541,0.419798,0.07308,0.097244,0.075822
2,2.5288,2.171468,0.558203,0.204267,0.200698,0.188367
3,1.9032,1.704066,0.661778,0.319037,0.296873,0.280115
4,1.4585,1.444144,0.703941,0.318737,0.322993,0.304732
5,1.1634,1.317177,0.737855,0.423678,0.396147,0.37862
6,0.9439,1.20449,0.747021,0.425028,0.397842,0.387553
7,0.7796,1.164753,0.747938,0.470434,0.435223,0.430689
8,0.6763,1.123661,0.759853,0.484826,0.463455,0.455272
9,0.5737,1.080023,0.762603,0.48804,0.470815,0.465763
10,0.4988,1.078047,0.767186,0.501555,0.479278,0.476078


[I 2025-03-26 08:56:36,557] Trial 22 finished with value: 0.4936970922783857 and parameters: {'learning_rate': 0.0004866184455144315, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 23 with params: {'learning_rate': 0.00028990806473082564, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5369,3.148067,0.308891,0.071816,0.059392,0.048645
2,2.8967,2.59427,0.452796,0.106064,0.112829,0.082733
3,2.4045,2.168176,0.562786,0.224142,0.189234,0.176338
4,2.0043,1.872677,0.647113,0.295134,0.27246,0.25846
5,1.7135,1.656423,0.705775,0.347762,0.334801,0.32065
6,1.4647,1.489464,0.71494,0.347344,0.343847,0.328378
7,1.2692,1.391256,0.721357,0.343236,0.341099,0.325618
8,1.1459,1.32951,0.732356,0.359862,0.379743,0.35626
9,1.035,1.275305,0.741522,0.373438,0.387411,0.367759
10,0.9385,1.241952,0.75527,0.424962,0.41385,0.400787


[I 2025-03-26 08:57:23,331] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00048502028060946255, 'weight_decay': 0.008, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3851,2.862287,0.416132,0.070653,0.096518,0.074809
2,2.5277,2.179507,0.545371,0.228923,0.185154,0.175236
3,1.912,1.704316,0.668194,0.31698,0.294197,0.276489
4,1.4677,1.445502,0.698442,0.309195,0.319427,0.297959
5,1.1645,1.310141,0.727773,0.3592,0.372687,0.346193
6,0.9477,1.192653,0.749771,0.448124,0.413333,0.402683
7,0.7794,1.145782,0.75802,0.47312,0.446682,0.437179
8,0.6748,1.109141,0.762603,0.473019,0.461929,0.451493
9,0.5749,1.059847,0.768103,0.48112,0.475141,0.468308
10,0.5004,1.059043,0.772686,0.48553,0.486628,0.476271


[I 2025-03-26 08:58:36,786] Trial 24 finished with value: 0.4796380766961073 and parameters: {'learning_rate': 0.00048502028060946255, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 25 with params: {'learning_rate': 0.00041570382804127483, 'weight_decay': 0.01, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4316,2.948727,0.380385,0.075102,0.082885,0.062711
2,2.642,2.302356,0.523373,0.198313,0.166124,0.151347
3,2.062,1.83395,0.64528,0.316163,0.269319,0.25415
4,1.6261,1.573059,0.694775,0.317127,0.329854,0.30636
5,1.3234,1.409988,0.721357,0.348304,0.359916,0.332752
6,1.0944,1.261513,0.735105,0.389522,0.373305,0.355287
7,0.9157,1.198389,0.745188,0.418166,0.411108,0.395521
8,0.804,1.157405,0.756187,0.461649,0.442866,0.433461
9,0.6996,1.11349,0.765353,0.48612,0.465579,0.45855
10,0.6179,1.107653,0.767186,0.490361,0.481165,0.470691


[I 2025-03-26 08:59:50,138] Trial 25 finished with value: 0.47479777344692564 and parameters: {'learning_rate': 0.00041570382804127483, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 26 with params: {'learning_rate': 0.00037532124067673404, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4722,3.016555,0.373052,0.061365,0.079073,0.059453
2,2.7189,2.38431,0.503208,0.144189,0.148065,0.127263
3,2.156,1.921165,0.624198,0.318474,0.253591,0.244643
4,1.7266,1.647821,0.695692,0.306633,0.314192,0.29568
5,1.4252,1.458171,0.726856,0.361896,0.361299,0.338875
6,1.188,1.315199,0.735105,0.394382,0.37745,0.360797
7,1.005,1.244647,0.735105,0.407044,0.391973,0.381558
8,0.8889,1.204748,0.761687,0.475938,0.439963,0.430838
9,0.7864,1.150949,0.759853,0.471533,0.449864,0.446536
10,0.6991,1.136135,0.76352,0.492336,0.46313,0.46032


[I 2025-03-26 09:01:04,341] Trial 26 finished with value: 0.48598931076512913 and parameters: {'learning_rate': 0.00037532124067673404, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 27 with params: {'learning_rate': 0.0004960625705146884, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3911,2.853414,0.419798,0.069594,0.097244,0.07504
2,2.5146,2.155805,0.56187,0.2049,0.20458,0.19148
3,1.8835,1.68571,0.663611,0.315538,0.296604,0.279487
4,1.4383,1.430553,0.704858,0.339746,0.325688,0.308871
5,1.143,1.306513,0.737855,0.402448,0.394462,0.37323
6,0.9236,1.198707,0.746104,0.425949,0.397978,0.388294
7,0.7625,1.160062,0.748854,0.466242,0.441683,0.434849
8,0.6609,1.116987,0.756187,0.483877,0.468307,0.460304
9,0.5586,1.07305,0.765353,0.497115,0.475613,0.4716
10,0.4849,1.073508,0.767186,0.499375,0.475766,0.473236


[I 2025-03-26 09:02:19,101] Trial 27 finished with value: 0.4823961180430102 and parameters: {'learning_rate': 0.0004960625705146884, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 28 with params: {'learning_rate': 0.00040721180681103567, 'weight_decay': 0.007, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4504,2.969739,0.384968,0.058414,0.082277,0.060855
2,2.6614,2.316953,0.517874,0.194266,0.157886,0.14178
3,2.0768,1.84799,0.649863,0.323432,0.273491,0.262515
4,1.6406,1.577158,0.695692,0.327643,0.329129,0.3094
5,1.3369,1.411075,0.724106,0.357032,0.36541,0.341878
6,1.1057,1.269354,0.733272,0.386766,0.378184,0.359508
7,0.928,1.211988,0.743355,0.43654,0.406334,0.393124
8,0.8155,1.169317,0.762603,0.470987,0.449307,0.439337
9,0.7134,1.120841,0.764436,0.483231,0.46694,0.460238
10,0.6314,1.114129,0.768103,0.488932,0.480439,0.472951


[I 2025-03-26 09:03:36,909] Trial 28 finished with value: 0.4825436951247559 and parameters: {'learning_rate': 0.00040721180681103567, 'weight_decay': 0.007, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 29 with params: {'learning_rate': 9.99180549137411e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7231,3.534925,0.176902,0.003538,0.02,0.006012
2,3.4216,3.261812,0.306141,0.071986,0.058671,0.050425
3,3.1859,3.035987,0.406966,0.074389,0.087849,0.064949
4,2.9654,2.838049,0.439963,0.087388,0.104043,0.081393
5,2.8061,2.666983,0.466544,0.104777,0.119899,0.096405
6,2.6314,2.526194,0.493126,0.123551,0.135164,0.110791
7,2.4921,2.414025,0.506874,0.152944,0.146193,0.127303
8,2.3968,2.318798,0.557287,0.208818,0.186126,0.172247
9,2.2933,2.238494,0.573786,0.270427,0.203152,0.192653
10,2.2106,2.176896,0.588451,0.285,0.215508,0.206218


[I 2025-03-26 09:04:25,181] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00015706256872557984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6632,3.402663,0.176902,0.003538,0.02,0.006012
2,3.241,3.031281,0.393217,0.097589,0.084896,0.064527
3,2.9144,2.717604,0.453712,0.103198,0.112746,0.089174
4,2.6175,2.462946,0.505958,0.130622,0.144588,0.119452
5,2.3989,2.253691,0.555454,0.228113,0.185628,0.170914
6,2.1771,2.089202,0.591201,0.246839,0.212768,0.199471
7,2.0045,1.961069,0.615032,0.298154,0.247604,0.240718
8,1.8835,1.86001,0.661778,0.329349,0.282757,0.272667
9,1.7582,1.771767,0.675527,0.342564,0.293282,0.284599
10,1.6607,1.708465,0.689276,0.355629,0.313888,0.304813


[I 2025-03-26 09:05:12,759] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0003755942696637551, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.472,3.01623,0.373052,0.061365,0.079073,0.059453
2,2.7185,2.383773,0.503208,0.144189,0.148065,0.127263
3,2.1554,1.920595,0.625115,0.318522,0.253865,0.244784
4,1.7258,1.647218,0.695692,0.306883,0.314192,0.296031
5,1.4243,1.457563,0.726856,0.361244,0.361299,0.338285
6,1.1872,1.31478,0.735105,0.394382,0.37745,0.360797
7,1.0043,1.24431,0.736022,0.407595,0.392461,0.382063
8,0.8882,1.204416,0.761687,0.475938,0.439963,0.430838
9,0.7857,1.150665,0.759853,0.4702,0.449864,0.446685
10,0.6984,1.135947,0.76352,0.492336,0.46313,0.46032


[I 2025-03-26 09:05:59,629] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00042882561463163685, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4358,2.939557,0.394134,0.074932,0.085413,0.063976
2,2.6251,2.277167,0.52429,0.179516,0.161961,0.145536
3,2.0271,1.803301,0.647113,0.316898,0.27799,0.262567
4,1.588,1.536119,0.693859,0.322622,0.328277,0.309373
5,1.2844,1.384767,0.72594,0.374839,0.374681,0.352117
6,1.0573,1.246436,0.738772,0.406911,0.384094,0.366665
7,0.8841,1.194887,0.751604,0.471143,0.430302,0.422675
8,0.7739,1.152072,0.764436,0.473026,0.456682,0.4465
9,0.6716,1.105621,0.764436,0.486148,0.471983,0.464751
10,0.5907,1.097707,0.767186,0.485184,0.48164,0.473093


[I 2025-03-26 09:07:11,054] Trial 32 finished with value: 0.48574138414339885 and parameters: {'learning_rate': 0.00042882561463163685, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 33 with params: {'learning_rate': 0.000497203887698053, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3904,2.852013,0.419798,0.069745,0.097244,0.075134
2,2.5132,2.154164,0.56187,0.204389,0.20482,0.191392
3,1.8812,1.683223,0.665445,0.313179,0.297657,0.280047
4,1.4358,1.428625,0.705775,0.340483,0.326597,0.309628
5,1.1404,1.305566,0.737855,0.402892,0.394081,0.373331
6,0.9208,1.197471,0.746104,0.427553,0.397978,0.388596
7,0.7603,1.15891,0.749771,0.487525,0.444709,0.440338
8,0.6589,1.115855,0.757104,0.484306,0.468522,0.460678
9,0.5568,1.071169,0.766269,0.498161,0.478113,0.474863
10,0.4833,1.072338,0.766269,0.49843,0.474072,0.471928


[I 2025-03-26 09:08:24,877] Trial 33 finished with value: 0.4866972638398252 and parameters: {'learning_rate': 0.000497203887698053, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 34 with params: {'learning_rate': 0.000377653012761153, 'weight_decay': 0.006, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4706,3.013122,0.372136,0.060973,0.078857,0.058932
2,2.7146,2.379199,0.503208,0.144189,0.148065,0.127263
3,2.15,1.915567,0.626948,0.318689,0.254755,0.246067
4,1.72,1.642492,0.695692,0.306633,0.314192,0.29568
5,1.4183,1.454388,0.727773,0.36231,0.361753,0.339144
6,1.1815,1.311539,0.734189,0.382404,0.369946,0.352389
7,0.999,1.241972,0.734189,0.412353,0.391735,0.381127
8,0.8831,1.201893,0.762603,0.478201,0.44282,0.434151
9,0.7805,1.148367,0.758937,0.477873,0.449626,0.445386
10,0.6939,1.134562,0.766269,0.491358,0.46826,0.464743


[I 2025-03-26 09:09:37,612] Trial 34 finished with value: 0.48568880571462403 and parameters: {'learning_rate': 0.000377653012761153, 'weight_decay': 0.006, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 35 with params: {'learning_rate': 0.00038017331791589156, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4688,3.009405,0.373052,0.060485,0.079073,0.058886
2,2.7099,2.373741,0.505958,0.14954,0.149314,0.129285
3,2.1436,1.909665,0.633364,0.320922,0.258587,0.250355
4,1.713,1.636817,0.697525,0.307312,0.317397,0.297982
5,1.411,1.450274,0.726856,0.362066,0.36165,0.338941
6,1.1745,1.307767,0.734189,0.388912,0.374824,0.356145
7,0.9925,1.239114,0.736939,0.414886,0.393771,0.383449
8,0.8769,1.198983,0.762603,0.478279,0.446717,0.438784
9,0.7742,1.145741,0.758937,0.478254,0.449626,0.445577
10,0.6881,1.132788,0.767186,0.496584,0.469688,0.466947


[I 2025-03-26 09:10:26,994] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 1.0625556226593494e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8659,3.824595,0.047663,0.012284,0.026678,0.006903
2,3.8064,3.774522,0.15857,0.008461,0.01914,0.009051
3,3.7673,3.730125,0.186984,0.023561,0.023454,0.011483
4,3.7264,3.695626,0.186068,0.016315,0.023,0.010848
5,3.7032,3.665717,0.186984,0.015833,0.023014,0.010952
6,3.6699,3.639775,0.183318,0.01436,0.021918,0.009344
7,3.6459,3.616942,0.183318,0.01913,0.021918,0.009479
8,3.6286,3.597194,0.180568,0.019561,0.021096,0.008097
9,3.6093,3.580069,0.180568,0.019561,0.021096,0.008097
10,3.5957,3.566005,0.180568,0.019561,0.021096,0.008097


[I 2025-03-26 09:11:16,250] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0004856127413147924, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3978,2.865802,0.419798,0.072784,0.097244,0.075681
2,2.5303,2.173166,0.558203,0.204267,0.200698,0.188367
3,1.9054,1.706191,0.660862,0.318084,0.295444,0.278348
4,1.4606,1.445722,0.703941,0.318737,0.322993,0.304732
5,1.1655,1.318094,0.737855,0.423368,0.396147,0.37848
6,0.946,1.205121,0.747021,0.42538,0.397842,0.387102
7,0.7814,1.165099,0.747938,0.470808,0.43264,0.427022
8,0.678,1.12442,0.76077,0.485034,0.463909,0.455596
9,0.5753,1.080755,0.762603,0.489714,0.470815,0.466947
10,0.5002,1.078417,0.766269,0.500952,0.47879,0.475556


[I 2025-03-26 09:12:32,107] Trial 37 finished with value: 0.4939308180319757 and parameters: {'learning_rate': 0.0004856127413147924, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 38 with params: {'learning_rate': 0.00045882704483516664, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4154,2.899936,0.406966,0.074292,0.089886,0.068308
2,2.5742,2.220956,0.544455,0.194986,0.180959,0.168632
3,1.9624,1.75234,0.656279,0.323885,0.290575,0.273818
4,1.5203,1.486891,0.699358,0.314936,0.321141,0.302079
5,1.221,1.351937,0.733272,0.407336,0.397178,0.377404
6,0.9982,1.224952,0.744271,0.434515,0.399112,0.387894
7,0.8282,1.179716,0.748854,0.459363,0.431624,0.423113
8,0.7198,1.140464,0.759853,0.474774,0.460408,0.448408
9,0.6178,1.090718,0.761687,0.475426,0.470355,0.460906
10,0.5399,1.084871,0.765353,0.498143,0.477034,0.472714


[I 2025-03-26 09:13:50,777] Trial 38 finished with value: 0.48364209495694566 and parameters: {'learning_rate': 0.00045882704483516664, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 39 with params: {'learning_rate': 1.1310667716871232e-05, 'weight_decay': 0.002, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8667,3.823488,0.050412,0.012808,0.02734,0.007485
2,3.8037,3.769873,0.164986,0.009308,0.020036,0.009466
3,3.7619,3.723614,0.187901,0.03661,0.023728,0.011831
4,3.7192,3.687466,0.186984,0.012435,0.023014,0.010641
5,3.6947,3.655786,0.186984,0.016545,0.023014,0.010996
6,3.6597,3.628323,0.183318,0.016311,0.021918,0.009412
7,3.6342,3.603974,0.181485,0.020231,0.02137,0.008582
8,3.6157,3.582831,0.180568,0.019561,0.021096,0.008097
9,3.5952,3.564559,0.180568,0.019561,0.021096,0.008097
10,3.5808,3.549675,0.180568,0.023558,0.021096,0.008119


[I 2025-03-26 09:14:41,070] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.000493809491819338, 'weight_decay': 0.004, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3925,2.856036,0.419798,0.069487,0.097244,0.074954
2,2.518,2.159417,0.560037,0.204458,0.20415,0.191075
3,1.8881,1.690125,0.663611,0.318071,0.29881,0.282187
4,1.4432,1.433793,0.705775,0.340223,0.325755,0.309065
5,1.148,1.309021,0.737855,0.403164,0.394174,0.373489
6,0.9283,1.200567,0.745188,0.424878,0.397524,0.387537
7,0.7664,1.161449,0.748854,0.466242,0.441683,0.434849
8,0.6644,1.118575,0.757104,0.484411,0.468762,0.460839
9,0.562,1.074578,0.766269,0.497215,0.476068,0.471717
10,0.488,1.074532,0.766269,0.498869,0.475278,0.472714


[I 2025-03-26 09:15:56,558] Trial 40 finished with value: 0.4829726237776648 and parameters: {'learning_rate': 0.000493809491819338, 'weight_decay': 0.004, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 41 with params: {'learning_rate': 1.2431112024586663e-05, 'weight_decay': 0.0, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8598,3.813889,0.074244,0.008795,0.029162,0.007685
2,3.7927,3.755856,0.176902,0.028741,0.021507,0.009844
3,3.7481,3.708866,0.188818,0.015876,0.023822,0.011622
4,3.7039,3.670822,0.186984,0.014621,0.023014,0.010863
5,3.6781,3.636848,0.186984,0.016542,0.023014,0.010992
6,3.6407,3.607296,0.182401,0.02071,0.021644,0.009055
7,3.6133,3.581077,0.180568,0.019561,0.021096,0.008097
8,3.5934,3.558298,0.180568,0.019561,0.021096,0.008097
9,3.5713,3.538527,0.180568,0.023558,0.021096,0.008119
10,3.5562,3.522637,0.180568,0.023558,0.021096,0.008119


[I 2025-03-26 09:16:45,271] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0004528082032294011, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4195,2.907979,0.405133,0.074402,0.089026,0.067227
2,2.5848,2.232486,0.535289,0.187896,0.170344,0.155217
3,1.9753,1.762247,0.653529,0.317078,0.2855,0.269344
4,1.5344,1.497237,0.699358,0.322124,0.330074,0.311041
5,1.2336,1.360037,0.732356,0.409573,0.393214,0.375221
6,1.0104,1.228489,0.744271,0.427725,0.392761,0.377565
7,0.8395,1.182731,0.751604,0.468478,0.434866,0.427781
8,0.7302,1.141786,0.761687,0.477573,0.459777,0.449713
9,0.6282,1.092634,0.76352,0.472719,0.470984,0.460933
10,0.5498,1.086638,0.76352,0.488795,0.475001,0.468406


[I 2025-03-26 09:17:36,973] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00048128243053382715, 'weight_decay': 0.007, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3876,2.866922,0.416132,0.071164,0.096518,0.075011
2,2.5337,2.185558,0.537122,0.203207,0.172807,0.158928
3,1.9194,1.709703,0.667278,0.315413,0.293742,0.275473
4,1.4751,1.451049,0.699358,0.309178,0.321341,0.29896
5,1.1714,1.313849,0.72594,0.374731,0.371077,0.346349
6,0.9541,1.194206,0.748854,0.441568,0.407437,0.396169
7,0.7853,1.148916,0.757104,0.472657,0.445426,0.437271
8,0.6801,1.112455,0.759853,0.470093,0.460409,0.449332
9,0.5801,1.062226,0.769019,0.483599,0.47729,0.470874
10,0.5055,1.06242,0.772686,0.486347,0.48722,0.477122


[I 2025-03-26 09:18:50,992] Trial 43 finished with value: 0.48009534112680624 and parameters: {'learning_rate': 0.00048128243053382715, 'weight_decay': 0.007, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 44 with params: {'learning_rate': 0.0004988643633335212, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3894,2.850171,0.419798,0.069605,0.097244,0.075081
2,2.5109,2.151657,0.56187,0.204372,0.20482,0.191388
3,1.878,1.680018,0.667278,0.313959,0.298973,0.281214
4,1.4322,1.426125,0.706691,0.342357,0.32872,0.311683
5,1.1364,1.303505,0.736939,0.40196,0.392748,0.372296
6,0.9169,1.19581,0.745188,0.426984,0.397452,0.387937
7,0.7573,1.157375,0.750687,0.488418,0.445086,0.440992
8,0.656,1.114802,0.75802,0.487801,0.468737,0.462692
9,0.554,1.069286,0.767186,0.498497,0.478639,0.4753
10,0.4807,1.070981,0.766269,0.498189,0.474072,0.471829


[I 2025-03-26 09:19:41,921] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00045976935310664983, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4148,2.898626,0.406966,0.074292,0.089886,0.068308
2,2.5726,2.219146,0.545371,0.195035,0.181414,0.168951
3,1.9605,1.750954,0.658112,0.326724,0.293909,0.277337
4,1.5183,1.485331,0.699358,0.31806,0.321141,0.302242
5,1.2192,1.350712,0.733272,0.396783,0.397178,0.375999
6,0.9965,1.224263,0.745188,0.435224,0.400445,0.38886
7,0.8265,1.179198,0.750687,0.460338,0.432113,0.423779
8,0.7183,1.14039,0.759853,0.475038,0.460408,0.448551
9,0.6163,1.090885,0.76077,0.474985,0.4699,0.460468
10,0.5384,1.084808,0.765353,0.49861,0.477034,0.472989


[I 2025-03-26 09:20:55,902] Trial 45 finished with value: 0.4838726142810639 and parameters: {'learning_rate': 0.00045976935310664983, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 46 with params: {'learning_rate': 0.00016820682795996844, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6515,3.377964,0.176902,0.003538,0.02,0.006012
2,3.2083,2.989679,0.4033,0.096499,0.087863,0.067038
3,2.8655,2.661681,0.462878,0.103673,0.118054,0.093982
4,2.5561,2.398129,0.519707,0.150169,0.154023,0.13273
5,2.3287,2.186463,0.574702,0.235376,0.200497,0.187771


[I 2025-03-26 09:21:20,512] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0003180656037351257, 'weight_decay': 0.01, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5045,3.092984,0.348304,0.068804,0.071698,0.058277
2,2.8291,2.516599,0.460128,0.116385,0.116296,0.086313
3,2.3131,2.077429,0.574702,0.256477,0.206015,0.195886
4,1.9031,1.792439,0.666361,0.322707,0.291247,0.277512
5,1.6113,1.583154,0.71769,0.37616,0.35447,0.337417
6,1.3665,1.426204,0.724106,0.350373,0.360783,0.341954
7,1.176,1.338154,0.726856,0.348647,0.361542,0.343174
8,1.0543,1.285047,0.739688,0.388744,0.391529,0.370256
9,0.9497,1.23417,0.750687,0.419278,0.41389,0.397669
10,0.8536,1.202983,0.753437,0.44205,0.414522,0.405129


[I 2025-03-26 09:22:35,601] Trial 47 finished with value: 0.456653426586453 and parameters: {'learning_rate': 0.0003180656037351257, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 48 with params: {'learning_rate': 0.00040801976151097626, 'weight_decay': 0.008, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4498,2.968532,0.384968,0.05839,0.082277,0.060828
2,2.6599,2.315238,0.516957,0.193618,0.15736,0.141184
3,2.0749,1.846135,0.648946,0.323936,0.273217,0.262538
4,1.6386,1.575462,0.696609,0.327832,0.329492,0.309728
5,1.3349,1.409848,0.72319,0.354257,0.364231,0.340101
6,1.1039,1.268127,0.733272,0.386766,0.378184,0.359508
7,0.9263,1.211042,0.743355,0.438976,0.406334,0.394362
8,0.8137,1.16826,0.762603,0.471039,0.449307,0.439355
9,0.7117,1.119789,0.765353,0.483758,0.467428,0.460711
10,0.6299,1.113168,0.769019,0.488671,0.480894,0.473053


[I 2025-03-26 09:23:48,503] Trial 48 finished with value: 0.4828269144867581 and parameters: {'learning_rate': 0.00040801976151097626, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 49 with params: {'learning_rate': 0.0004621344276435703, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4131,2.895418,0.410632,0.074661,0.092036,0.071326
2,2.5686,2.214771,0.547204,0.195101,0.183319,0.170247
3,1.9554,1.746956,0.660862,0.323972,0.295875,0.278773
4,1.5129,1.481451,0.699358,0.317764,0.32516,0.3064
5,1.2143,1.347602,0.732356,0.399371,0.39694,0.377598
6,0.9917,1.222584,0.745188,0.434911,0.400445,0.388698
7,0.822,1.178025,0.750687,0.460431,0.432113,0.423836
8,0.7143,1.139552,0.759853,0.47604,0.460648,0.44929
9,0.6122,1.090764,0.76077,0.480337,0.469612,0.460304
10,0.5345,1.085037,0.765353,0.499135,0.477034,0.473245


[I 2025-03-26 09:25:03,721] Trial 49 finished with value: 0.48432403857028183 and parameters: {'learning_rate': 0.0004621344276435703, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 50 with params: {'learning_rate': 0.00048339295789459613, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3992,2.86851,0.418882,0.072621,0.096599,0.075012
2,2.5338,2.177004,0.558203,0.204267,0.200698,0.188367
3,1.9101,1.710538,0.660862,0.318084,0.295444,0.278348
4,1.4653,1.448992,0.704858,0.320924,0.323902,0.305495
5,1.1703,1.320873,0.737855,0.423368,0.396147,0.37848
6,0.9508,1.206956,0.747021,0.431677,0.406304,0.394044
7,0.7853,1.166124,0.749771,0.472005,0.433129,0.427842
8,0.6815,1.126141,0.76077,0.484923,0.463909,0.455536
9,0.5787,1.082352,0.761687,0.489174,0.470577,0.466453
10,0.5033,1.079544,0.766269,0.499903,0.47879,0.474663


[I 2025-03-26 09:26:22,413] Trial 50 finished with value: 0.4918401682546774 and parameters: {'learning_rate': 0.00048339295789459613, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 51 with params: {'learning_rate': 4.412575130718341e-05, 'weight_decay': 0.007, 'warmup_steps': 2}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8011,3.687384,0.183318,0.014353,0.021918,0.009335
2,3.634,3.549369,0.180568,0.023551,0.021096,0.008109
3,3.5151,3.424933,0.192484,0.043594,0.024586,0.013842
4,3.3953,3.312511,0.311641,0.073056,0.06023,0.053325
5,3.3104,3.215288,0.379468,0.070355,0.08005,0.065405
6,3.2113,3.134316,0.398717,0.078671,0.086029,0.068194
7,3.1298,3.06177,0.408799,0.095124,0.088666,0.067797
8,3.0714,2.999783,0.423465,0.093233,0.095761,0.076178
9,3.0111,2.947738,0.434464,0.089289,0.101773,0.081737
10,2.9669,2.905807,0.444546,0.088429,0.106066,0.084416


[I 2025-03-26 09:27:14,209] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.00048322592072392375, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3863,2.864502,0.416132,0.071169,0.096518,0.075007
2,2.5306,2.182378,0.541705,0.226191,0.182081,0.172096
3,1.9154,1.70674,0.668194,0.316929,0.294197,0.276212
4,1.4712,1.448171,0.699358,0.312363,0.321341,0.299247
5,1.1677,1.311846,0.726856,0.360581,0.372505,0.345611
6,0.9507,1.193459,0.747938,0.434126,0.407333,0.395289
7,0.7821,1.147598,0.75802,0.47546,0.44579,0.437469
8,0.6772,1.111119,0.761687,0.471334,0.460876,0.449982
9,0.5772,1.061705,0.768103,0.48112,0.475141,0.468308
10,0.5027,1.063396,0.772686,0.485867,0.48679,0.476626


[I 2025-03-26 09:28:36,204] Trial 52 finished with value: 0.4787857305346394 and parameters: {'learning_rate': 0.00048322592072392375, 'weight_decay': 0.0, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 53 with params: {'learning_rate': 3.5160624970107914e-05, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8168,3.71807,0.186984,0.02073,0.023274,0.011077
2,3.6733,3.601305,0.181485,0.020224,0.02137,0.008572
3,3.5759,3.499463,0.179652,0.023548,0.020822,0.007605
4,3.4764,3.407908,0.211732,0.08369,0.030352,0.021958
5,3.4076,3.323691,0.305225,0.074117,0.058322,0.051992
6,3.3234,3.253652,0.350137,0.069797,0.071954,0.061456
7,3.2543,3.192242,0.386801,0.075399,0.082182,0.066782
8,3.2053,3.140561,0.404216,0.078348,0.087753,0.069381
9,3.156,3.095836,0.402383,0.076448,0.086696,0.067029
10,3.1187,3.059156,0.416132,0.095333,0.091946,0.072062


[I 2025-03-26 09:29:29,332] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0004991151617272938, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3759,2.843028,0.422548,0.069042,0.098602,0.07581
2,2.5052,2.156743,0.551787,0.221227,0.19644,0.182852
3,1.8854,1.687091,0.663611,0.309629,0.291591,0.273492
4,1.4428,1.42853,0.699358,0.331414,0.317212,0.298908
5,1.1389,1.298304,0.729606,0.375654,0.377573,0.353403
6,0.9248,1.187461,0.747938,0.434107,0.413052,0.402621
7,0.761,1.147781,0.758937,0.472577,0.457547,0.448989
8,0.6582,1.107459,0.76077,0.476294,0.460218,0.451014
9,0.5609,1.061358,0.769019,0.474747,0.48044,0.470313
10,0.4851,1.058946,0.773602,0.482566,0.486572,0.47311


[I 2025-03-26 09:30:51,095] Trial 54 finished with value: 0.48001965941076447 and parameters: {'learning_rate': 0.0004991151617272938, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 55 with params: {'learning_rate': 0.0004835938854895277, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3991,2.868231,0.418882,0.072621,0.096599,0.075012
2,2.5335,2.17664,0.558203,0.204267,0.200698,0.188367
3,1.9097,1.71006,0.660862,0.318084,0.295444,0.278348
4,1.4649,1.448666,0.704858,0.320924,0.323902,0.305495
5,1.1698,1.320541,0.737855,0.423368,0.396147,0.37848
6,0.9502,1.206679,0.747021,0.431677,0.406304,0.394044
7,0.7849,1.166004,0.749771,0.472005,0.433129,0.427842
8,0.6811,1.12603,0.76077,0.484367,0.463909,0.455573
9,0.5783,1.081939,0.761687,0.489174,0.470577,0.466453
10,0.503,1.079233,0.766269,0.499903,0.47879,0.475057


[I 2025-03-26 09:32:11,295] Trial 55 finished with value: 0.4924077988464638 and parameters: {'learning_rate': 0.0004835938854895277, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 56 with params: {'learning_rate': 0.0004559917486250674, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4173,2.903767,0.406049,0.074713,0.089241,0.067467
2,2.5792,2.226501,0.542621,0.194224,0.179519,0.16721
3,1.9686,1.757105,0.655362,0.31974,0.289666,0.272824
4,1.5269,1.491687,0.699358,0.314411,0.320528,0.301436
5,1.227,1.35563,0.731439,0.408282,0.394035,0.375608
6,1.004,1.226462,0.743355,0.429891,0.394112,0.380944
7,0.8334,1.180883,0.751604,0.472098,0.434703,0.427843
8,0.7246,1.140702,0.761687,0.476231,0.460967,0.449176
9,0.6225,1.091219,0.762603,0.472394,0.470458,0.460487
10,0.5444,1.085677,0.76352,0.494717,0.475001,0.470708


[I 2025-03-26 09:33:01,843] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0004987902977163654, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3894,2.850225,0.419798,0.069605,0.097244,0.075081
2,2.511,2.151697,0.56187,0.204372,0.20482,0.191388
3,1.8781,1.6802,0.668194,0.314228,0.300002,0.281998
4,1.4324,1.426306,0.706691,0.342156,0.32872,0.311509
5,1.1367,1.303592,0.736939,0.40196,0.392748,0.372296
6,0.9171,1.195813,0.745188,0.426984,0.397452,0.387937
7,0.7574,1.157368,0.749771,0.487946,0.444982,0.440703
8,0.6561,1.114834,0.75802,0.487801,0.468737,0.462692
9,0.5541,1.06922,0.767186,0.498497,0.478639,0.4753
10,0.4808,1.071012,0.765353,0.497727,0.473617,0.471398


[I 2025-03-26 09:34:17,119] Trial 57 finished with value: 0.4858672535150261 and parameters: {'learning_rate': 0.0004987902977163654, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 58 with params: {'learning_rate': 0.00047895074665422327, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4021,2.873928,0.416132,0.073266,0.09507,0.073949
2,2.541,2.184784,0.557287,0.204342,0.201261,0.189003
3,1.9196,1.718619,0.661778,0.323261,0.295682,0.278947
4,1.4751,1.455713,0.703941,0.320664,0.327614,0.309269
5,1.1798,1.326803,0.736939,0.422487,0.394329,0.375017
6,0.9597,1.210347,0.745188,0.423344,0.403472,0.390289
7,0.7931,1.168407,0.749771,0.471726,0.432778,0.427525
8,0.6884,1.1301,0.761687,0.485004,0.465576,0.455702
9,0.5856,1.085029,0.761687,0.49173,0.470577,0.466264
10,0.5096,1.081293,0.766269,0.49561,0.479605,0.474606


[I 2025-03-26 09:35:33,083] Trial 58 finished with value: 0.49383561910564167 and parameters: {'learning_rate': 0.00047895074665422327, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 59 with params: {'learning_rate': 0.000488100307012158, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3155,2.78538,0.433547,0.066307,0.105201,0.078387
2,2.4634,2.122011,0.56462,0.19604,0.196344,0.176392
3,1.861,1.672881,0.669111,0.289201,0.288776,0.266438
4,1.4329,1.42263,0.698442,0.288413,0.313763,0.288882
5,1.1448,1.297298,0.736022,0.354585,0.376015,0.349071
6,0.9396,1.176829,0.751604,0.424943,0.397503,0.384695
7,0.7745,1.141194,0.75802,0.465776,0.435763,0.428928
8,0.6719,1.096997,0.766269,0.462658,0.449302,0.438619
9,0.572,1.069403,0.769019,0.502966,0.485589,0.479754
10,0.4997,1.04548,0.772686,0.480307,0.474137,0.467153


[I 2025-03-26 09:36:24,521] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00020829754141218318, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6042,3.289712,0.181485,0.043554,0.021262,0.008482
2,3.093,2.845417,0.424381,0.062844,0.099022,0.073424
3,2.698,2.479123,0.494042,0.128819,0.137496,0.115308
4,2.3541,2.196081,0.562786,0.204248,0.189064,0.174469
5,2.0999,1.976115,0.616865,0.260873,0.23115,0.21537
6,1.8533,1.80054,0.664528,0.294787,0.27439,0.260382
7,1.6648,1.674917,0.684693,0.361024,0.308406,0.302447
8,1.537,1.580776,0.705775,0.360757,0.333245,0.320436
9,1.411,1.506524,0.711274,0.342759,0.334489,0.318468
10,1.3098,1.451842,0.719523,0.373365,0.348911,0.332223


[I 2025-03-26 09:37:15,568] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00048529426982403095, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.398,2.866196,0.419798,0.072935,0.097244,0.075732
2,2.5308,2.173715,0.558203,0.204267,0.200698,0.188367
3,1.906,1.706685,0.660862,0.318084,0.295444,0.278348
4,1.4612,1.446126,0.703941,0.318737,0.322993,0.304732
5,1.1661,1.318361,0.737855,0.423368,0.396147,0.37848
6,0.9466,1.205321,0.747021,0.424959,0.397842,0.386679
7,0.782,1.165175,0.747021,0.469926,0.432366,0.426433
8,0.6785,1.124534,0.76077,0.485034,0.463909,0.455596
9,0.5759,1.080729,0.762603,0.489423,0.470815,0.466713
10,0.5006,1.078487,0.766269,0.500952,0.47879,0.475556


[I 2025-03-26 09:38:34,444] Trial 61 finished with value: 0.49386700141077666 and parameters: {'learning_rate': 0.00048529426982403095, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 62 with params: {'learning_rate': 0.0004042204591106829, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4524,2.974175,0.383135,0.058315,0.081847,0.060415
2,2.6667,2.323149,0.517874,0.195563,0.157886,0.141917
3,2.084,1.854537,0.648029,0.322341,0.272405,0.261898
4,1.6482,1.58342,0.695692,0.327934,0.328706,0.309221
5,1.3446,1.415047,0.722273,0.348121,0.362648,0.337914
6,1.1128,1.273406,0.732356,0.3845,0.37685,0.357697
7,0.9346,1.214909,0.743355,0.43643,0.406334,0.39321
8,0.8219,1.171805,0.762603,0.471006,0.449307,0.439393
9,0.7199,1.123815,0.762603,0.481011,0.46584,0.459154
10,0.6374,1.11678,0.768103,0.487867,0.478177,0.471061


[I 2025-03-26 09:39:49,737] Trial 62 finished with value: 0.4832600395945819 and parameters: {'learning_rate': 0.0004042204591106829, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 63 with params: {'learning_rate': 0.00028128766321431964, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5445,3.163717,0.297892,0.071948,0.055997,0.046409
2,2.9171,2.618862,0.452796,0.106915,0.112829,0.082683
3,2.4334,2.197815,0.557287,0.225435,0.185448,0.172123
4,2.0378,1.901154,0.641613,0.291164,0.268602,0.255058
5,1.7484,1.682342,0.702108,0.353649,0.330106,0.31839


[I 2025-03-26 09:40:15,185] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0003835965347689684, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4665,3.004269,0.374885,0.059988,0.079503,0.059011
2,2.7036,2.366359,0.509624,0.152197,0.151389,0.131559
3,2.1349,1.901414,0.638863,0.318682,0.263496,0.255553
4,1.7035,1.629198,0.697525,0.306318,0.317397,0.297823
5,1.4012,1.44498,0.725023,0.36334,0.360109,0.336492
6,1.1651,1.302728,0.733272,0.384971,0.368158,0.350375
7,0.9838,1.23537,0.737855,0.416117,0.394226,0.38426
8,0.8685,1.194982,0.76352,0.478088,0.447205,0.439431
9,0.7659,1.142384,0.761687,0.484929,0.458536,0.454253
10,0.6805,1.130463,0.767186,0.496632,0.469688,0.46696


[I 2025-03-26 09:41:30,641] Trial 64 finished with value: 0.48293505167971773 and parameters: {'learning_rate': 0.0003835965347689684, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 65 with params: {'learning_rate': 1.8851658776032287e-05, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8512,3.788128,0.138405,0.009175,0.036587,0.008752
2,3.7573,3.707941,0.183318,0.015282,0.022178,0.009776
3,3.6967,3.647385,0.181485,0.012675,0.02137,0.00846
4,3.6374,3.595562,0.185151,0.021577,0.022466,0.010407
5,3.599,3.543805,0.180568,0.023564,0.021096,0.008128


[I 2025-03-26 09:41:55,166] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0004903441684092179, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3948,2.860147,0.419798,0.069788,0.097244,0.075065
2,2.5231,2.165082,0.558203,0.203576,0.200698,0.18806
3,1.8953,1.696767,0.661778,0.317307,0.296783,0.28056
4,1.4506,1.438924,0.705775,0.340495,0.325755,0.309223
5,1.1555,1.312948,0.737855,0.402891,0.394174,0.373847
6,0.9358,1.202665,0.745188,0.424952,0.397524,0.387323
7,0.7726,1.163166,0.748854,0.474008,0.440335,0.435657
8,0.67,1.120962,0.759853,0.484122,0.464906,0.456348
9,0.5675,1.077509,0.765353,0.493125,0.472068,0.46775
10,0.4931,1.076074,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 09:43:11,308] Trial 66 finished with value: 0.4920728027120156 and parameters: {'learning_rate': 0.0004903441684092179, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 67 with params: {'learning_rate': 0.00047284803693767644, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3932,2.877563,0.412466,0.07154,0.095658,0.074498
2,2.5476,2.199779,0.535289,0.204393,0.17277,0.158855
3,1.9366,1.723357,0.665445,0.31654,0.290899,0.272115
4,1.493,1.464839,0.702108,0.31394,0.324296,0.302541
5,1.1879,1.323156,0.72319,0.375722,0.36898,0.345179
6,0.9699,1.199636,0.745188,0.449031,0.401648,0.389803
7,0.8007,1.151457,0.75527,0.481712,0.439929,0.433289
8,0.6946,1.115017,0.761687,0.468209,0.463273,0.452502
9,0.5934,1.066974,0.768103,0.484116,0.476538,0.470464
10,0.518,1.06749,0.770852,0.484148,0.486843,0.475725


[I 2025-03-26 09:44:25,847] Trial 67 finished with value: 0.48022710362500504 and parameters: {'learning_rate': 0.00047284803693767644, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 68 with params: {'learning_rate': 0.00040020752135845226, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4551,2.979875,0.380385,0.058705,0.081202,0.059816
2,2.6736,2.331261,0.515124,0.174051,0.155917,0.138177
3,2.0936,1.863199,0.646196,0.320044,0.270341,0.260043
4,1.6586,1.592125,0.696609,0.326385,0.32907,0.30914
5,1.3552,1.420557,0.72044,0.34725,0.361208,0.337085


[I 2025-03-26 09:44:51,731] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0002158221755884315, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.605,3.283001,0.183318,0.043561,0.021738,0.009359
2,3.081,2.828342,0.426214,0.103051,0.101138,0.077044
3,2.6756,2.454337,0.497709,0.128102,0.139628,0.117036
4,2.3243,2.166302,0.566453,0.222409,0.194732,0.179598
5,2.0644,1.943241,0.622365,0.293238,0.244452,0.231625
6,1.8146,1.765617,0.675527,0.344379,0.296665,0.289572
7,1.6231,1.640222,0.694775,0.360978,0.315991,0.308408
8,1.495,1.549388,0.712191,0.371888,0.345028,0.331071
9,1.37,1.476985,0.711274,0.344996,0.345903,0.328344
10,1.2683,1.424232,0.72319,0.362816,0.361583,0.34435


[I 2025-03-26 09:45:41,247] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.000499752359065925, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3888,2.849096,0.419798,0.069468,0.097244,0.075029
2,2.5096,2.150357,0.56187,0.204292,0.20482,0.191348
3,1.8762,1.678373,0.669111,0.3143,0.300366,0.282353
4,1.4304,1.424935,0.707608,0.329694,0.329084,0.311705
5,1.1344,1.302712,0.736022,0.400878,0.391414,0.37088
6,0.9149,1.194897,0.745188,0.426857,0.397452,0.387732
7,0.7558,1.156682,0.749771,0.487391,0.444946,0.440355
8,0.6545,1.114518,0.75802,0.487756,0.468737,0.46251
9,0.5526,1.068286,0.767186,0.498596,0.478788,0.475435
10,0.4794,1.07037,0.765353,0.497727,0.473617,0.471398


[I 2025-03-26 09:46:56,180] Trial 70 finished with value: 0.4861917323142149 and parameters: {'learning_rate': 0.000499752359065925, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 71 with params: {'learning_rate': 0.00046924068881510023, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4084,2.885947,0.411549,0.07397,0.092682,0.071857
2,2.5571,2.20225,0.553621,0.195044,0.189851,0.176802
3,1.9409,1.73591,0.659945,0.320783,0.294831,0.277903
4,1.4971,1.470665,0.702108,0.319815,0.326341,0.308155
5,1.2006,1.338844,0.734189,0.401674,0.398314,0.378957
6,0.9792,1.217407,0.746104,0.43011,0.402634,0.389384
7,0.8101,1.173893,0.750687,0.456922,0.434166,0.425318
8,0.7036,1.136667,0.761687,0.488319,0.462737,0.452773
9,0.6011,1.09022,0.76077,0.490264,0.46992,0.464268
10,0.524,1.084486,0.76352,0.491458,0.474443,0.470855


[I 2025-03-26 09:47:47,981] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 1.847006633877252e-05, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8457,3.7835,0.146654,0.009458,0.017793,0.008953
2,3.7544,3.706388,0.183318,0.015282,0.022178,0.009776
3,3.6961,3.647616,0.182401,0.013587,0.021644,0.008907
4,3.6383,3.597261,0.185151,0.019944,0.022466,0.01036
5,3.6013,3.547011,0.180568,0.023564,0.021096,0.008128
6,3.5506,3.50575,0.180568,0.023564,0.021096,0.008128
7,3.5112,3.46873,0.183318,0.023581,0.021918,0.009574
8,3.4832,3.437335,0.193401,0.063618,0.024801,0.014301
9,3.4536,3.410389,0.208983,0.083687,0.029476,0.020556
10,3.4327,3.387177,0.227314,0.076444,0.034892,0.028131


[I 2025-03-26 09:48:36,721] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 6.1000955731656155e-05, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7803,3.641916,0.177819,0.013545,0.020274,0.006555
2,3.5697,3.460017,0.179652,0.023548,0.020822,0.007605
3,3.4111,3.296248,0.31439,0.071914,0.060545,0.052884
4,3.255,3.154732,0.396884,0.078753,0.085359,0.067347
5,3.1447,3.032179,0.413382,0.095132,0.090343,0.06916


[I 2025-03-26 09:49:01,720] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004889447518022103, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3957,2.86176,0.419798,0.070584,0.097244,0.075237
2,2.5252,2.167434,0.558203,0.203463,0.200698,0.187992
3,1.8982,1.699468,0.662695,0.319027,0.29845,0.282462
4,1.4536,1.440982,0.704858,0.338758,0.324326,0.307207
5,1.1586,1.314555,0.738772,0.423432,0.396674,0.378628
6,0.9389,1.203373,0.746104,0.425223,0.397627,0.387535
7,0.7753,1.163926,0.748854,0.474008,0.440335,0.435657
8,0.6724,1.12187,0.759853,0.484038,0.463455,0.45529
9,0.57,1.07843,0.764436,0.492598,0.47158,0.467286
10,0.4953,1.076767,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 09:50:16,266] Trial 74 finished with value: 0.4933209616956229 and parameters: {'learning_rate': 0.0004889447518022103, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 21 with value: 0.4941557351757956.


Trial 75 with params: {'learning_rate': 0.0004928703331115589, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.38,2.852261,0.417049,0.069731,0.096733,0.074754
2,2.5147,2.166527,0.549038,0.231765,0.192074,0.18171
3,1.8969,1.694638,0.665445,0.307226,0.291606,0.273776
4,1.453,1.434914,0.702108,0.32831,0.320984,0.300324
5,1.15,1.304522,0.730522,0.376182,0.37719,0.35248
6,0.9347,1.189385,0.750687,0.443533,0.413966,0.403542
7,0.7684,1.145531,0.759853,0.456314,0.450053,0.438244
8,0.6649,1.106858,0.761687,0.469371,0.460728,0.44986
9,0.5667,1.060932,0.768103,0.472848,0.474304,0.46474
10,0.4918,1.058014,0.773602,0.485481,0.486312,0.47586


[I 2025-03-26 09:51:32,665] Trial 75 finished with value: 0.48472228234703374 and parameters: {'learning_rate': 0.0004928703331115589, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 21 with value: 0.4941557351757956.


Trial 76 with params: {'learning_rate': 0.00047612995586855563, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4039,2.877421,0.414299,0.073256,0.09421,0.073154
2,2.5457,2.189941,0.557287,0.204342,0.201261,0.189003
3,1.9258,1.723883,0.660862,0.324082,0.294254,0.27699
4,1.4814,1.459874,0.703941,0.320664,0.327614,0.309269
5,1.1858,1.330277,0.736022,0.422646,0.393965,0.374772
6,0.9652,1.212445,0.746104,0.425314,0.403998,0.391065
7,0.798,1.169914,0.748854,0.470136,0.430892,0.425342
8,0.6927,1.132273,0.762603,0.485332,0.465679,0.455927
9,0.5899,1.086637,0.758937,0.491568,0.469453,0.464948
10,0.5137,1.082238,0.765353,0.486008,0.475073,0.470392


[I 2025-03-26 09:52:47,859] Trial 76 finished with value: 0.49512573819371475 and parameters: {'learning_rate': 0.00047612995586855563, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 77 with params: {'learning_rate': 0.0001975196754254445, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6149,3.310712,0.178735,0.023545,0.020476,0.006952
2,3.1216,2.881856,0.419798,0.08561,0.096896,0.073722
3,2.7402,2.524101,0.486709,0.127296,0.132943,0.110068
4,2.4042,2.245338,0.55637,0.206678,0.18387,0.169196
5,2.1567,2.026995,0.605866,0.293097,0.227978,0.217181


[I 2025-03-26 09:53:13,275] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0002253508677540182, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5962,3.26553,0.19615,0.038621,0.025323,0.015322
2,3.057,2.797117,0.430797,0.102147,0.103415,0.077735
3,2.6396,2.416522,0.500458,0.13623,0.14091,0.118284
4,2.2819,2.125012,0.572869,0.223192,0.198138,0.183374
5,2.0164,1.900874,0.63703,0.319819,0.263962,0.25204


[I 2025-03-26 09:53:38,690] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0002478881905648277, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5754,3.225302,0.24473,0.058259,0.039431,0.032436
2,3.0006,2.723456,0.445463,0.102622,0.111929,0.083752
3,2.5546,2.325396,0.515124,0.157747,0.15272,0.13325
4,2.1801,2.027538,0.604033,0.263806,0.229246,0.213411
5,1.9025,1.802718,0.671861,0.340285,0.296111,0.28828
6,1.6483,1.628891,0.699358,0.376841,0.321876,0.314342
7,1.452,1.510914,0.706691,0.343911,0.328322,0.314353
8,1.3243,1.430447,0.72594,0.350905,0.359051,0.338152
9,1.2027,1.367634,0.729606,0.356541,0.367877,0.348622
10,1.1024,1.3258,0.745188,0.409806,0.396251,0.379899


[I 2025-03-26 09:54:26,982] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0004997002486542191, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3889,2.84913,0.419798,0.0696,0.097244,0.075105
2,2.5097,2.150355,0.56187,0.204292,0.20482,0.191348
3,1.8763,1.678547,0.669111,0.314499,0.300366,0.282516
4,1.4305,1.424955,0.706691,0.327956,0.327655,0.309689
5,1.1345,1.302668,0.736022,0.40164,0.391414,0.371554
6,0.915,1.194955,0.745188,0.427181,0.397452,0.387904
7,0.7558,1.156766,0.749771,0.487208,0.444946,0.440245
8,0.6545,1.114794,0.75802,0.487756,0.468737,0.46251
9,0.5526,1.068362,0.768103,0.498771,0.479003,0.475639
10,0.4795,1.070223,0.765353,0.497727,0.473617,0.471398


[I 2025-03-26 09:55:18,498] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.000466748580049297, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.41,2.889289,0.411549,0.073974,0.092682,0.071863
2,2.5612,2.206781,0.551787,0.19431,0.187086,0.173253
3,1.9462,1.740095,0.659945,0.320518,0.295417,0.278389
4,1.5028,1.474415,0.702108,0.319498,0.326341,0.307896
5,1.2055,1.341983,0.734189,0.401962,0.397139,0.378488
6,0.9837,1.219099,0.746104,0.430472,0.402634,0.389645
7,0.8143,1.175585,0.749771,0.461115,0.432348,0.425122
8,0.7072,1.138255,0.761687,0.488391,0.462737,0.452831
9,0.6049,1.090913,0.76077,0.489955,0.469795,0.464082
10,0.5276,1.085286,0.764436,0.492305,0.475982,0.471981


[I 2025-03-26 09:56:32,574] Trial 81 finished with value: 0.4947309199146923 and parameters: {'learning_rate': 0.000466748580049297, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 82 with params: {'learning_rate': 0.0004583601020505076, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4029,2.896107,0.407883,0.073533,0.093167,0.072746
2,2.5717,2.224407,0.532539,0.19916,0.170892,0.155785
3,1.9679,1.749473,0.659945,0.313153,0.281316,0.263723
4,1.5264,1.491183,0.702108,0.337994,0.333765,0.312175
5,1.2202,1.344835,0.725023,0.371987,0.367861,0.343281
6,0.9999,1.212319,0.739688,0.444,0.389216,0.379435
7,0.8288,1.158887,0.754354,0.441145,0.436871,0.426104
8,0.721,1.122881,0.761687,0.458494,0.455102,0.441611
9,0.6189,1.077683,0.768103,0.487437,0.475063,0.469269
10,0.5422,1.077562,0.767186,0.483769,0.485327,0.474976


[I 2025-03-26 09:57:48,460] Trial 82 finished with value: 0.4794185783682232 and parameters: {'learning_rate': 0.0004583601020505076, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 76 with value: 0.49512573819371475.


Trial 83 with params: {'learning_rate': 0.00045585643905809184, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4174,2.903936,0.406049,0.074713,0.089241,0.067467
2,2.5794,2.226751,0.541705,0.193688,0.178567,0.166292
3,1.9689,1.757299,0.654445,0.31897,0.287166,0.271381
4,1.5273,1.491902,0.699358,0.314411,0.320528,0.301436
5,1.2272,1.355674,0.731439,0.408282,0.394035,0.375608
6,1.0042,1.226498,0.744271,0.431878,0.394476,0.381225
7,0.8337,1.180923,0.751604,0.472098,0.434703,0.427843
8,0.7247,1.140689,0.761687,0.476231,0.460967,0.449176
9,0.6226,1.091231,0.762603,0.472394,0.470458,0.460487
10,0.5446,1.085639,0.76352,0.494717,0.475001,0.470708


[I 2025-03-26 09:59:15,378] Trial 83 finished with value: 0.4832865107939125 and parameters: {'learning_rate': 0.00045585643905809184, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 84 with params: {'learning_rate': 0.00047960083104339106, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4016,2.873136,0.416132,0.073138,0.09507,0.07387
2,2.5399,2.183604,0.557287,0.204342,0.201261,0.189003
3,1.9182,1.717427,0.661778,0.318419,0.295682,0.278669
4,1.4737,1.454749,0.704858,0.324502,0.330114,0.310749
5,1.1784,1.325935,0.736939,0.422487,0.394329,0.375017
6,0.9584,1.210055,0.745188,0.423347,0.403472,0.390249
7,0.7919,1.168174,0.750687,0.472172,0.433233,0.42799
8,0.6873,1.129322,0.76077,0.484358,0.463909,0.454785
9,0.5845,1.084709,0.76077,0.491341,0.470474,0.466041
10,0.5087,1.080993,0.767186,0.495176,0.479843,0.474416


[I 2025-03-26 10:00:33,949] Trial 84 finished with value: 0.4941557351757956 and parameters: {'learning_rate': 0.00047960083104339106, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 85 with params: {'learning_rate': 0.0004504462285923686, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4211,2.911146,0.404216,0.074356,0.088787,0.067151
2,2.5889,2.23698,0.535289,0.191229,0.170344,0.15534
3,1.9804,1.766363,0.653529,0.317197,0.2855,0.269348
4,1.5399,1.50111,0.699358,0.319177,0.330074,0.310083
5,1.2385,1.362696,0.732356,0.410145,0.393547,0.375603
6,1.0152,1.23026,0.744271,0.427764,0.392761,0.377585
7,0.844,1.184253,0.751604,0.468781,0.434866,0.427943
8,0.7345,1.142835,0.762603,0.478426,0.461316,0.45082
9,0.6326,1.093915,0.765353,0.473476,0.471586,0.461711
10,0.5539,1.087768,0.764436,0.488837,0.475105,0.468451


[I 2025-03-26 10:01:55,588] Trial 85 finished with value: 0.48295363502244654 and parameters: {'learning_rate': 0.0004504462285923686, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 86 with params: {'learning_rate': 4.0534446710776905e-05, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8048,3.697174,0.185151,0.012607,0.022466,0.009983
2,3.6476,3.568282,0.180568,0.023551,0.021096,0.008109
3,3.5377,3.452966,0.181485,0.023554,0.02137,0.0086
4,3.4262,3.348852,0.274977,0.075659,0.049545,0.043865
5,3.3477,3.256502,0.358387,0.069593,0.074302,0.062718


[I 2025-03-26 10:02:23,043] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00020640839054860755, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6138,3.300375,0.178735,0.023545,0.020476,0.006952
2,3.1047,2.858563,0.423465,0.083463,0.099158,0.074432
3,2.7108,2.491558,0.493126,0.129101,0.136966,0.114938
4,2.3664,2.207397,0.560037,0.205071,0.187686,0.172887
5,2.112,1.985918,0.613199,0.28145,0.234254,0.221572
6,1.865,1.809112,0.664528,0.295206,0.275176,0.262563
7,1.6757,1.68226,0.686526,0.360895,0.308964,0.302828
8,1.547,1.587507,0.705775,0.373479,0.337236,0.327508
9,1.4203,1.512,0.710357,0.342735,0.332686,0.316991
10,1.319,1.457126,0.718607,0.35536,0.347352,0.330168


[I 2025-03-26 10:03:12,893] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 7.076325261453758e-05, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7591,3.609309,0.176902,0.003538,0.02,0.006012
2,3.5266,3.404759,0.185151,0.043567,0.022394,0.010485
3,3.3483,3.22305,0.362053,0.063667,0.074947,0.060811
4,3.1755,3.067666,0.407883,0.074791,0.087895,0.066215
5,3.0523,2.931618,0.431714,0.089597,0.100075,0.080149
6,2.9123,2.817206,0.447296,0.084184,0.108399,0.084182
7,2.8011,2.722059,0.457379,0.104357,0.113237,0.088294
8,2.7229,2.638457,0.476627,0.10361,0.12586,0.100681
9,2.6365,2.567607,0.484876,0.103677,0.128491,0.102569
10,2.5717,2.511946,0.501375,0.126735,0.139712,0.115057


[I 2025-03-26 10:04:03,306] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0004894091180005273, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3954,2.861219,0.419798,0.070584,0.097244,0.075237
2,2.5244,2.166587,0.558203,0.203576,0.200698,0.18806
3,1.8972,1.698675,0.662695,0.319027,0.29845,0.282462
4,1.4527,1.44038,0.704858,0.338667,0.324326,0.30721
5,1.1576,1.314144,0.738772,0.423122,0.396674,0.378488
6,0.938,1.203286,0.746104,0.425223,0.397627,0.387535
7,0.7745,1.163654,0.747938,0.473494,0.437478,0.431398
8,0.6717,1.121604,0.759853,0.484449,0.463455,0.455722
9,0.5692,1.078324,0.764436,0.492598,0.47158,0.467286
10,0.4946,1.076611,0.766269,0.498016,0.475278,0.472287


[I 2025-03-26 10:04:53,648] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0004741937479140062, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4051,2.879817,0.413382,0.073578,0.093565,0.072541
2,2.5489,2.193315,0.557287,0.195008,0.193816,0.179808
3,1.9301,1.727408,0.660862,0.323667,0.295319,0.278259
4,1.4858,1.462934,0.703025,0.320021,0.326705,0.308551
5,1.19,1.332866,0.736022,0.42267,0.393965,0.374813
6,0.9694,1.214021,0.745188,0.429682,0.40218,0.388804
7,0.8015,1.170931,0.747021,0.457095,0.428022,0.418655
8,0.6958,1.133717,0.762603,0.484291,0.464225,0.45447
9,0.5931,1.08757,0.759853,0.491887,0.469557,0.465154
10,0.5166,1.082722,0.764436,0.485781,0.474547,0.469967


[I 2025-03-26 10:05:42,236] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0004949145994118729, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3919,2.854751,0.419798,0.069485,0.097244,0.07495
2,2.5164,2.157598,0.560037,0.204309,0.20415,0.19099
3,1.8858,1.687967,0.665445,0.3184,0.2997,0.283232
4,1.4408,1.432213,0.704858,0.339763,0.325688,0.308979
5,1.1455,1.307761,0.736939,0.402636,0.393936,0.373109
6,0.926,1.199835,0.746104,0.425949,0.397978,0.388294
7,0.7643,1.160894,0.748854,0.466242,0.441683,0.434849
8,0.6626,1.117915,0.757104,0.484411,0.468762,0.460839
9,0.5603,1.074147,0.766269,0.497215,0.476068,0.471717
10,0.4866,1.074198,0.766269,0.498869,0.475278,0.472714


[I 2025-03-26 10:06:56,122] Trial 91 finished with value: 0.48263297400373917 and parameters: {'learning_rate': 0.0004949145994118729, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 92 with params: {'learning_rate': 0.00025569988534885664, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5681,3.211849,0.264895,0.054291,0.045691,0.038428
2,2.9815,2.699388,0.446379,0.102381,0.112144,0.083702
3,2.5266,2.294776,0.52154,0.184713,0.158035,0.14143
4,2.146,1.996227,0.614115,0.270426,0.239685,0.224631
5,1.8641,1.771764,0.67736,0.341888,0.303413,0.294524


[I 2025-03-26 10:07:21,065] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00047159518812246964, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3939,2.879185,0.412466,0.072657,0.095658,0.074841
2,2.5496,2.201767,0.533456,0.20191,0.171602,0.157311
3,1.9391,1.725411,0.665445,0.31654,0.290899,0.272115
4,1.4958,1.467122,0.701192,0.313243,0.323841,0.301859
5,1.1906,1.324724,0.72319,0.375596,0.36898,0.345118
6,0.9724,1.200426,0.745188,0.448886,0.401648,0.389694
7,0.803,1.152056,0.757104,0.483814,0.440566,0.434351
8,0.6969,1.115727,0.76352,0.469101,0.464182,0.453404
9,0.5955,1.067989,0.769936,0.484514,0.477207,0.470915
10,0.52,1.068514,0.770852,0.484148,0.486843,0.475725


[I 2025-03-26 10:08:35,466] Trial 93 finished with value: 0.480156444559878 and parameters: {'learning_rate': 0.00047159518812246964, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 76 with value: 0.49512573819371475.


Trial 94 with params: {'learning_rate': 0.0004657598377890262, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4107,2.890473,0.411549,0.074158,0.092682,0.071924
2,2.5627,2.20837,0.551787,0.194567,0.187086,0.173419
3,1.9481,1.741512,0.659028,0.3214,0.295202,0.278388
4,1.5049,1.47585,0.702108,0.319337,0.326341,0.307849
5,1.2075,1.343088,0.734189,0.401962,0.397139,0.378488
6,0.9855,1.219933,0.746104,0.429945,0.402634,0.389274
7,0.816,1.176131,0.749771,0.465364,0.432348,0.425524
8,0.7088,1.138652,0.759853,0.475079,0.460522,0.449528
9,0.6065,1.090976,0.76077,0.489975,0.469795,0.464073
10,0.5291,1.085325,0.764436,0.49199,0.475982,0.471733


[I 2025-03-26 10:09:24,157] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0004769585927044506, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4034,2.876386,0.415215,0.07327,0.094855,0.073856
2,2.5443,2.188382,0.557287,0.204342,0.201261,0.189003
3,1.9241,1.722416,0.661778,0.324121,0.295682,0.279361
4,1.4796,1.4588,0.703941,0.320664,0.327614,0.309269
5,1.1842,1.329448,0.736022,0.422106,0.393965,0.374588
6,0.9637,1.211916,0.745188,0.424947,0.403472,0.39059
7,0.7966,1.169444,0.750687,0.471802,0.432926,0.427596
8,0.6915,1.131598,0.761687,0.485004,0.465576,0.455702
9,0.5887,1.086339,0.76077,0.492386,0.470123,0.465707
10,0.5125,1.081951,0.767186,0.493896,0.47658,0.472499


[I 2025-03-26 10:10:38,086] Trial 95 finished with value: 0.49512573819371475 and parameters: {'learning_rate': 0.0004769585927044506, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 96 with params: {'learning_rate': 0.000260765287717221, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5634,3.202812,0.272227,0.052264,0.047937,0.039965
2,2.9688,2.683421,0.446379,0.088664,0.112215,0.083538
3,2.5081,2.274837,0.537122,0.217373,0.170631,0.157455
4,2.1238,1.976311,0.618698,0.300879,0.253125,0.242502
5,1.8398,1.752609,0.68286,0.34158,0.3076,0.297941


[I 2025-03-26 10:11:02,275] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0004910817625469622, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3943,2.859261,0.419798,0.069788,0.097244,0.075065
2,2.5219,2.163796,0.55912,0.203731,0.203198,0.190175
3,1.8938,1.695439,0.661778,0.317476,0.296783,0.280638
4,1.4492,1.437913,0.705775,0.340143,0.325755,0.309033
5,1.154,1.312078,0.737855,0.40304,0.394174,0.373753
6,0.9344,1.202306,0.745188,0.424536,0.397524,0.387185
7,0.7713,1.162846,0.748854,0.474008,0.440335,0.435657
8,0.6688,1.120351,0.758937,0.483404,0.464452,0.455797
9,0.5663,1.076873,0.765353,0.493009,0.472068,0.467673
10,0.492,1.075673,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 10:12:16,521] Trial 97 finished with value: 0.4928190274687641 and parameters: {'learning_rate': 0.0004910817625469622, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 98 with params: {'learning_rate': 0.00048420804645289174, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3987,2.867554,0.419798,0.072784,0.097244,0.075681
2,2.5325,2.175591,0.558203,0.204267,0.200698,0.188367
3,1.9083,1.708828,0.660862,0.318084,0.295444,0.278348
4,1.4635,1.447744,0.703941,0.318829,0.322993,0.304729
5,1.1684,1.319812,0.737855,0.423368,0.396147,0.37848
6,0.9489,1.206146,0.747021,0.424959,0.397842,0.386679
7,0.7839,1.165832,0.749771,0.472005,0.433129,0.427842
8,0.6802,1.125629,0.76077,0.484477,0.463909,0.455633
9,0.5775,1.081648,0.761687,0.489174,0.470577,0.466453
10,0.5022,1.079064,0.766269,0.500049,0.47879,0.475122


[I 2025-03-26 10:13:32,424] Trial 98 finished with value: 0.4927893408163375 and parameters: {'learning_rate': 0.00048420804645289174, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 99 with params: {'learning_rate': 0.00022815292178642897, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5936,3.260383,0.197984,0.037749,0.025871,0.016137
2,3.0499,2.78782,0.433547,0.101722,0.104688,0.079209
3,2.6289,2.405306,0.502291,0.144395,0.14235,0.120777
4,2.2692,2.112741,0.574702,0.223545,0.201164,0.185981
5,2.0022,1.88837,0.640697,0.320896,0.268758,0.257609


[I 2025-03-26 10:13:56,867] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0003524736859308233, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4781,3.041611,0.371219,0.063347,0.078619,0.060112
2,2.7557,2.431396,0.497709,0.158886,0.14439,0.124257
3,2.2118,1.97698,0.604033,0.292187,0.23561,0.225075
4,1.7904,1.702494,0.684693,0.309638,0.303895,0.28761
5,1.4936,1.502443,0.715857,0.368847,0.356926,0.3369


[I 2025-03-26 10:14:21,417] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0004903536318893038, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3947,2.860117,0.419798,0.069788,0.097244,0.075065
2,2.523,2.164997,0.558203,0.203463,0.200698,0.187992
3,1.8952,1.696679,0.661778,0.317767,0.296783,0.280791
4,1.4506,1.438818,0.704858,0.338758,0.324326,0.307207
5,1.1555,1.312894,0.737855,0.402619,0.394174,0.373721
6,0.9358,1.202537,0.745188,0.424952,0.397524,0.387323
7,0.7726,1.163026,0.748854,0.474008,0.440335,0.435657
8,0.6699,1.120805,0.758937,0.483995,0.46324,0.455366
9,0.5675,1.077348,0.765353,0.493125,0.472068,0.46775
10,0.4931,1.076036,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 10:15:36,706] Trial 101 finished with value: 0.4927688811433882 and parameters: {'learning_rate': 0.0004903536318893038, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 102 with params: {'learning_rate': 0.00043456777714885706, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4319,2.93189,0.3978,0.074897,0.086422,0.064601
2,2.6157,2.267116,0.527956,0.180882,0.164311,0.147838
3,2.0145,1.793206,0.648946,0.315845,0.278755,0.26357
4,1.5752,1.526486,0.698442,0.325759,0.331958,0.313155
5,1.2717,1.37978,0.728689,0.405937,0.387162,0.369864
6,1.046,1.241865,0.743355,0.439662,0.394382,0.381486
7,0.8731,1.192084,0.753437,0.47488,0.436821,0.431021
8,0.763,1.149675,0.76077,0.466131,0.453251,0.441921
9,0.6609,1.103553,0.76352,0.482542,0.471619,0.463426
10,0.5804,1.09594,0.76352,0.486274,0.475215,0.46729


[I 2025-03-26 10:16:27,082] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.000448014133753709, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4227,2.914295,0.402383,0.074484,0.087927,0.065991
2,2.5931,2.241637,0.532539,0.188304,0.167972,0.15247
3,1.9855,1.77018,0.653529,0.317357,0.285461,0.269546
4,1.5453,1.504905,0.699358,0.319107,0.330074,0.310143
5,1.2433,1.365524,0.730522,0.391675,0.385984,0.366835
6,1.0199,1.23164,0.744271,0.425953,0.39267,0.377224
7,0.8485,1.185344,0.752521,0.468181,0.43514,0.428927
8,0.7388,1.143458,0.762603,0.474977,0.456531,0.447493
9,0.637,1.094866,0.76352,0.472917,0.471119,0.461024
10,0.5578,1.088723,0.764436,0.488832,0.475105,0.468299


[I 2025-03-26 10:17:44,038] Trial 103 finished with value: 0.48324648858550234 and parameters: {'learning_rate': 0.000448014133753709, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 104 with params: {'learning_rate': 1.1873161138364599e-05, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8647,3.819999,0.056829,0.008467,0.027544,0.007037
2,3.7994,3.763962,0.172319,0.009675,0.021216,0.009971
3,3.7558,3.716954,0.188818,0.015684,0.023822,0.011584
4,3.7122,3.679777,0.186984,0.013624,0.023014,0.010768
5,3.6869,3.646876,0.185151,0.015597,0.022466,0.010189


[I 2025-03-26 10:18:08,534] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.00047549851929469956, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4043,2.878247,0.414299,0.07342,0.09421,0.073212
2,2.5467,2.191033,0.557287,0.204325,0.201261,0.188999
3,1.9272,1.725059,0.661778,0.324258,0.295682,0.278925
4,1.4828,1.460971,0.704858,0.321851,0.328069,0.310056
5,1.1872,1.331135,0.735105,0.422437,0.393727,0.374698
6,0.9666,1.213017,0.746104,0.425314,0.403998,0.391065
7,0.7992,1.170205,0.747938,0.46869,0.42984,0.424057
8,0.6937,1.132875,0.762603,0.485332,0.465679,0.455927
9,0.5909,1.086952,0.758937,0.491568,0.469453,0.464948
10,0.5145,1.082442,0.764436,0.485762,0.474547,0.469967


[I 2025-03-26 10:19:24,293] Trial 105 finished with value: 0.49512573819371475 and parameters: {'learning_rate': 0.00047549851929469956, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 106 with params: {'learning_rate': 0.00042340575293188244, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4264,2.939136,0.384051,0.074993,0.084324,0.064095
2,2.6291,2.287687,0.527039,0.198606,0.167736,0.152769
3,2.0453,1.818323,0.652612,0.318321,0.276093,0.261404
4,1.6086,1.558016,0.694775,0.315541,0.329309,0.305785
5,1.3051,1.399441,0.719523,0.347455,0.358902,0.332263


[I 2025-03-26 10:19:48,525] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.000499408831202765, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3891,2.849584,0.419798,0.069468,0.097244,0.075029
2,2.5102,2.150933,0.56187,0.204292,0.20482,0.191348
3,1.8769,1.678954,0.669111,0.3143,0.300366,0.282353
4,1.4311,1.425422,0.705775,0.330825,0.326117,0.308447
5,1.1351,1.302993,0.736022,0.398658,0.391414,0.369751
6,0.9154,1.195227,0.745188,0.426857,0.397452,0.387732
7,0.7563,1.156968,0.749771,0.487391,0.444946,0.440355
8,0.655,1.114769,0.75802,0.487756,0.468737,0.46251
9,0.5531,1.06835,0.767186,0.498611,0.478788,0.475424
10,0.4799,1.070426,0.765353,0.497727,0.473617,0.471398


[I 2025-03-26 10:21:05,913] Trial 107 finished with value: 0.4861917323142149 and parameters: {'learning_rate': 0.000499408831202765, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 108 with params: {'learning_rate': 0.00022466446675269706, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5968,3.266782,0.19615,0.038621,0.025323,0.015322
2,3.0587,2.799362,0.428964,0.101733,0.102532,0.077136
3,2.6422,2.419271,0.500458,0.13623,0.14091,0.118284
4,2.2849,2.127923,0.571952,0.222723,0.197774,0.182597
5,2.0198,1.903769,0.636114,0.319721,0.263599,0.25151
6,1.7678,1.726031,0.684693,0.366909,0.306616,0.301081
7,1.5743,1.60215,0.696609,0.35637,0.316815,0.308051
8,1.4466,1.51443,0.71769,0.362957,0.349741,0.332088
9,1.3227,1.44504,0.71769,0.35994,0.353585,0.338581
10,1.2209,1.394933,0.727773,0.358977,0.369398,0.350056


[I 2025-03-26 10:21:55,396] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0004750920561240741, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4045,2.878686,0.414299,0.073427,0.09421,0.073224
2,2.5474,2.191712,0.558203,0.204852,0.202213,0.189769
3,1.9281,1.725806,0.660862,0.323667,0.295319,0.278259
4,1.4838,1.46154,0.703941,0.320955,0.327614,0.309567
5,1.1882,1.331767,0.736022,0.423004,0.393965,0.375083
6,0.9675,1.213403,0.746104,0.425314,0.403998,0.391065
7,0.7999,1.170374,0.747938,0.45869,0.42984,0.420723
8,0.6944,1.133077,0.761687,0.48352,0.462796,0.45349
9,0.5916,1.087189,0.758937,0.491455,0.469453,0.464871
10,0.5152,1.082521,0.764436,0.485762,0.474547,0.469967


[I 2025-03-26 10:22:45,167] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.00024102013405689706, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5817,3.237311,0.224565,0.057088,0.033637,0.026337
2,3.0175,2.745327,0.442713,0.102886,0.110539,0.083156
3,2.5799,2.353059,0.511457,0.160065,0.150479,0.131614
4,2.2109,2.056427,0.593951,0.242461,0.218267,0.204398
5,1.9368,1.831587,0.670027,0.341423,0.293918,0.287191


[I 2025-03-26 10:23:09,338] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0004382225933607684, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4295,2.927171,0.398717,0.075085,0.086637,0.064833
2,2.6098,2.260418,0.528873,0.18531,0.165263,0.149358
3,2.0066,1.78697,0.649863,0.316333,0.280798,0.265356
4,1.567,1.520569,0.696609,0.321407,0.328407,0.309038
5,1.2639,1.376361,0.727773,0.406115,0.386708,0.369632
6,1.0392,1.238982,0.745188,0.447488,0.396668,0.383644
7,0.8665,1.190572,0.753437,0.474831,0.433276,0.428277
8,0.7564,1.148309,0.75802,0.465319,0.452556,0.441204
9,0.6545,1.101532,0.764436,0.490535,0.47347,0.467553
10,0.5744,1.094042,0.76352,0.487418,0.475215,0.467846


[I 2025-03-26 10:24:25,113] Trial 111 finished with value: 0.4811397270126018 and parameters: {'learning_rate': 0.0004382225933607684, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 112 with params: {'learning_rate': 0.00011817202461154217, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6775,3.465125,0.176902,0.003538,0.02,0.006012
2,3.3392,3.163355,0.367553,0.061644,0.07698,0.061103
3,3.0789,2.916276,0.425298,0.071065,0.095093,0.070916
4,2.8362,2.702029,0.449129,0.093432,0.108095,0.081493
5,2.6602,2.515345,0.485793,0.103619,0.130782,0.106173


[I 2025-03-26 10:24:50,221] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00046353462963312427, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3994,2.889542,0.408799,0.072206,0.093507,0.072723
2,2.5631,2.215429,0.532539,0.199129,0.17061,0.155833
3,1.9564,1.739833,0.664528,0.317566,0.290627,0.271956
4,1.5141,1.481449,0.704858,0.339829,0.33531,0.313769
5,1.2081,1.336276,0.72594,0.381953,0.369038,0.344519
6,0.9887,1.207406,0.741522,0.448707,0.397433,0.387004
7,0.8184,1.156215,0.75527,0.46546,0.438917,0.431643
8,0.7115,1.119901,0.762603,0.468032,0.460102,0.44782
9,0.6097,1.073786,0.769936,0.483591,0.477184,0.470078
10,0.5336,1.074701,0.768103,0.484314,0.485645,0.475521


[I 2025-03-26 10:26:04,196] Trial 113 finished with value: 0.48032163599691813 and parameters: {'learning_rate': 0.00046353462963312427, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 76 with value: 0.49512573819371475.


Trial 114 with params: {'learning_rate': 0.0004945026061552231, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3921,2.855276,0.419798,0.069487,0.097244,0.074954
2,2.5169,2.158192,0.560953,0.204346,0.204365,0.191093
3,1.8867,1.688871,0.663611,0.317956,0.29881,0.282154
4,1.4418,1.433057,0.704858,0.339954,0.32554,0.308843
5,1.1467,1.308578,0.736939,0.402908,0.393936,0.373236
6,0.9272,1.200456,0.745188,0.424773,0.397524,0.387342
7,0.7653,1.161206,0.749771,0.466654,0.441898,0.435175
8,0.6634,1.118216,0.75802,0.484968,0.468977,0.46127
9,0.561,1.074441,0.767186,0.498167,0.478568,0.47492
10,0.4871,1.074346,0.766269,0.498869,0.475278,0.472714


[I 2025-03-26 10:27:32,006] Trial 114 finished with value: 0.49368548676919993 and parameters: {'learning_rate': 0.0004945026061552231, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 115 with params: {'learning_rate': 0.00026416703012146944, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5602,3.196414,0.280477,0.052879,0.050474,0.042164
2,2.9601,2.672429,0.445463,0.08848,0.111977,0.083375
3,2.4955,2.261434,0.543538,0.224037,0.175602,0.162423
4,2.1089,1.96306,0.618698,0.297387,0.25445,0.244191
5,1.8237,1.740118,0.687443,0.343676,0.310678,0.300505
6,1.5709,1.568848,0.709441,0.349397,0.334214,0.31907
7,1.3745,1.457842,0.715857,0.328214,0.326805,0.310457
8,1.2479,1.385273,0.733272,0.356894,0.370347,0.34951
9,1.1304,1.325227,0.733272,0.358044,0.373017,0.35439
10,1.0317,1.287468,0.748854,0.400837,0.400018,0.382145


[I 2025-03-26 10:28:21,059] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0004916800693220012, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3939,2.858529,0.419798,0.069635,0.097244,0.075009
2,2.5211,2.162836,0.55912,0.203697,0.203198,0.190146
3,1.8924,1.694171,0.663611,0.318071,0.29881,0.282187
4,1.4478,1.437011,0.705775,0.340223,0.325755,0.309065
5,1.1528,1.311699,0.737855,0.40304,0.394174,0.373753
6,0.933,1.202207,0.745188,0.424536,0.397524,0.387185
7,0.7703,1.16264,0.749771,0.466777,0.441898,0.435244
8,0.6679,1.119958,0.759853,0.485712,0.469705,0.462099
9,0.5655,1.076307,0.765353,0.493125,0.472068,0.46775
10,0.4912,1.075393,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 10:29:36,237] Trial 116 finished with value: 0.49350096410753225 and parameters: {'learning_rate': 0.0004916800693220012, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 117 with params: {'learning_rate': 0.0004992299509401581, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3758,2.842826,0.422548,0.069045,0.098602,0.075809
2,2.5051,2.156591,0.551787,0.221227,0.19644,0.182852
3,1.8852,1.686922,0.664528,0.309956,0.292118,0.273914
4,1.4427,1.428576,0.699358,0.331414,0.317212,0.298908
5,1.1388,1.298303,0.729606,0.375654,0.377573,0.353403
6,0.9247,1.187511,0.747938,0.434147,0.413052,0.40258
7,0.7609,1.147887,0.75802,0.490556,0.461495,0.457366
8,0.6581,1.107538,0.761687,0.47698,0.460492,0.451546
9,0.5608,1.061445,0.769936,0.47503,0.480804,0.470651
10,0.4849,1.058895,0.772686,0.481667,0.486357,0.472574


[I 2025-03-26 10:30:51,774] Trial 117 finished with value: 0.4804248634263685 and parameters: {'learning_rate': 0.0004992299509401581, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 76 with value: 0.49512573819371475.


Trial 118 with params: {'learning_rate': 0.0004786243541734559, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4023,2.874341,0.416132,0.073266,0.09507,0.073949
2,2.5415,2.185417,0.557287,0.204342,0.201261,0.189003
3,1.9204,1.719289,0.661778,0.323261,0.295682,0.278947
4,1.4758,1.456142,0.703941,0.320664,0.327614,0.309269
5,1.1805,1.327179,0.736939,0.422487,0.394329,0.375017
6,0.9603,1.210731,0.745188,0.423344,0.403472,0.390289
7,0.7937,1.168683,0.749771,0.471726,0.432778,0.427525
8,0.6889,1.130394,0.761687,0.485004,0.465576,0.455702
9,0.5861,1.085466,0.761687,0.492553,0.470577,0.466042
10,0.5102,1.081427,0.766269,0.495043,0.479605,0.474287


[I 2025-03-26 10:32:06,337] Trial 118 finished with value: 0.49383561910564167 and parameters: {'learning_rate': 0.0004786243541734559, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 119 with params: {'learning_rate': 0.00040445359844411326, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4523,2.97383,0.383135,0.058315,0.081847,0.060415
2,2.6662,2.322635,0.517874,0.195563,0.157886,0.141917
3,2.0835,1.854123,0.648946,0.322769,0.27262,0.262265
4,1.6477,1.583,0.694775,0.327569,0.328603,0.309047
5,1.344,1.41474,0.722273,0.348121,0.362648,0.337914


[I 2025-03-26 10:32:31,492] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00024478411375852566, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5783,3.230712,0.23648,0.057807,0.036983,0.029795
2,3.0082,2.73318,0.444546,0.102438,0.111399,0.083169
3,2.5659,2.337779,0.513291,0.161138,0.151182,0.132538
4,2.1939,2.040452,0.600367,0.265057,0.224762,0.211012
5,1.9179,1.815497,0.671861,0.34192,0.296111,0.288936
6,1.6637,1.641236,0.694775,0.372852,0.31762,0.309977
7,1.4678,1.522181,0.705775,0.354919,0.327958,0.314816
8,1.3402,1.440637,0.725023,0.351831,0.358146,0.338073
9,1.218,1.377206,0.726856,0.357916,0.36449,0.346889
10,1.1174,1.334143,0.742438,0.388714,0.392359,0.374888


[I 2025-03-26 10:33:24,074] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.00048378175700342476, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.399,2.86806,0.418882,0.072621,0.096599,0.075012
2,2.5333,2.176404,0.558203,0.204267,0.200698,0.188367
3,1.9093,1.709809,0.660862,0.318084,0.295444,0.278348
4,1.4645,1.448442,0.703941,0.318829,0.322993,0.304729
5,1.1694,1.320303,0.737855,0.423368,0.396147,0.37848
6,0.9498,1.206619,0.747021,0.431677,0.406304,0.394044
7,0.7845,1.16598,0.749771,0.472005,0.433129,0.427842
8,0.6808,1.125892,0.76077,0.484923,0.463909,0.455536
9,0.5781,1.081894,0.761687,0.489174,0.470577,0.466453
10,0.5027,1.079244,0.766269,0.499903,0.47879,0.474663


[I 2025-03-26 10:34:38,541] Trial 121 finished with value: 0.49274666962222324 and parameters: {'learning_rate': 0.00048378175700342476, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 122 with params: {'learning_rate': 0.0004856841372806926, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3977,2.865701,0.419798,0.072784,0.097244,0.075681
2,2.5302,2.173035,0.558203,0.204267,0.200698,0.188367
3,1.9052,1.706056,0.660862,0.318084,0.295444,0.278348
4,1.4603,1.445594,0.703941,0.318829,0.322993,0.304729
5,1.1653,1.318086,0.737855,0.423368,0.396147,0.37848
6,0.946,1.205174,0.747021,0.424959,0.397842,0.386679
7,0.7813,1.165063,0.747938,0.470808,0.43264,0.427022
8,0.6779,1.124496,0.76077,0.485034,0.463909,0.455596
9,0.5752,1.08086,0.762603,0.489714,0.470815,0.466947
10,0.5,1.078531,0.766269,0.500952,0.47879,0.475556


[I 2025-03-26 10:35:52,643] Trial 122 finished with value: 0.4939308180319757 and parameters: {'learning_rate': 0.0004856841372806926, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 123 with params: {'learning_rate': 0.0004503996218356186, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4211,2.911178,0.404216,0.074356,0.088787,0.067151
2,2.5889,2.237041,0.535289,0.191229,0.170344,0.15534
3,1.9805,1.766498,0.653529,0.317197,0.2855,0.269348
4,1.54,1.501245,0.699358,0.319688,0.330074,0.310374
5,1.2386,1.362768,0.732356,0.403279,0.393547,0.376189
6,1.0153,1.230258,0.744271,0.427764,0.392761,0.377585
7,0.8441,1.184173,0.751604,0.468781,0.434866,0.427943
8,0.7344,1.142923,0.761687,0.478159,0.461078,0.450575
9,0.6325,1.093905,0.765353,0.472806,0.471586,0.461346
10,0.5539,1.087588,0.764436,0.488917,0.475105,0.468478


[I 2025-03-26 10:36:41,363] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.00047790473433313534, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4027,2.87519,0.416132,0.073324,0.09507,0.073998
2,2.5427,2.186659,0.557287,0.204342,0.201261,0.189003
3,1.9219,1.720538,0.661778,0.323261,0.295682,0.278947
4,1.4774,1.457179,0.703941,0.320664,0.327614,0.309269
5,1.1821,1.328163,0.736939,0.422487,0.394329,0.375017
6,0.9618,1.211097,0.745188,0.423344,0.403472,0.390289
7,0.795,1.168859,0.748854,0.47127,0.432324,0.427053
8,0.69,1.130732,0.761687,0.485004,0.465576,0.455702
9,0.5872,1.085546,0.76077,0.49233,0.470123,0.465686
10,0.5111,1.081478,0.768103,0.496111,0.48033,0.475565


[I 2025-03-26 10:37:56,028] Trial 124 finished with value: 0.49512573819371475 and parameters: {'learning_rate': 0.00047790473433313534, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 125 with params: {'learning_rate': 0.0003123229260555052, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.509,3.101861,0.344638,0.069685,0.07069,0.057467
2,2.8415,2.531314,0.457379,0.118427,0.11534,0.084898
3,2.3308,2.095087,0.571036,0.244782,0.196157,0.185201
4,1.923,1.808116,0.665445,0.325007,0.290044,0.277391
5,1.6316,1.597876,0.715857,0.375561,0.352731,0.336322


[I 2025-03-26 10:38:21,190] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.00016493249354712237, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6549,3.385263,0.176902,0.003538,0.02,0.006012
2,3.2179,3.001925,0.3978,0.096155,0.086424,0.065903
3,2.8799,2.67803,0.459212,0.103277,0.115855,0.092144
4,2.574,2.416856,0.515124,0.13586,0.151612,0.129451
5,2.3492,2.205925,0.568286,0.234598,0.192913,0.178294
6,2.1226,2.039168,0.597617,0.248448,0.222096,0.20789
7,1.9468,1.910382,0.63703,0.33395,0.261839,0.254531
8,1.8235,1.809226,0.671861,0.35187,0.291112,0.28071
9,1.6964,1.721316,0.686526,0.358326,0.305196,0.296959
10,1.5978,1.659485,0.696609,0.375801,0.32441,0.315528


[I 2025-03-26 10:39:15,002] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0004766896385689301, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4035,2.876704,0.415215,0.07327,0.094855,0.073856
2,2.5447,2.188869,0.557287,0.204342,0.201261,0.189003
3,1.9246,1.722777,0.660862,0.324082,0.294254,0.27699
4,1.4801,1.459115,0.704858,0.321851,0.328069,0.310056
5,1.1847,1.329719,0.736022,0.422646,0.393965,0.374772
6,0.9643,1.212033,0.746104,0.425314,0.403998,0.391065
7,0.7971,1.16961,0.749771,0.471361,0.432711,0.427201
8,0.6919,1.131784,0.761687,0.485114,0.465576,0.455763
9,0.5891,1.086402,0.76077,0.492386,0.470123,0.465707
10,0.5129,1.082123,0.765353,0.486008,0.475073,0.470392


[I 2025-03-26 10:40:04,261] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.000493278523740113, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3797,2.851677,0.417049,0.069731,0.096733,0.074754
2,2.514,2.165776,0.549954,0.23191,0.193026,0.182253
3,1.8961,1.694144,0.666361,0.307872,0.292515,0.274638
4,1.4523,1.434355,0.703025,0.335061,0.322318,0.302572
5,1.1492,1.304049,0.731439,0.36731,0.377677,0.35254
6,0.9339,1.189078,0.750687,0.444403,0.413966,0.403576
7,0.7678,1.145568,0.759853,0.456534,0.450053,0.438433
8,0.6643,1.106674,0.761687,0.469371,0.460728,0.44986
9,0.5664,1.06092,0.769019,0.47329,0.474407,0.464983
10,0.4913,1.057953,0.773602,0.48518,0.486312,0.475717


[I 2025-03-26 10:41:18,373] Trial 128 finished with value: 0.48472228234703374 and parameters: {'learning_rate': 0.000493278523740113, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 76 with value: 0.49512573819371475.


Trial 129 with params: {'learning_rate': 0.0004549691178367605, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.418,2.905128,0.405133,0.07447,0.089026,0.067183
2,2.581,2.22841,0.539872,0.192726,0.176076,0.162951
3,1.9707,1.758725,0.654445,0.319288,0.287166,0.27152
4,1.5294,1.493522,0.699358,0.314411,0.320528,0.301436
5,1.2291,1.357176,0.731439,0.409601,0.39294,0.375134
6,1.006,1.227088,0.744271,0.430262,0.392761,0.378316
7,0.8354,1.181456,0.751604,0.472014,0.434703,0.427736
8,0.7264,1.141078,0.761687,0.477854,0.460967,0.449671
9,0.6243,1.091576,0.762603,0.472394,0.470458,0.460487
10,0.5461,1.085747,0.76352,0.49478,0.475001,0.470778


[I 2025-03-26 10:42:33,814] Trial 129 finished with value: 0.4823436487486139 and parameters: {'learning_rate': 0.0004549691178367605, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 76 with value: 0.49512573819371475.


Trial 130 with params: {'learning_rate': 0.00023807843458688047, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5844,3.242517,0.219982,0.057301,0.032374,0.024981
2,3.0249,2.754919,0.441797,0.102391,0.109703,0.083101
3,2.591,2.365017,0.509624,0.145469,0.145952,0.124114
4,2.2242,2.069152,0.587534,0.249377,0.213992,0.201064
5,1.9517,1.844462,0.659945,0.343028,0.289832,0.283872
6,1.6979,1.668775,0.692942,0.373803,0.315035,0.309227
7,1.5026,1.54805,0.700275,0.36333,0.322359,0.313557
8,1.3756,1.464579,0.722273,0.36274,0.355773,0.335587
9,1.2526,1.399572,0.722273,0.354188,0.358438,0.341383
10,1.1514,1.353789,0.736939,0.395128,0.387592,0.374138


[I 2025-03-26 10:43:23,537] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00049949628158781, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.389,2.849451,0.419798,0.069605,0.097244,0.075081
2,2.51,2.150704,0.56187,0.204292,0.20482,0.191348
3,1.8768,1.678876,0.669111,0.314499,0.300366,0.282516
4,1.4309,1.425327,0.706691,0.332563,0.32872,0.311374
5,1.1351,1.302924,0.736022,0.401236,0.391414,0.371324
6,0.9155,1.195235,0.745188,0.426857,0.397452,0.387732
7,0.7562,1.156881,0.749771,0.487391,0.444946,0.440355
8,0.6549,1.114696,0.75802,0.487801,0.468737,0.462692
9,0.553,1.0686,0.768103,0.498785,0.479003,0.475628
10,0.4798,1.070531,0.765353,0.497727,0.473617,0.471398


[I 2025-03-26 10:44:13,156] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0004675471848767979, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4095,2.888221,0.411549,0.07397,0.092682,0.071857
2,2.5599,2.205234,0.552704,0.194681,0.188624,0.175186
3,1.9444,1.738576,0.660862,0.320815,0.296369,0.27889
4,1.5009,1.47316,0.702108,0.319815,0.326341,0.308155
5,1.2039,1.341038,0.734189,0.401697,0.397139,0.378235
6,0.9821,1.218439,0.746104,0.430428,0.402634,0.389604
7,0.8129,1.174995,0.749771,0.461115,0.432348,0.425122
8,0.706,1.137654,0.761687,0.488717,0.462737,0.453068
9,0.6037,1.090633,0.761687,0.490293,0.470158,0.464413
10,0.5265,1.084952,0.764436,0.492305,0.475982,0.471981


[I 2025-03-26 10:45:26,822] Trial 132 finished with value: 0.49578714001038604 and parameters: {'learning_rate': 0.0004675471848767979, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 133 with params: {'learning_rate': 0.0004933610668245446, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3929,2.856595,0.419798,0.069635,0.097244,0.075009
2,2.5187,2.160136,0.560037,0.2041,0.20415,0.190879
3,1.889,1.690896,0.663611,0.317747,0.29881,0.282198
4,1.4442,1.434391,0.705775,0.340223,0.325755,0.309065
5,1.149,1.309461,0.737855,0.403436,0.394174,0.373616
6,0.9293,1.20073,0.745188,0.424838,0.397524,0.3875
7,0.7672,1.16173,0.749771,0.466654,0.441898,0.435175
8,0.6652,1.118586,0.75802,0.484968,0.468977,0.46127
9,0.5628,1.074739,0.767186,0.498167,0.478568,0.47492
10,0.4888,1.074812,0.766269,0.498869,0.475278,0.472714


[I 2025-03-26 10:46:41,348] Trial 133 finished with value: 0.48285814794428317 and parameters: {'learning_rate': 0.0004933610668245446, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 134 with params: {'learning_rate': 0.00026163044533641407, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5626,3.20124,0.275894,0.052406,0.048997,0.040694
2,2.9666,2.680697,0.446379,0.088603,0.112215,0.083507
3,2.505,2.271494,0.540788,0.225197,0.17321,0.160181
4,2.1201,1.972983,0.618698,0.300965,0.253125,0.242533
5,1.8357,1.749437,0.683776,0.3426,0.308054,0.298627
6,1.5824,1.577621,0.706691,0.347171,0.330967,0.316112
7,1.3859,1.465477,0.714024,0.328567,0.324424,0.307413
8,1.2589,1.391637,0.732356,0.358871,0.36868,0.348624
9,1.1409,1.331168,0.730522,0.35551,0.370775,0.351909
10,1.0419,1.292896,0.747021,0.409533,0.397897,0.380615


[I 2025-03-26 10:47:31,132] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.00030066582407411676, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4715,3.079023,0.36297,0.064193,0.075872,0.059196
2,2.8338,2.534681,0.458295,0.10248,0.115284,0.085854
3,2.3475,2.119664,0.566453,0.252127,0.195513,0.183027
4,1.9517,1.835075,0.660862,0.303987,0.290429,0.273408
5,1.6684,1.624766,0.704858,0.318372,0.3333,0.31136


[I 2025-03-26 10:47:55,632] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0004961775342162183, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3911,2.853329,0.419798,0.069746,0.097244,0.075138
2,2.5147,2.155771,0.560953,0.204655,0.204365,0.191268
3,1.8833,1.685282,0.664528,0.315859,0.297131,0.279914
4,1.438,1.430247,0.704858,0.34038,0.325688,0.309137
5,1.1427,1.306573,0.737855,0.403511,0.395986,0.376479
6,0.9231,1.198587,0.746104,0.427553,0.397978,0.388596
7,0.7622,1.159909,0.748854,0.466242,0.441683,0.434849
8,0.6606,1.116778,0.757104,0.484107,0.468522,0.460526
9,0.5585,1.072493,0.765353,0.497246,0.475613,0.471668
10,0.4848,1.073292,0.765353,0.498111,0.473584,0.47152


[I 2025-03-26 10:49:09,797] Trial 136 finished with value: 0.48265287481239383 and parameters: {'learning_rate': 0.0004961775342162183, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 137 with params: {'learning_rate': 0.00024563915235833295, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5677,3.219443,0.252979,0.057361,0.041825,0.034175
2,2.9967,2.723921,0.437214,0.0815,0.105918,0.077144
3,2.5571,2.329564,0.511457,0.16003,0.150489,0.131474
4,2.1876,2.036695,0.598533,0.247818,0.218699,0.201192
5,1.913,1.814748,0.67461,0.338216,0.29542,0.286773


[I 2025-03-26 10:49:34,190] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0004890656805517279, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3956,2.861651,0.419798,0.070584,0.097244,0.075237
2,2.525,2.167234,0.558203,0.203576,0.200698,0.18806
3,1.898,1.699414,0.662695,0.318927,0.29845,0.282437
4,1.4535,1.440872,0.704858,0.338758,0.324326,0.307207
5,1.1584,1.314498,0.738772,0.423492,0.396674,0.378865
6,0.9387,1.203434,0.746104,0.425223,0.397627,0.387535
7,0.7751,1.163663,0.747938,0.47395,0.44012,0.435552
8,0.6722,1.121691,0.759853,0.484269,0.463455,0.455437
9,0.5697,1.07843,0.76352,0.4924,0.471342,0.467045
10,0.4951,1.076642,0.766269,0.498016,0.475278,0.472287


[I 2025-03-26 10:50:48,561] Trial 138 finished with value: 0.4934719183378417 and parameters: {'learning_rate': 0.0004890656805517279, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 139 with params: {'learning_rate': 2.930023125468448e-05, 'weight_decay': 0.002, 'warmup_steps': 0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8221,3.734516,0.184235,0.013499,0.022452,0.009866
2,3.6963,3.632925,0.181485,0.020224,0.02137,0.008572
3,3.6137,3.546848,0.179652,0.023548,0.020822,0.007605
4,3.5285,3.470025,0.189734,0.023591,0.023836,0.012525
5,3.4715,3.397675,0.221815,0.076119,0.033401,0.026167
6,3.3992,3.336673,0.283226,0.075469,0.051883,0.046804
7,3.3394,3.281438,0.337305,0.071495,0.067673,0.059487
8,3.2965,3.236298,0.371219,0.070144,0.077874,0.064581
9,3.2545,3.198483,0.378552,0.080542,0.079503,0.064352
10,3.223,3.166936,0.394134,0.078899,0.084417,0.067028


[I 2025-03-26 10:51:39,901] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.00010458865274842525, 'weight_decay': 0.0, 'warmup_steps': 1}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7074,3.514256,0.176902,0.003538,0.02,0.006012
2,3.3979,3.234144,0.331806,0.070146,0.066611,0.056455
3,3.1562,3.003327,0.409716,0.093559,0.088948,0.065923
4,2.9304,2.801327,0.442713,0.086981,0.104836,0.081401
5,2.7671,2.626106,0.473877,0.104787,0.123943,0.099372


[I 2025-03-26 10:52:04,468] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.00046777951373956557, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4094,2.887951,0.411549,0.07397,0.092682,0.071857
2,2.5596,2.204983,0.551787,0.194078,0.188098,0.174609
3,1.944,1.73829,0.659945,0.320383,0.295417,0.278249
4,1.5004,1.472903,0.702108,0.319815,0.326341,0.308155
5,1.2035,1.340961,0.734189,0.401697,0.397139,0.378235
6,0.9817,1.218353,0.746104,0.430472,0.402634,0.389645
7,0.8126,1.174815,0.749771,0.455503,0.432348,0.423172
8,0.7057,1.137512,0.762603,0.489216,0.463011,0.453536
9,0.6033,1.090465,0.76077,0.489813,0.469671,0.463929
10,0.5261,1.084877,0.764436,0.492305,0.475982,0.471981


[I 2025-03-26 10:53:19,813] Trial 141 finished with value: 0.4947309199146923 and parameters: {'learning_rate': 0.00046777951373956557, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 142 with params: {'learning_rate': 0.0002262274050533502, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5954,3.26393,0.19615,0.038621,0.025323,0.015322
2,3.0548,2.794199,0.430797,0.101511,0.103415,0.077352
3,2.6363,2.413028,0.500458,0.13623,0.14091,0.118284
4,2.2779,2.121135,0.573786,0.22323,0.200638,0.185455
5,2.012,1.896913,0.63703,0.319594,0.263962,0.251945


[I 2025-03-26 10:53:44,629] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.00048428307531783873, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3987,2.867469,0.419798,0.072784,0.097244,0.075681
2,2.5324,2.175511,0.558203,0.204267,0.200698,0.188367
3,1.9083,1.708814,0.660862,0.318084,0.295444,0.278348
4,1.4634,1.447688,0.703941,0.318829,0.322993,0.304729
5,1.1683,1.319729,0.737855,0.423368,0.396147,0.37848
6,0.9488,1.206169,0.747021,0.431677,0.406304,0.394044
7,0.7837,1.165707,0.749771,0.472005,0.433129,0.427842
8,0.6801,1.125183,0.76077,0.484923,0.463909,0.455536
9,0.5774,1.081793,0.761687,0.489174,0.470577,0.466453
10,0.502,1.07895,0.766269,0.500767,0.47879,0.475153


[I 2025-03-26 10:54:59,006] Trial 143 finished with value: 0.49252149568167924 and parameters: {'learning_rate': 0.00048428307531783873, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 144 with params: {'learning_rate': 0.000492396450081582, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3935,2.857728,0.419798,0.069635,0.097244,0.075009
2,2.52,2.161645,0.55912,0.203449,0.203198,0.190062
3,1.891,1.692889,0.663611,0.318287,0.29881,0.282355
4,1.4463,1.436004,0.705775,0.340223,0.325755,0.309065
5,1.1512,1.310771,0.737855,0.40304,0.394174,0.373753
6,0.9316,1.201804,0.745188,0.424773,0.397524,0.387342
7,0.7689,1.162169,0.748854,0.466084,0.44166,0.434867
8,0.6666,1.119345,0.75802,0.484909,0.469036,0.461323
9,0.5643,1.075928,0.766269,0.493962,0.474568,0.470876
10,0.4902,1.074912,0.766269,0.498212,0.475278,0.472371


[I 2025-03-26 10:56:13,723] Trial 144 finished with value: 0.49392507044548145 and parameters: {'learning_rate': 0.000492396450081582, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 145 with params: {'learning_rate': 0.0004909740599334299, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3944,2.859439,0.419798,0.069788,0.097244,0.075065
2,2.5221,2.163976,0.55912,0.20381,0.203198,0.190214
3,1.8939,1.695677,0.661778,0.317476,0.296783,0.280638
4,1.4494,1.438079,0.705775,0.340143,0.325755,0.309033
5,1.1542,1.312299,0.737855,0.402768,0.394174,0.373626
6,0.9346,1.202417,0.745188,0.424952,0.397524,0.387323
7,0.7715,1.162883,0.748854,0.474008,0.440335,0.435657
8,0.669,1.1205,0.758937,0.483404,0.464452,0.455797
9,0.5665,1.077065,0.765353,0.493009,0.472068,0.467673
10,0.4922,1.075704,0.766269,0.498145,0.475278,0.472348


[I 2025-03-26 10:57:30,527] Trial 145 finished with value: 0.4928190274687641 and parameters: {'learning_rate': 0.0004909740599334299, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 146 with params: {'learning_rate': 0.00027227806674792895, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5527,3.180876,0.293309,0.072405,0.054394,0.045248
2,2.9394,2.646488,0.446379,0.088192,0.111638,0.082222
3,2.4654,2.230405,0.549038,0.221097,0.180269,0.165922
4,2.0742,1.932677,0.626031,0.290212,0.25924,0.246193
5,1.7868,1.71162,0.693859,0.35072,0.319279,0.307603


[I 2025-03-26 10:57:54,962] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0004985795475783355, 'weight_decay': 0.01, 'warmup_steps': 4}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3896,2.850571,0.419798,0.069605,0.097244,0.075081
2,2.5113,2.152169,0.56187,0.204372,0.20482,0.191388
3,1.8786,1.680641,0.667278,0.313959,0.298973,0.281214
4,1.4328,1.426522,0.705775,0.340553,0.327291,0.309842
5,1.1371,1.303748,0.736939,0.40196,0.392748,0.372296
6,0.9176,1.196183,0.746104,0.427335,0.397978,0.388413
7,0.7578,1.157718,0.749771,0.487946,0.444982,0.440703
8,0.6564,1.115189,0.75802,0.487962,0.468737,0.462821
9,0.5545,1.069561,0.767186,0.498497,0.478639,0.4753
10,0.4812,1.071465,0.765353,0.498025,0.473617,0.471556


[I 2025-03-26 10:59:10,706] Trial 147 finished with value: 0.4858513231737946 and parameters: {'learning_rate': 0.0004985795475783355, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.


Trial 148 with params: {'learning_rate': 2.579669642889317e-05, 'weight_decay': 0.003, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8364,3.755167,0.182401,0.032313,0.022255,0.010211
2,3.7191,3.660502,0.181485,0.012668,0.02137,0.00845
3,3.6438,3.582736,0.179652,0.018551,0.020822,0.00759
4,3.5665,3.51379,0.182401,0.023564,0.021644,0.009088
5,3.5152,3.447174,0.191567,0.043607,0.024312,0.013479
6,3.4495,3.392573,0.217232,0.075407,0.031906,0.024579
7,3.3958,3.341892,0.28506,0.07592,0.051913,0.047296
8,3.3568,3.299406,0.326306,0.072226,0.064017,0.057151
9,3.3181,3.26444,0.343721,0.068664,0.069079,0.060126
10,3.2894,3.235104,0.366636,0.071101,0.076432,0.064504


[I 2025-03-26 11:00:00,073] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00032676157417641373, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4977,3.079931,0.35472,0.06749,0.073719,0.058914
2,2.8104,2.494599,0.466544,0.141277,0.123796,0.098519
3,2.2867,2.051042,0.582035,0.274789,0.218098,0.209279
4,1.8737,1.769104,0.670944,0.314039,0.293602,0.279259
5,1.581,1.561745,0.716774,0.371025,0.353825,0.335868


[I 2025-03-26 11:00:25,314] Trial 149 pruned. 


In [15]:
print(best_trial_normal)

BestRun(run_id='132', objective=0.49578714001038604, hyperparameters={'learning_rate': 0.0004675471848767979, 'weight_decay': 0.01, 'warmup_steps': 4}, run_summary=None)
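
To cross-check the reported best run against the log, the "finished" lines that Optuna prints can be parsed back into values and parameters. The following is a minimal sketch that relies only on the log format visible above; the helper name `parse_finished` and the variable `sample_log` are illustrative and not part of the notebook.

```python
import ast
import re

# Matches lines such as:
# "... Trial 138 finished with value: 0.49 and parameters: {...}. Best is trial ..."
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)

def parse_finished(lines):
    """Map trial number -> (objective value, hyperparameter dict)."""
    results = {}
    for line in lines:
        match = FINISHED.search(line)
        if match:
            number = int(match.group(1))
            value = float(match.group(2))
            params = ast.literal_eval(match.group(3))  # dict literal from the log
            results[number] = (value, params)
    return results

# Two lines copied from the log above; pruned trials are skipped by the regex.
sample_log = [
    "[I 2025-03-26 10:50:48,561] Trial 138 finished with value: 0.4934719183378417 and parameters: {'learning_rate': 0.0004890656805517279, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.",
    "[I 2025-03-26 10:51:39,901] Trial 139 pruned. ",
    "[I 2025-03-26 10:53:19,813] Trial 141 finished with value: 0.4947309199146923 and parameters: {'learning_rate': 0.00046777951373956557, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 132 with value: 0.49578714001038604.",
]

results = parse_finished(sample_log)
best_number = max(results, key=lambda n: results[n][0])
print(best_number, results[best_number])
```

Applied to the full log, the trial with the highest parsed value should agree with the `run_id` and `objective` in the `BestRun` output.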


In [16]:
base.reset_seed()

## Search with distillation on the original dataset
Configuration of the individual training runs.

In [17]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bert-distill_fine_hp-search",
    logging_dir=f"~/logs/{DATASET}/bert-distill_fine_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.


In [18]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
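
The implementation of `base.DistilTrainer` is not shown in this notebook, so the exact role of `lambda_param` and `temperature` cannot be confirmed here. As a rough illustration, the sketch below shows one common way such parameters enter a distillation loss: a temperature-softened KL term between teacher and student, mixed with the hard-label cross-entropy. It assumes that `lambda_param` weights the distillation term (some implementations use the opposite convention), and the function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Mix hard-label cross-entropy with a softened teacher/student KL term."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # Soft term: KL(teacher || student) at the given temperature,
    # multiplied by T^2 to keep gradient magnitudes comparable.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft = kl * temperature ** 2

    # ASSUMPTION: lambda_param weights the distillation term,
    # (1 - lambda_param) weights the hard-label term.
    return (1.0 - lambda_param) * ce + lambda_param * soft

# Hypothetical logits for a 3-class example.
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], 2, 0.5, 4.0))
```

Under this assumed convention, `lambda_param = 0.0` reduces to ordinary supervised training and makes `temperature` irrelevant, while `lambda_param = 1.0` trains on the teacher's soft targets only.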

Optuna configuration.

In [19]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
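
To make the effect of `min_resource`, `max_resource` and `reduction_factor` more concrete, here is a simplified sketch of how Hyperband brackets and their pruning checkpoints (rungs) follow from these three values. It uses the bracket-count formula given in Optuna's documentation, floor(log_eta(max_resource / min_resource)) + 1, but it is not Optuna's implementation; capping rungs at `max_resource` is a simplification, the function name is hypothetical, and the example values 5 and 15 are illustrative rather than taken from `min_r` and `max_r`.

```python
def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Return, for each bracket, the steps at which trials are compared for pruning."""
    # Number of brackets: floor(log_eta(max_resource / min_resource)) + 1,
    # computed with integer arithmetic to avoid floating-point rounding.
    n_brackets = 1
    resource = min_resource
    while resource * reduction_factor <= max_resource:
        resource *= reduction_factor
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        # Each later bracket starts pruning later, i.e. it is less aggressive.
        step = min_resource * reduction_factor ** bracket_id
        rungs = []
        while step <= max_resource:
            rungs.append(step)
            step *= reduction_factor
        brackets.append(rungs)
    return brackets

# Hypothetical values: first check after 5 units, at most 15 units of training.
for bracket_id, rungs in enumerate(hyperband_brackets(5, 15, 2)):
    print(f"bracket {bracket_id}: pruning checks at steps {rungs}")
```

Each trial is assigned to one bracket, so some trials can be stopped at the first rung while others are only compared at a later one.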



Configuration of the distillation trainer for the individual training runs.

In [20]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_Bert(),
)
  

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Search setup.

In [21]:
best_trial_distill = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distilation",
    n_trials=150
)

[I 2025-03-26 11:00:25,776] A new study created in memory with name: Distilation


Trial 0 with params: {'learning_rate': 4.3284502212938785e-05, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4178,2.340572,0.182401,0.013581,0.021644,0.008897
2,2.3129,2.253278,0.179652,0.023548,0.020822,0.007605
3,2.2386,2.177167,0.189734,0.063584,0.023705,0.012673
4,2.1691,2.110003,0.328139,0.072231,0.064126,0.057706
5,2.1133,2.051637,0.387718,0.079887,0.08169,0.065101
6,2.0545,2.001357,0.40055,0.07574,0.085787,0.064986
7,2.0063,1.956808,0.410632,0.093163,0.089287,0.067112
8,1.9677,1.917731,0.429881,0.090969,0.098607,0.077087
9,1.9311,1.884372,0.439047,0.087489,0.103874,0.081015
10,1.9035,1.857554,0.448213,0.10583,0.108713,0.085917


[I 2025-03-26 11:01:18,897] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 1.8408992080552506e-05, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4491,2.405171,0.132906,0.009243,0.035965,0.008626
2,2.3911,2.35234,0.183318,0.015282,0.022178,0.009776
3,2.3501,2.313332,0.181485,0.013577,0.02137,0.008479
4,2.3151,2.280843,0.181485,0.020231,0.02137,0.008582
5,2.2854,2.248202,0.179652,0.023554,0.020822,0.007615


[I 2025-03-26 11:01:44,177] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.0838581269344744e-05, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4596,2.429218,0.04308,0.009669,0.026341,0.006411
2,2.4228,2.394165,0.160403,0.009152,0.019518,0.009359
3,2.3936,2.364072,0.188818,0.024161,0.024002,0.012103
4,2.3687,2.340714,0.184235,0.015572,0.022452,0.010131
5,2.3472,2.320631,0.185151,0.015597,0.022466,0.010189
6,2.3279,2.30333,0.182401,0.01449,0.021644,0.008931
7,2.3137,2.288213,0.179652,0.018558,0.020822,0.007599
8,2.3004,2.275064,0.179652,0.018558,0.020822,0.007599
9,2.2886,2.263793,0.179652,0.018558,0.020822,0.007599
10,2.2803,2.254771,0.179652,0.023554,0.020822,0.007615


[I 2025-03-26 11:02:33,954] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 2.049268011541735e-05, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 3.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4446,2.396842,0.16132,0.010212,0.019622,0.009401
2,2.3819,2.341642,0.182401,0.010666,0.021644,0.008779
3,2.3388,2.299941,0.179652,0.018558,0.020822,0.007599
4,2.3007,2.264565,0.180568,0.019561,0.021096,0.008097
5,2.269,2.229339,0.179652,0.023554,0.020822,0.007615


[I 2025-03-26 11:02:59,647] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.00010952662748632558, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3508,2.220122,0.176902,0.003538,0.02,0.006012
2,2.1557,2.047526,0.364803,0.062614,0.075557,0.061409
3,2.0055,1.898824,0.420715,0.071078,0.093699,0.06964
4,1.8593,1.765021,0.461962,0.100274,0.11816,0.095519
5,1.7436,1.650736,0.495875,0.101965,0.138579,0.111908
6,1.628,1.55911,0.51879,0.183257,0.155762,0.135304
7,1.5385,1.487768,0.538038,0.193764,0.173843,0.158051
8,1.4726,1.42702,0.584785,0.246785,0.202487,0.187214
9,1.4081,1.375973,0.595784,0.244687,0.21279,0.199596
10,1.3546,1.337041,0.605866,0.243515,0.223661,0.206439


[I 2025-03-26 11:04:15,470] Trial 4 finished with value: 0.24066060153297905 and parameters: {'learning_rate': 0.00010952662748632558, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 4 with value: 0.24066060153297905.


Trial 5 with params: {'learning_rate': 0.0002157696745589684, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2832,2.086134,0.231897,0.056644,0.03516,0.026963
2,1.9699,1.804505,0.43538,0.081878,0.105958,0.07923
3,1.7145,1.567026,0.505041,0.140108,0.144831,0.121585
4,1.4902,1.385216,0.563703,0.197337,0.184623,0.164027
5,1.3236,1.248885,0.626948,0.264374,0.232544,0.217568
6,1.1738,1.145206,0.672777,0.265153,0.268516,0.250031
7,1.0611,1.068329,0.683776,0.266275,0.283067,0.262633
8,0.9826,1.012111,0.698442,0.284563,0.296401,0.272103
9,0.9119,0.972303,0.705775,0.294829,0.306633,0.284542
10,0.8503,0.939513,0.71769,0.308791,0.321317,0.300245


[I 2025-03-26 11:05:06,204] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.00010769622478263136, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 7.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3447,2.214878,0.176902,0.003538,0.02,0.006012
2,2.1531,2.046734,0.36572,0.062883,0.075749,0.061892
3,2.0068,1.902232,0.421632,0.07057,0.093484,0.06862
4,1.8639,1.770794,0.457379,0.100146,0.115887,0.091599
5,1.7501,1.658052,0.491292,0.101964,0.136546,0.110118
6,1.6364,1.567724,0.51604,0.181803,0.153561,0.13255
7,1.5483,1.497696,0.536205,0.189856,0.172273,0.15623
8,1.4834,1.437581,0.579285,0.247355,0.199462,0.183521
9,1.4202,1.38737,0.592117,0.244948,0.207488,0.191945
10,1.3673,1.348756,0.604033,0.241933,0.219446,0.20348


[I 2025-03-26 11:06:21,777] Trial 6 finished with value: 0.23679842199748216 and parameters: {'learning_rate': 0.00010769622478263136, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 4 with value: 0.24066060153297905.


Trial 7 with params: {'learning_rate': 0.000236288641842364, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.253,2.050192,0.306141,0.069261,0.056665,0.047826
2,1.926,1.75069,0.444546,0.081624,0.110171,0.080723
3,1.6575,1.509693,0.517874,0.18805,0.151446,0.130888
4,1.4273,1.328247,0.587534,0.205296,0.200035,0.178475
5,1.259,1.194351,0.654445,0.253844,0.252652,0.235116
6,1.1081,1.091788,0.679193,0.265087,0.274121,0.255957
7,0.9948,1.018273,0.695692,0.266289,0.289993,0.266849
8,0.9157,0.963571,0.706691,0.297169,0.305238,0.282413
9,0.8484,0.929126,0.710357,0.321894,0.313918,0.292715
10,0.787,0.898313,0.721357,0.314316,0.322957,0.299784


[I 2025-03-26 11:07:10,927] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 1.6119044727609182e-05, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4484,2.408619,0.120073,0.009018,0.034514,0.008578
2,2.397,2.360924,0.185151,0.015188,0.022906,0.010735
3,2.3599,2.32573,0.181485,0.012675,0.02137,0.00846
4,2.3287,2.296499,0.184235,0.016917,0.022192,0.009843
5,2.302,2.267933,0.180568,0.019564,0.021096,0.008101


[I 2025-03-26 11:07:36,658] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.00013353819088790598, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3363,2.186799,0.176902,0.003538,0.02,0.006012
2,2.1105,1.989481,0.39505,0.057949,0.084131,0.063933
3,1.9341,1.811707,0.448213,0.100223,0.111363,0.086988
4,1.7639,1.663582,0.494042,0.118812,0.137533,0.110771
5,1.6327,1.53857,0.526123,0.185006,0.165043,0.146854
6,1.5052,1.44135,0.557287,0.243346,0.186254,0.170935
7,1.4065,1.363968,0.582951,0.247709,0.205525,0.19285
8,1.334,1.301796,0.632447,0.286393,0.243904,0.234491
9,1.2645,1.248725,0.647113,0.288806,0.257147,0.246156
10,1.2066,1.209024,0.659945,0.283833,0.2682,0.254502


[I 2025-03-26 11:08:52,114] Trial 9 finished with value: 0.26150970153790776 and parameters: {'learning_rate': 0.00013353819088790598, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 9 with value: 0.26150970153790776.


Trial 10 with params: {'learning_rate': 0.00022653365944687691, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2882,2.085122,0.225481,0.077473,0.03378,0.026729
2,1.9618,1.788522,0.448213,0.102168,0.114176,0.087086
3,1.6931,1.541627,0.515124,0.164761,0.153381,0.133594
4,1.4614,1.35583,0.574702,0.201914,0.192353,0.171933
5,1.2898,1.216897,0.654445,0.26972,0.255174,0.241548
6,1.1365,1.111592,0.67736,0.268374,0.276495,0.258516
7,1.0214,1.036226,0.694775,0.278614,0.290291,0.270664
8,0.9431,0.981014,0.707608,0.291746,0.305156,0.280601
9,0.873,0.944546,0.711274,0.3059,0.313319,0.291473
10,0.8115,0.913108,0.71769,0.329446,0.321717,0.301011


[I 2025-03-26 11:10:07,551] Trial 10 finished with value: 0.3075680022838695 and parameters: {'learning_rate': 0.00022653365944687691, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 10 with value: 0.3075680022838695.


Trial 11 with params: {'learning_rate': 0.0001238942211364432, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 5.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.35,2.206977,0.176902,0.003538,0.02,0.006012
2,2.1339,2.017157,0.387718,0.061364,0.08217,0.064704
3,1.9671,1.850075,0.432631,0.065698,0.100685,0.075051
4,1.8049,1.705754,0.48396,0.101446,0.132508,0.106113
5,1.6783,1.583165,0.505958,0.127774,0.147325,0.121941
6,1.554,1.48737,0.538038,0.216887,0.168767,0.151593
7,1.4582,1.41196,0.56187,0.245601,0.189278,0.176221
8,1.3879,1.350164,0.603116,0.261635,0.222797,0.207815
9,1.3201,1.297256,0.632447,0.288024,0.242886,0.234187
10,1.2633,1.257276,0.648029,0.288201,0.256283,0.246275


[I 2025-03-26 11:11:24,140] Trial 11 finished with value: 0.25969440039476316 and parameters: {'learning_rate': 0.0001238942211364432, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 10 with value: 0.3075680022838695.


Trial 12 with params: {'learning_rate': 0.0004297370115180055, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 5.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1908,1.891071,0.417049,0.072386,0.095998,0.07486
2,1.6918,1.46597,0.540788,0.167815,0.173703,0.154061
3,1.314,1.16986,0.648946,0.239484,0.25238,0.232507
4,1.0425,1.004065,0.68286,0.264426,0.285225,0.260594
5,0.8567,0.90539,0.715857,0.316084,0.323639,0.300841
6,0.72,0.830315,0.722273,0.340203,0.337802,0.317957
7,0.6203,0.804056,0.727773,0.374812,0.351488,0.337476
8,0.5592,0.773652,0.747938,0.393343,0.388945,0.373866
9,0.5053,0.75308,0.75527,0.437854,0.40717,0.403307
10,0.4565,0.747204,0.748854,0.429512,0.402067,0.392221


[I 2025-03-26 11:12:40,567] Trial 12 finished with value: 0.4537973075064203 and parameters: {'learning_rate': 0.0004297370115180055, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 5.0}. Best is trial 12 with value: 0.4537973075064203.


Trial 13 with params: {'learning_rate': 0.0003287279860635089, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 6.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.232,1.977442,0.384968,0.063085,0.081534,0.064047
2,1.8093,1.598877,0.500458,0.126072,0.144453,0.120127
3,1.4705,1.316373,0.593034,0.226357,0.207701,0.191757
4,1.2105,1.139363,0.663611,0.259354,0.268539,0.251444
5,1.029,1.01666,0.694775,0.288821,0.294101,0.27062
6,0.8806,0.921803,0.712191,0.299406,0.312454,0.291786
7,0.7699,0.869387,0.710357,0.300381,0.313423,0.292577
8,0.6986,0.835389,0.727773,0.314749,0.341589,0.315834
9,0.6397,0.809206,0.730522,0.364781,0.352378,0.337784
10,0.5832,0.796446,0.744271,0.376778,0.369908,0.354173


[I 2025-03-26 11:13:55,273] Trial 13 finished with value: 0.3930004795924375 and parameters: {'learning_rate': 0.0003287279860635089, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 12 with value: 0.4537973075064203.


Trial 14 with params: {'learning_rate': 0.00014690077743882243, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 7.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3308,2.171867,0.176902,0.003538,0.02,0.006012
2,2.0885,1.960178,0.4033,0.054664,0.086638,0.063682
3,1.897,1.76756,0.454629,0.099905,0.116476,0.090907
4,1.7147,1.610514,0.502291,0.128261,0.144916,0.120919
5,1.5757,1.482816,0.549954,0.231415,0.183315,0.166147


[I 2025-03-26 11:14:20,826] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.00037789891348026413, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.203,1.924946,0.40055,0.074459,0.086372,0.064034
2,1.7439,1.527955,0.522456,0.181123,0.162966,0.145819
3,1.3878,1.241255,0.629698,0.227244,0.230616,0.211435
4,1.1235,1.074671,0.67736,0.266058,0.280077,0.259012
5,0.9411,0.961317,0.703941,0.288708,0.305741,0.281788
6,0.7981,0.871703,0.715857,0.331682,0.323188,0.304167
7,0.6913,0.834034,0.710357,0.306275,0.319193,0.298351
8,0.625,0.805097,0.739688,0.382933,0.368958,0.350424
9,0.5689,0.782214,0.740605,0.398314,0.384029,0.373404
10,0.516,0.771054,0.747938,0.41324,0.399741,0.388339


[I 2025-03-26 11:15:35,860] Trial 15 finished with value: 0.4238603320278003 and parameters: {'learning_rate': 0.00037789891348026413, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 12 with value: 0.4537973075064203.


Trial 16 with params: {'learning_rate': 0.00043218919289894907, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1898,1.888986,0.417965,0.071218,0.096213,0.074753
2,1.6892,1.463129,0.540788,0.166682,0.173482,0.153322
3,1.3104,1.166918,0.649863,0.240158,0.252744,0.233149
4,1.0387,1.001279,0.684693,0.265087,0.286408,0.261446
5,0.853,0.903162,0.716774,0.316176,0.324457,0.300804
6,0.7168,0.828738,0.72319,0.361486,0.342802,0.326439
7,0.6174,0.803076,0.727773,0.362047,0.351748,0.337083
8,0.5564,0.772461,0.747021,0.394291,0.389921,0.375113
9,0.5027,0.751024,0.757104,0.434832,0.408837,0.403829
10,0.4537,0.74663,0.748854,0.430585,0.402067,0.392922


[I 2025-03-26 11:16:51,964] Trial 16 finished with value: 0.45265026045659323 and parameters: {'learning_rate': 0.00043218919289894907, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 12 with value: 0.4537973075064203.


Trial 17 with params: {'learning_rate': 0.00020632163595140406, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2888,2.096345,0.207149,0.058288,0.028301,0.019587
2,1.9847,1.824282,0.428964,0.061831,0.10107,0.07372
3,1.7375,1.590943,0.498625,0.125411,0.141597,0.11768
4,1.5173,1.411305,0.560953,0.202674,0.183186,0.163048
5,1.3538,1.275693,0.613199,0.244613,0.220876,0.204115
6,1.2059,1.172037,0.665445,0.264205,0.262075,0.2448
7,1.0939,1.094072,0.67736,0.266649,0.278401,0.258072
8,1.0153,1.036958,0.694775,0.26467,0.293586,0.268487
9,0.9438,0.995043,0.702108,0.291887,0.303314,0.280814
10,0.8821,0.960888,0.710357,0.293098,0.310305,0.286986


[I 2025-03-26 11:18:08,686] Trial 17 finished with value: 0.2919364435836574 and parameters: {'learning_rate': 0.00020632163595140406, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 12 with value: 0.4537973075064203.


Trial 18 with params: {'learning_rate': 0.000462586328422914, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1857,1.866588,0.421632,0.065934,0.098534,0.074369
2,1.6567,1.419832,0.558203,0.2001,0.190231,0.16988
3,1.2618,1.129464,0.654445,0.255383,0.261069,0.240716
4,0.9934,0.969516,0.687443,0.259138,0.291344,0.267576
5,0.8121,0.880368,0.716774,0.327198,0.32805,0.305906
6,0.6827,0.814717,0.725023,0.378382,0.353075,0.341835
7,0.5858,0.793198,0.731439,0.402244,0.375111,0.363843
8,0.5257,0.760167,0.745188,0.402863,0.392513,0.37714
9,0.4715,0.742734,0.753437,0.460135,0.420536,0.4164
10,0.4227,0.729394,0.759853,0.472068,0.428736,0.426212


[I 2025-03-26 11:19:23,278] Trial 18 finished with value: 0.45947921384572155 and parameters: {'learning_rate': 0.000462586328422914, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 18 with value: 0.45947921384572155.


Trial 19 with params: {'learning_rate': 0.0004392564949386537, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1946,1.887239,0.416132,0.070492,0.095182,0.073007
2,1.6827,1.448741,0.555454,0.184139,0.187613,0.168167
3,1.2951,1.154936,0.650779,0.25618,0.255126,0.236275
4,1.0258,0.992144,0.693859,0.278319,0.300242,0.278692
5,0.8421,0.897182,0.716774,0.338521,0.327387,0.306282
6,0.7099,0.826725,0.722273,0.372297,0.344677,0.329236
7,0.6101,0.799507,0.733272,0.372214,0.364816,0.348089
8,0.5493,0.767925,0.753437,0.409311,0.396169,0.380251
9,0.4938,0.750237,0.753437,0.430352,0.408861,0.398882
10,0.4444,0.740099,0.756187,0.439644,0.413017,0.405397


[I 2025-03-26 11:20:41,232] Trial 19 finished with value: 0.445417028736355 and parameters: {'learning_rate': 0.0004392564949386537, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 18 with value: 0.45947921384572155.


Trial 20 with params: {'learning_rate': 0.000418311875401925, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2026,1.905834,0.4033,0.073732,0.088277,0.065812
2,1.7061,1.476558,0.540788,0.174127,0.172028,0.153064
3,1.3257,1.180012,0.651696,0.257067,0.253489,0.233793
4,1.0565,1.018214,0.684693,0.259493,0.287168,0.263807
5,0.8726,0.916705,0.713107,0.314899,0.319719,0.300048
6,0.7358,0.840112,0.72319,0.363084,0.338025,0.320886
7,0.6342,0.808881,0.727773,0.378891,0.350098,0.339506
8,0.5714,0.779014,0.751604,0.38668,0.392159,0.374386
9,0.5154,0.758218,0.747021,0.407715,0.38958,0.378649
10,0.4656,0.749496,0.749771,0.4388,0.406638,0.399811


[I 2025-03-26 11:21:57,330] Trial 20 finished with value: 0.4501210336948468 and parameters: {'learning_rate': 0.000418311875401925, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 18 with value: 0.45947921384572155.


Trial 21 with params: {'learning_rate': 0.0003962948462538969, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.211,1.924031,0.3978,0.074607,0.085978,0.063571
2,1.7307,1.504362,0.529789,0.170163,0.165481,0.146195
3,1.3587,1.20831,0.641613,0.22567,0.242872,0.220785
4,1.0911,1.044652,0.683776,0.271126,0.284644,0.264734
5,0.9078,0.937786,0.709441,0.299702,0.31169,0.289932
6,0.7665,0.852584,0.725023,0.357673,0.333842,0.31686
7,0.6609,0.815803,0.724106,0.355343,0.342654,0.3273
8,0.5958,0.787293,0.746104,0.393035,0.379415,0.363065
9,0.54,0.764593,0.747021,0.40895,0.386698,0.375912
10,0.4888,0.758265,0.749771,0.439012,0.399474,0.391057


[I 2025-03-26 11:22:48,702] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0004472927966194536, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.172,1.86119,0.423465,0.06321,0.099076,0.073623
2,1.6602,1.434152,0.552704,0.214218,0.189398,0.171931
3,1.2809,1.151746,0.647113,0.255225,0.25018,0.232757
4,1.0177,0.990915,0.696609,0.264238,0.293701,0.269662
5,0.8374,0.894149,0.714024,0.325997,0.325795,0.303925
6,0.7064,0.825621,0.718607,0.352011,0.336042,0.322043
7,0.6076,0.803365,0.732356,0.395217,0.372308,0.355438
8,0.5465,0.76949,0.752521,0.42838,0.399547,0.388971
9,0.4923,0.752295,0.747021,0.44562,0.413195,0.406179
10,0.4423,0.735858,0.756187,0.445012,0.418632,0.412858


[I 2025-03-26 11:24:05,095] Trial 22 finished with value: 0.44944255910489334 and parameters: {'learning_rate': 0.0004472927966194536, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 18 with value: 0.45947921384572155.


Trial 23 with params: {'learning_rate': 0.00012789838339807573, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3473,2.201322,0.176902,0.003538,0.02,0.006012
2,2.1264,2.007705,0.389551,0.059545,0.082818,0.06385
3,1.9552,1.835473,0.437214,0.082669,0.104127,0.078796
4,1.7891,1.689087,0.488543,0.099987,0.134972,0.10765
5,1.66,1.564951,0.510541,0.147847,0.150052,0.125313


[I 2025-03-26 11:24:29,949] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00019834397434131536, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2988,2.109601,0.184235,0.040234,0.022012,0.009856
2,2.0012,1.843523,0.424381,0.061142,0.099268,0.072452
3,1.7587,1.612601,0.497709,0.145642,0.144688,0.123712
4,1.5414,1.434249,0.554537,0.220936,0.179438,0.160281
5,1.3804,1.299175,0.607699,0.243361,0.216878,0.200311
6,1.2332,1.194234,0.659945,0.263308,0.259061,0.242589
7,1.1214,1.115353,0.67461,0.269436,0.275963,0.257306
8,1.0427,1.057074,0.687443,0.256203,0.286874,0.26204
9,0.9699,1.012904,0.698442,0.293704,0.29934,0.277165
10,0.9079,0.977776,0.703941,0.295022,0.304544,0.281879


[I 2025-03-26 11:25:20,485] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0004548883029411888, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1805,1.869206,0.422548,0.066028,0.098579,0.074761
2,1.6649,1.438095,0.549038,0.196747,0.182228,0.163658
3,1.2788,1.142094,0.652612,0.240361,0.256435,0.234318
4,1.0063,0.978491,0.686526,0.271548,0.288104,0.262795
5,0.8217,0.884826,0.714024,0.299063,0.320099,0.295102
6,0.6903,0.816499,0.725023,0.36754,0.345978,0.329801
7,0.5935,0.797207,0.728689,0.363053,0.361625,0.344725
8,0.5343,0.763246,0.747938,0.41283,0.400175,0.38866
9,0.4822,0.745007,0.750687,0.428331,0.407578,0.401389
10,0.4324,0.73417,0.754354,0.438311,0.417913,0.40851


[I 2025-03-26 11:26:36,311] Trial 25 finished with value: 0.46453030960093045 and parameters: {'learning_rate': 0.0004548883029411888, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 25 with value: 0.46453030960093045.


Trial 26 with params: {'learning_rate': 1.7944110405218628e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4473,2.404163,0.137489,0.009553,0.036653,0.009047
2,2.3909,2.352963,0.184235,0.02065,0.022632,0.010578
3,2.3513,2.315023,0.182401,0.01449,0.021644,0.008931
4,2.3172,2.283427,0.181485,0.020231,0.02137,0.008582
5,2.2883,2.25174,0.179652,0.023554,0.020822,0.007615
6,2.2596,2.226571,0.179652,0.023554,0.020822,0.007615
7,2.2374,2.204318,0.182401,0.043567,0.021585,0.009045
8,2.2188,2.185188,0.192484,0.063604,0.024527,0.0139
9,2.2021,2.169127,0.216315,0.061468,0.031081,0.023561
10,2.1898,2.155197,0.231897,0.07311,0.035539,0.029775


[I 2025-03-26 11:27:26,048] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00044102599076789074, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1863,1.881404,0.420715,0.068956,0.097719,0.07528
2,1.6797,1.452969,0.540788,0.168875,0.173741,0.153621
3,1.2979,1.157043,0.650779,0.238471,0.253763,0.233099
4,1.0257,0.992307,0.68561,0.265253,0.287452,0.262047
5,0.8405,0.895887,0.714024,0.301148,0.32221,0.297596
6,0.706,0.823649,0.725023,0.362307,0.344923,0.328486
7,0.6073,0.799512,0.730522,0.374015,0.356347,0.341218
8,0.5469,0.767921,0.747938,0.38442,0.391588,0.374352
9,0.494,0.745995,0.75802,0.430806,0.409052,0.403767
10,0.4446,0.741683,0.751604,0.43453,0.411795,0.402091


[I 2025-03-26 11:28:42,179] Trial 27 finished with value: 0.4606128626709426 and parameters: {'learning_rate': 0.00044102599076789074, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 25 with value: 0.46453030960093045.


Trial 28 with params: {'learning_rate': 0.00020776571785130973, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2934,2.099753,0.197067,0.038919,0.025561,0.015675
2,1.9865,1.823851,0.432631,0.081443,0.104453,0.078275
3,1.7356,1.587795,0.502291,0.145512,0.146721,0.125266
4,1.5136,1.407041,0.562786,0.200022,0.184148,0.164613
5,1.3491,1.270774,0.618698,0.222184,0.220895,0.201096


[I 2025-03-26 11:29:07,818] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.00038182066544290374, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2012,1.921113,0.404216,0.074599,0.087662,0.065682
2,1.7388,1.522052,0.523373,0.18012,0.16343,0.146252
3,1.3812,1.23539,0.632447,0.228627,0.231707,0.213016
4,1.1172,1.069633,0.67736,0.264087,0.279986,0.258435
5,0.9348,0.95698,0.704858,0.288086,0.306195,0.282685
6,0.7925,0.868792,0.71494,0.331253,0.323231,0.304013
7,0.6862,0.831799,0.71494,0.335549,0.327907,0.311206
8,0.6201,0.802957,0.738772,0.380541,0.3697,0.349635
9,0.5644,0.774037,0.742438,0.397099,0.383242,0.372097
10,0.511,0.769407,0.747938,0.434773,0.403699,0.392604


[I 2025-03-26 11:30:25,213] Trial 29 finished with value: 0.42371368321589153 and parameters: {'learning_rate': 0.00038182066544290374, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 25 with value: 0.46453030960093045.


Trial 30 with params: {'learning_rate': 0.00035975819697717514, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2252,1.955004,0.389551,0.059045,0.083166,0.063044
2,1.7748,1.55581,0.513291,0.139393,0.152102,0.129238
3,1.4188,1.265274,0.615949,0.229215,0.225477,0.207387
4,1.1543,1.095396,0.672777,0.270379,0.276634,0.258244
5,0.9725,0.980502,0.700275,0.285668,0.301995,0.278059
6,0.8264,0.885691,0.718607,0.321769,0.324301,0.3044
7,0.716,0.840155,0.715857,0.32413,0.321757,0.302107
8,0.6478,0.811585,0.731439,0.381699,0.358099,0.339722
9,0.5903,0.785523,0.737855,0.403642,0.370082,0.361194
10,0.5361,0.774313,0.745188,0.403909,0.386471,0.374395


[I 2025-03-26 11:31:41,564] Trial 30 finished with value: 0.4034298617313803 and parameters: {'learning_rate': 0.00035975819697717514, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 25 with value: 0.46453030960093045.


Trial 31 with params: {'learning_rate': 0.00047466236594699297, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1725,1.851465,0.428048,0.064181,0.10202,0.075786
2,1.6429,1.415416,0.553621,0.200641,0.189111,0.170654
3,1.2513,1.121557,0.656279,0.238639,0.262938,0.240279
4,0.9797,0.958638,0.692026,0.276896,0.293033,0.271312
5,0.7975,0.872086,0.713107,0.308161,0.323691,0.298488
6,0.6711,0.809066,0.729606,0.383075,0.3608,0.349676
7,0.5744,0.789227,0.732356,0.364978,0.365385,0.348603
8,0.5162,0.757346,0.754354,0.425041,0.403851,0.392325
9,0.4648,0.733786,0.754354,0.456376,0.420232,0.417378
10,0.4156,0.731074,0.75527,0.45286,0.420965,0.415495


[I 2025-03-26 11:32:57,420] Trial 31 finished with value: 0.4493050948881866 and parameters: {'learning_rate': 0.00047466236594699297, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 25 with value: 0.46453030960093045.


Trial 32 with params: {'learning_rate': 0.000380377126709603, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2097,1.9304,0.398717,0.075054,0.085919,0.064064
2,1.7451,1.526288,0.523373,0.162155,0.160622,0.140637
3,1.3843,1.235692,0.636114,0.226228,0.238455,0.218505
4,1.1189,1.070223,0.67736,0.262646,0.279606,0.259865
5,0.937,0.959278,0.705775,0.29043,0.307874,0.284627
6,0.7931,0.86913,0.719523,0.334589,0.327157,0.308267
7,0.6861,0.830187,0.714024,0.329659,0.328275,0.309498
8,0.6189,0.802728,0.739688,0.379554,0.3691,0.350495
9,0.5636,0.777424,0.740605,0.383869,0.380081,0.366879
10,0.5111,0.773821,0.744271,0.420919,0.392436,0.382606


[I 2025-03-26 11:34:14,381] Trial 32 finished with value: 0.41062904162092495 and parameters: {'learning_rate': 0.000380377126709603, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 25 with value: 0.46453030960093045.


Trial 33 with params: {'learning_rate': 1.1404826981171539e-05, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4577,2.426316,0.053162,0.00851,0.027481,0.007073
2,2.4193,2.389725,0.170486,0.009995,0.021179,0.010224
3,2.3892,2.359286,0.188818,0.024987,0.024002,0.01207
4,2.3638,2.335363,0.185151,0.012196,0.022466,0.009967
5,2.342,2.314589,0.184235,0.015019,0.022192,0.009766


[I 2025-03-26 11:34:39,208] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0004874798943543435, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1766,1.847266,0.426214,0.063412,0.10219,0.075093
2,1.6323,1.394507,0.56187,0.196017,0.198681,0.176406
3,1.2305,1.103445,0.659945,0.275605,0.269421,0.248072
4,0.9609,0.94707,0.699358,0.285805,0.300731,0.279652
5,0.7824,0.86413,0.72044,0.330085,0.334483,0.31046
6,0.6572,0.799784,0.730522,0.390287,0.361467,0.348976
7,0.5624,0.776782,0.744271,0.397817,0.383109,0.371146
8,0.5024,0.751868,0.749771,0.421363,0.40034,0.387468
9,0.4485,0.731663,0.751604,0.453812,0.424699,0.422816
10,0.4007,0.721887,0.75802,0.488359,0.428984,0.430721


[I 2025-03-26 11:35:55,965] Trial 34 finished with value: 0.477880764070784 and parameters: {'learning_rate': 0.0004874798943543435, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 35 with params: {'learning_rate': 0.000434755860227755, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1963,1.891267,0.412466,0.069622,0.092601,0.069958
2,1.6876,1.454554,0.554537,0.181866,0.187158,0.167953
3,1.3017,1.160066,0.651696,0.258254,0.255614,0.236689
4,1.0322,0.996988,0.691109,0.277818,0.297225,0.276462
5,0.848,0.901095,0.715857,0.347643,0.3269,0.305897
6,0.7149,0.82914,0.72044,0.35283,0.338738,0.321507
7,0.6149,0.801381,0.732356,0.373925,0.363808,0.348885
8,0.5541,0.770184,0.750687,0.388079,0.391659,0.374346
9,0.4984,0.75018,0.751604,0.432826,0.407458,0.399245
10,0.4489,0.742569,0.753437,0.434605,0.410649,0.401333


[I 2025-03-26 11:37:18,474] Trial 35 finished with value: 0.4406141811646433 and parameters: {'learning_rate': 0.000434755860227755, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.5}. Best is trial 34 with value: 0.477880764070784.


Trial 36 with params: {'learning_rate': 0.00011705694027089814, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3548,2.216873,0.176902,0.003538,0.02,0.006012
2,2.147,2.033761,0.373052,0.061497,0.07771,0.062359
3,1.9876,1.875219,0.428964,0.068742,0.098215,0.073838
4,1.8323,1.734769,0.472044,0.102375,0.125256,0.101006
5,1.71,1.615191,0.500458,0.124548,0.142194,0.116159
6,1.5891,1.520944,0.526123,0.191886,0.161682,0.142577
7,1.4959,1.447404,0.554537,0.226821,0.184925,0.171067
8,1.4276,1.386038,0.597617,0.263538,0.216999,0.20278
9,1.3613,1.333885,0.606783,0.244083,0.222574,0.207341
10,1.3057,1.294151,0.628781,0.28416,0.24036,0.230065


[I 2025-03-26 11:38:10,085] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0001809199150247622, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3132,2.131766,0.177819,0.023541,0.020238,0.006488
2,2.0314,1.882933,0.420715,0.067787,0.095397,0.071334
3,1.8045,1.662064,0.483043,0.122002,0.132667,0.107684
4,1.5965,1.488065,0.541705,0.181422,0.169587,0.15084
5,1.4419,1.355097,0.591201,0.249741,0.2066,0.190098


[I 2025-03-26 11:38:35,261] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 2.5689465631735298e-05, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4342,2.378001,0.184235,0.032125,0.022803,0.010698
2,2.3612,2.316641,0.183318,0.015251,0.021918,0.009373
3,2.3117,2.267148,0.179652,0.023551,0.020822,0.00761
4,2.266,2.224447,0.180568,0.023554,0.021096,0.008114
5,2.2294,2.184103,0.193401,0.063607,0.024801,0.014286


[I 2025-03-26 11:39:00,647] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00038443212074493367, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2156,1.933989,0.394134,0.05552,0.084539,0.06245
2,1.7447,1.520418,0.523373,0.14977,0.158584,0.136791
3,1.3776,1.225996,0.638863,0.225606,0.241098,0.219918
4,1.1108,1.06028,0.68011,0.263422,0.280658,0.26102
5,0.9279,0.950645,0.707608,0.297284,0.309774,0.287472
6,0.7848,0.861737,0.725023,0.352051,0.332872,0.314293
7,0.6772,0.821974,0.718607,0.346005,0.332017,0.314312
8,0.6112,0.793476,0.743355,0.386522,0.375867,0.358656
9,0.5548,0.771351,0.742438,0.404894,0.381967,0.370915
10,0.5029,0.761024,0.746104,0.43648,0.395243,0.385944


[I 2025-03-26 11:40:21,110] Trial 39 finished with value: 0.43797198988477426 and parameters: {'learning_rate': 0.00038443212074493367, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 40 with params: {'learning_rate': 0.0004814362072710081, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1563,1.826219,0.431714,0.061918,0.105335,0.075695
2,1.618,1.391582,0.563703,0.193835,0.195367,0.173983
3,1.2318,1.109537,0.658112,0.277792,0.265717,0.247984
4,0.9707,0.953432,0.705775,0.283389,0.302858,0.280987
5,0.7951,0.869039,0.716774,0.322268,0.331341,0.307084
6,0.669,0.807116,0.736022,0.383127,0.37274,0.357781
7,0.5737,0.783947,0.736022,0.423006,0.386977,0.375737
8,0.5143,0.75584,0.751604,0.42529,0.407329,0.397693
9,0.4592,0.733873,0.758937,0.464598,0.430672,0.428134
10,0.4102,0.718908,0.765353,0.487228,0.441516,0.440801


[I 2025-03-26 11:41:37,894] Trial 40 finished with value: 0.45796652775412455 and parameters: {'learning_rate': 0.0004814362072710081, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 34 with value: 0.477880764070784.


Trial 41 with params: {'learning_rate': 0.00045519276280906894, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1682,1.853031,0.427131,0.062891,0.101226,0.074704
2,1.6502,1.424091,0.555454,0.212484,0.190133,0.171544
3,1.2693,1.142262,0.648946,0.256841,0.255435,0.237589
4,1.0067,0.982499,0.696609,0.274735,0.295026,0.272515
5,0.8276,0.888039,0.712191,0.321951,0.324535,0.302549
6,0.6974,0.821811,0.721357,0.355696,0.342966,0.330527
7,0.5996,0.800122,0.733272,0.415135,0.378628,0.365383
8,0.539,0.767006,0.752521,0.428561,0.39947,0.389722
9,0.4848,0.749223,0.747938,0.446663,0.41384,0.407098
10,0.4348,0.729932,0.76077,0.447443,0.422012,0.415961


[I 2025-03-26 11:42:57,241] Trial 41 finished with value: 0.45104554763748517 and parameters: {'learning_rate': 0.00045519276280906894, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 34 with value: 0.477880764070784.


Trial 42 with params: {'learning_rate': 0.00038384402584336157, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2158,1.934471,0.394134,0.055755,0.084539,0.062515
2,1.7454,1.521201,0.523373,0.149639,0.158584,0.136718
3,1.3786,1.226918,0.638863,0.225606,0.241098,0.219918
4,1.1118,1.061089,0.68011,0.264198,0.280749,0.261352
5,0.9289,0.951298,0.707608,0.297377,0.309774,0.287407
6,0.7857,0.862255,0.725023,0.35215,0.332872,0.314354
7,0.678,0.822341,0.718607,0.346005,0.332017,0.314312
8,0.612,0.793836,0.743355,0.386662,0.375867,0.358727
9,0.5555,0.771698,0.742438,0.404894,0.381967,0.370915
10,0.5036,0.761283,0.746104,0.436331,0.395243,0.385854


[I 2025-03-26 11:44:08,090] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0004145660682858629, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1564,1.857194,0.425298,0.065865,0.101226,0.076108
2,1.6691,1.45255,0.549954,0.171137,0.1817,0.16214
3,1.3101,1.176145,0.640697,0.234039,0.247005,0.228847
4,1.0551,1.024117,0.687443,0.256002,0.288044,0.261855
5,0.8815,0.91854,0.71494,0.325611,0.324487,0.304604
6,0.7478,0.844291,0.728689,0.359775,0.34196,0.322211
7,0.6452,0.809543,0.732356,0.363602,0.360024,0.342268
8,0.5803,0.78345,0.749771,0.393488,0.386089,0.370478
9,0.5261,0.757816,0.748854,0.388545,0.390506,0.374964
10,0.4753,0.748762,0.752521,0.421607,0.406291,0.394754


[I 2025-03-26 11:45:25,702] Trial 43 finished with value: 0.4401191883308661 and parameters: {'learning_rate': 0.0004145660682858629, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 34 with value: 0.477880764070784.


Trial 44 with params: {'learning_rate': 0.00017665299926535667, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3158,2.1369,0.176902,0.003538,0.02,0.006012
2,2.0386,1.892572,0.416132,0.069527,0.092489,0.068115
3,1.8159,1.674679,0.48121,0.101488,0.130988,0.105118
4,1.6106,1.502152,0.534372,0.183519,0.163361,0.14373
5,1.4578,1.36988,0.585701,0.247331,0.203256,0.187324


[I 2025-03-26 11:45:51,260] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00043371026242218253, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1892,1.887658,0.417965,0.071218,0.096213,0.074753
2,1.6875,1.461245,0.539872,0.166438,0.173244,0.15314
3,1.3082,1.165164,0.650779,0.240507,0.253107,0.233707
4,1.0364,0.999663,0.68561,0.265236,0.286646,0.261608
5,0.8507,0.901898,0.716774,0.316176,0.324457,0.300804
6,0.7149,0.827885,0.72319,0.361691,0.342802,0.326555
7,0.6156,0.802517,0.727773,0.362047,0.351748,0.337083
8,0.5547,0.771714,0.747021,0.391481,0.389921,0.374821
9,0.5011,0.750124,0.757104,0.434577,0.408837,0.403609
10,0.4522,0.746299,0.750687,0.432251,0.410162,0.400647


[I 2025-03-26 11:47:06,240] Trial 45 finished with value: 0.4531542858652594 and parameters: {'learning_rate': 0.00043371026242218253, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 34 with value: 0.477880764070784.


Trial 46 with params: {'learning_rate': 0.00039095110215088823, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1887,1.904867,0.409716,0.072649,0.09011,0.067552
2,1.7202,1.501025,0.532539,0.172161,0.165413,0.146941
3,1.3619,1.220115,0.630614,0.228267,0.228617,0.208732
4,1.1021,1.057592,0.68286,0.264542,0.281726,0.260124
5,0.9219,0.946479,0.709441,0.299353,0.311848,0.29043
6,0.7827,0.862032,0.721357,0.307246,0.326766,0.304724
7,0.6784,0.825827,0.719523,0.349117,0.336073,0.322776
8,0.6125,0.798821,0.743355,0.390592,0.375122,0.357758
9,0.5571,0.773699,0.744271,0.399689,0.387693,0.375926
10,0.5047,0.760993,0.752521,0.443768,0.407387,0.398308


[I 2025-03-26 11:48:22,499] Trial 46 finished with value: 0.43379607818078897 and parameters: {'learning_rate': 0.00039095110215088823, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 34 with value: 0.477880764070784.


Trial 47 with params: {'learning_rate': 1.0393235223774966e-05, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4577,2.428191,0.048579,0.01041,0.027133,0.00727
2,2.4226,2.394848,0.159487,0.008976,0.019414,0.009358
3,2.3948,2.365916,0.187901,0.017968,0.023728,0.011713
4,2.371,2.34343,0.186068,0.016623,0.023,0.010889
5,2.3502,2.324186,0.186068,0.015369,0.02274,0.010552
6,2.3316,2.307625,0.184235,0.015019,0.022192,0.009766
7,2.3181,2.293196,0.181485,0.017853,0.02137,0.008555
8,2.3053,2.280615,0.179652,0.018558,0.020822,0.007599
9,2.2941,2.269839,0.179652,0.018558,0.020822,0.007599
10,2.286,2.261192,0.179652,0.018558,0.020822,0.007599


[I 2025-03-26 11:49:13,217] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00029186640119552315, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2434,2.00939,0.366636,0.066764,0.075679,0.061855
2,1.8554,1.657168,0.485793,0.143394,0.139134,0.116632
3,1.5404,1.386186,0.562786,0.197118,0.18355,0.163954
4,1.2871,1.198571,0.648946,0.247887,0.255849,0.237963
5,1.104,1.06676,0.686526,0.266311,0.28828,0.264721
6,0.9533,0.969441,0.700275,0.295525,0.298195,0.277076
7,0.8398,0.91017,0.705775,0.305522,0.302964,0.281198
8,0.7667,0.867523,0.724106,0.335834,0.333,0.310544
9,0.7043,0.842103,0.725023,0.301848,0.330761,0.307807
10,0.6462,0.825383,0.728689,0.328231,0.34518,0.325246


[I 2025-03-26 11:50:04,197] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.00020329934217119537, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.296,2.10438,0.188818,0.036927,0.023275,0.011991
2,1.9934,1.833149,0.428964,0.081171,0.101624,0.074767
3,1.7465,1.59943,0.500458,0.145736,0.146052,0.124933
4,1.5267,1.41982,0.55912,0.201405,0.181821,0.161608
5,1.3639,1.284145,0.611366,0.221903,0.216584,0.196499


[I 2025-03-26 11:50:31,354] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0004984056251371117, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1728,1.839363,0.429881,0.062314,0.103629,0.075018
2,1.6221,1.384496,0.566453,0.196401,0.201108,0.178062
3,1.2176,1.092209,0.665445,0.282302,0.27694,0.256517
4,0.9464,0.936362,0.703025,0.296892,0.303696,0.284288
5,0.77,0.858928,0.722273,0.33293,0.337656,0.311366
6,0.65,0.800029,0.736939,0.417295,0.37497,0.367644
7,0.5542,0.772977,0.739688,0.386694,0.379197,0.36535
8,0.4956,0.747226,0.750687,0.427675,0.401563,0.39068
9,0.4414,0.727939,0.75527,0.447733,0.427852,0.424562
10,0.3936,0.719489,0.756187,0.490555,0.42831,0.430775


[I 2025-03-26 11:51:46,205] Trial 50 finished with value: 0.4716905893774584 and parameters: {'learning_rate': 0.0004984056251371117, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 51 with params: {'learning_rate': 0.00036432523448613356, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2234,1.951083,0.389551,0.057345,0.083166,0.062464
2,1.7692,1.549114,0.51604,0.139859,0.15378,0.130468
3,1.411,1.257814,0.624198,0.2281,0.230673,0.211984
4,1.1459,1.088727,0.673694,0.26911,0.277295,0.258379
5,0.964,0.974924,0.703025,0.28852,0.304643,0.281088
6,0.8183,0.880777,0.72044,0.323095,0.325401,0.305687
7,0.7083,0.836318,0.714024,0.32329,0.321016,0.301267
8,0.6405,0.807957,0.735105,0.378136,0.361233,0.34318
9,0.5832,0.783036,0.740605,0.406953,0.380244,0.371371
10,0.5295,0.771985,0.744271,0.422603,0.392196,0.383683


[I 2025-03-26 11:53:00,986] Trial 51 finished with value: 0.40844618305339947 and parameters: {'learning_rate': 0.00036432523448613356, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 52 with params: {'learning_rate': 0.0004624549625399216, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1857,1.866678,0.421632,0.065934,0.098534,0.074369
2,1.6568,1.419931,0.558203,0.199942,0.190231,0.169767
3,1.262,1.129563,0.654445,0.255383,0.261069,0.240716
4,0.9936,0.969545,0.687443,0.259138,0.291344,0.267576
5,0.8123,0.880426,0.716774,0.327198,0.32805,0.305906
6,0.6828,0.814743,0.725023,0.378964,0.353075,0.341774
7,0.586,0.793365,0.730522,0.402022,0.372611,0.360153
8,0.5259,0.76019,0.745188,0.402863,0.392513,0.37714
9,0.4716,0.742815,0.753437,0.460135,0.420536,0.4164
10,0.4228,0.729546,0.759853,0.472068,0.428736,0.426212


[I 2025-03-26 11:54:18,950] Trial 52 finished with value: 0.45947921384572155 and parameters: {'learning_rate': 0.0004624549625399216, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 34 with value: 0.477880764070784.


Trial 53 with params: {'learning_rate': 0.0004889391046203257, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1761,1.84624,0.426214,0.063409,0.10219,0.075088
2,1.6309,1.39314,0.562786,0.196374,0.199702,0.17735
3,1.2288,1.10191,0.659945,0.275657,0.269421,0.248116
4,0.959,0.945508,0.699358,0.285805,0.300731,0.279652
5,0.7806,0.862796,0.719523,0.328711,0.334379,0.309952
6,0.6564,0.800811,0.731439,0.38698,0.368738,0.357218
7,0.561,0.776865,0.739688,0.391131,0.381489,0.370372
8,0.5019,0.750277,0.748854,0.412805,0.398726,0.388201
9,0.4467,0.732087,0.748854,0.444092,0.423652,0.420636
10,0.3998,0.722668,0.75527,0.479724,0.426781,0.427231


[I 2025-03-26 11:55:36,764] Trial 53 finished with value: 0.47445697659191544 and parameters: {'learning_rate': 0.0004889391046203257, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 34 with value: 0.477880764070784.


Trial 54 with params: {'learning_rate': 0.0004535023095910422, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1891,1.87448,0.419798,0.065794,0.096791,0.073268
2,1.6665,1.430508,0.557287,0.202396,0.190754,0.17136
3,1.2744,1.139259,0.651696,0.25545,0.257846,0.238824
4,1.0058,0.977899,0.688359,0.259811,0.291539,0.267905
5,0.8236,0.886077,0.716774,0.328779,0.329502,0.307922
6,0.6931,0.81938,0.718607,0.369503,0.344388,0.329535
7,0.5954,0.797056,0.730522,0.394328,0.367992,0.353833
8,0.535,0.762423,0.749771,0.406169,0.39376,0.37842
9,0.4803,0.747948,0.748854,0.451526,0.417177,0.411421
10,0.4312,0.734838,0.758937,0.469727,0.42692,0.425016


[I 2025-03-26 11:56:53,146] Trial 54 finished with value: 0.45172252995634765 and parameters: {'learning_rate': 0.0004535023095910422, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 34 with value: 0.477880764070784.


Trial 55 with params: {'learning_rate': 1.3699906998412503e-05, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4549,2.419359,0.08341,0.008102,0.030199,0.007795
2,2.4096,2.37621,0.182401,0.009993,0.022335,0.009984
3,2.3752,2.343427,0.184235,0.011439,0.022362,0.009865
4,2.347,2.316581,0.184235,0.015023,0.022192,0.009771
5,2.3227,2.292089,0.181485,0.020231,0.02137,0.008582
6,2.2993,2.271177,0.179652,0.018561,0.020822,0.007604
7,2.2817,2.252937,0.179652,0.023554,0.020822,0.007615
8,2.2662,2.237278,0.179652,0.023554,0.020822,0.007615
9,2.2523,2.223956,0.179652,0.023554,0.020822,0.007615
10,2.2427,2.213177,0.182401,0.043564,0.021585,0.00904


[I 2025-03-26 11:57:44,726] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00041958991272579696, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1948,1.899458,0.411549,0.071307,0.093269,0.072151
2,1.7028,1.478029,0.536205,0.163876,0.1703,0.15002
3,1.3284,1.182062,0.647113,0.238582,0.250765,0.231377
4,1.0577,1.016247,0.68561,0.267946,0.287099,0.263879
5,0.8721,0.914882,0.716774,0.317071,0.321308,0.299572
6,0.7337,0.837084,0.724106,0.340943,0.336075,0.315853
7,0.6325,0.808259,0.724106,0.364455,0.342961,0.327566
8,0.5706,0.778526,0.747938,0.397399,0.386035,0.37075
9,0.5163,0.754206,0.753437,0.438733,0.401241,0.395789
10,0.467,0.755742,0.747938,0.443193,0.398353,0.39023


[I 2025-03-26 11:58:34,278] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00015629272253669425, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3064,2.144367,0.176902,0.003538,0.02,0.006012
2,2.0593,1.928741,0.409716,0.051704,0.088292,0.063031
3,1.8631,1.731618,0.459212,0.101842,0.118315,0.091729
4,1.6754,1.570262,0.516957,0.205508,0.156537,0.138031
5,1.5342,1.443634,0.563703,0.228332,0.190213,0.172615
6,1.4006,1.345307,0.593951,0.239781,0.211756,0.196676
7,1.2969,1.265622,0.636114,0.268989,0.241517,0.229652
8,1.2213,1.204177,0.660862,0.261835,0.265686,0.247986
9,1.1503,1.153292,0.670027,0.264481,0.275871,0.254782
10,1.0899,1.114517,0.681943,0.264543,0.283347,0.260827


[I 2025-03-26 11:59:24,663] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00047869201160203576, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1574,1.828676,0.428964,0.061616,0.1034,0.074759
2,1.621,1.394686,0.560953,0.193006,0.193865,0.171784
3,1.2355,1.112935,0.658112,0.277792,0.265717,0.247984
4,0.9745,0.956164,0.703941,0.282585,0.300645,0.278906
5,0.7986,0.87097,0.71769,0.322816,0.331579,0.307659
6,0.6719,0.80886,0.736022,0.383088,0.37177,0.357542
7,0.5763,0.784995,0.736022,0.423006,0.386977,0.375737
8,0.5169,0.75972,0.750687,0.426792,0.407037,0.397615
9,0.4617,0.735847,0.75802,0.464716,0.426107,0.421846
10,0.4123,0.720817,0.765353,0.487034,0.440451,0.440003


[I 2025-03-26 12:00:46,276] Trial 58 finished with value: 0.45782043827717417 and parameters: {'learning_rate': 0.00047869201160203576, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 59 with params: {'learning_rate': 0.00046329107000868657, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1854,1.866012,0.421632,0.065934,0.098534,0.074369
2,1.6559,1.419062,0.558203,0.2001,0.190231,0.16988
3,1.2609,1.12871,0.654445,0.255383,0.261069,0.240716
4,0.9924,0.968815,0.687443,0.259138,0.291344,0.267576
5,0.8112,0.879793,0.71769,0.327652,0.328959,0.306518
6,0.6819,0.814236,0.724106,0.37803,0.352122,0.341188
7,0.5851,0.792758,0.731439,0.400816,0.375111,0.363932
8,0.525,0.760029,0.745188,0.402863,0.392513,0.37714
9,0.4709,0.742267,0.753437,0.460884,0.420536,0.416904
10,0.4221,0.728953,0.76077,0.473854,0.429788,0.427727


[I 2025-03-26 12:02:04,084] Trial 59 finished with value: 0.45947921384572155 and parameters: {'learning_rate': 0.00046329107000868657, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 34 with value: 0.477880764070784.


Trial 60 with params: {'learning_rate': 4.608864757704483e-05, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4165,2.336309,0.181485,0.013574,0.02137,0.008474
2,2.3067,2.244493,0.178735,0.023545,0.020548,0.007089
3,2.2284,2.164548,0.20165,0.059635,0.026992,0.017953
4,2.1551,2.09371,0.344638,0.069284,0.068776,0.060514
5,2.0963,2.032597,0.391384,0.077739,0.082847,0.064592
6,2.0341,1.979052,0.407883,0.074458,0.087656,0.065314
7,1.9831,1.932265,0.415215,0.092397,0.09204,0.070769
8,1.9424,1.891068,0.437214,0.088528,0.102957,0.080986
9,1.9037,1.856319,0.444546,0.105743,0.106968,0.085396
10,1.8747,1.828419,0.460128,0.104562,0.116793,0.093935


[I 2025-03-26 12:02:55,683] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0002928184488683756, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2538,2.015555,0.36022,0.067849,0.073305,0.060406
2,1.8623,1.660786,0.479377,0.142534,0.135522,0.111132
3,1.5442,1.387946,0.562786,0.199337,0.184194,0.163674
4,1.2896,1.20124,0.649863,0.248002,0.253515,0.236808
5,1.1078,1.0697,0.686526,0.269934,0.28442,0.262661


[I 2025-03-26 12:03:20,686] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.000495313190572418, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1738,1.841542,0.428048,0.062617,0.103199,0.074993
2,1.6249,1.387151,0.56462,0.197668,0.200405,0.17776
3,1.2212,1.095313,0.662695,0.276004,0.271825,0.251533
4,0.9505,0.939162,0.702108,0.29681,0.303333,0.283886
5,0.7732,0.858735,0.721357,0.331038,0.336372,0.31034
6,0.6512,0.797212,0.733272,0.415557,0.370896,0.363321
7,0.556,0.773935,0.739688,0.389874,0.381839,0.369196
8,0.4971,0.748183,0.747938,0.415935,0.401381,0.387318
9,0.4426,0.72941,0.753437,0.447677,0.427008,0.424337
10,0.3949,0.718143,0.759853,0.492199,0.431915,0.435638


[I 2025-03-26 12:04:43,008] Trial 62 finished with value: 0.4771931368124554 and parameters: {'learning_rate': 0.000495313190572418, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 34 with value: 0.477880764070784.


Trial 63 with params: {'learning_rate': 0.00037289600544837765, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2128,1.936925,0.395967,0.055209,0.08482,0.06272
2,1.7536,1.535629,0.520623,0.159906,0.15842,0.138484
3,1.3952,1.246141,0.633364,0.225789,0.236,0.216449
4,1.1308,1.080297,0.67461,0.262514,0.278185,0.258827
5,0.9498,0.968144,0.704858,0.290935,0.307347,0.284737


[I 2025-03-26 12:05:09,026] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00044567610748604626, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1921,1.881538,0.417965,0.07036,0.09593,0.073594
2,1.6754,1.440488,0.554537,0.182794,0.187512,0.167707
3,1.2858,1.147863,0.650779,0.257555,0.255509,0.236825
4,1.0168,0.985435,0.693859,0.277814,0.300276,0.278662
5,0.8337,0.891928,0.716774,0.339012,0.328528,0.307191
6,0.7023,0.823433,0.71769,0.3667,0.342839,0.327409
7,0.6033,0.797655,0.730522,0.375277,0.363688,0.346867
8,0.5426,0.765106,0.753437,0.40922,0.39532,0.380092
9,0.4871,0.747848,0.752521,0.435263,0.410975,0.401788
10,0.438,0.738207,0.757104,0.434425,0.414993,0.406565


[I 2025-03-26 12:06:26,714] Trial 64 finished with value: 0.44743856039786045 and parameters: {'learning_rate': 0.00044567610748604626, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 34 with value: 0.477880764070784.


Trial 65 with params: {'learning_rate': 0.0003309726189437743, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2369,1.97994,0.376719,0.063924,0.078707,0.062624
2,1.811,1.599152,0.499542,0.130577,0.145234,0.121818
3,1.4701,1.315083,0.593034,0.227175,0.206956,0.189263
4,1.209,1.137363,0.660862,0.263272,0.266151,0.250495
5,1.0265,1.014579,0.694775,0.289128,0.293555,0.270516


[I 2025-03-26 12:06:51,958] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0001855853030494943, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3105,2.126421,0.178735,0.023545,0.020476,0.006952
2,2.0238,1.872576,0.424381,0.065837,0.097547,0.072981
3,1.7924,1.648564,0.489459,0.141929,0.140238,0.118145
4,1.5815,1.473127,0.545371,0.160608,0.173147,0.153395
5,1.425,1.339413,0.594867,0.249737,0.208595,0.192411
6,1.2801,1.23545,0.648029,0.2684,0.25391,0.241298
7,1.1698,1.154757,0.664528,0.269045,0.269236,0.252107
8,1.0911,1.095045,0.681027,0.257689,0.283466,0.2604
9,1.0173,1.047684,0.692026,0.292792,0.293659,0.270204
10,0.9554,1.011853,0.700275,0.29465,0.298997,0.277404


[I 2025-03-26 12:07:41,798] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 1.5641639605857323e-05, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4514,2.41246,0.109074,0.009675,0.033271,0.00864
2,2.4009,2.365041,0.185151,0.014581,0.022906,0.010746
3,2.3639,2.330106,0.181485,0.011921,0.02137,0.008436
4,2.3331,2.301184,0.184235,0.015895,0.022192,0.009804
5,2.3067,2.273307,0.180568,0.019564,0.021096,0.008101


[I 2025-03-26 12:08:07,221] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0003288544461073333, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2378,1.981822,0.375802,0.064074,0.078343,0.062505
2,1.8138,1.602539,0.494959,0.126084,0.141965,0.117553
3,1.4741,1.319078,0.589368,0.227303,0.204876,0.186935
4,1.2133,1.140606,0.659945,0.262655,0.265788,0.249915
5,1.0306,1.017271,0.694775,0.28913,0.293555,0.270626


[I 2025-03-26 12:08:32,579] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 1.460315612078236e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4536,2.416477,0.096242,0.007906,0.031649,0.008065
2,2.4058,2.37109,0.183318,0.013743,0.022699,0.010622
3,2.37,2.337292,0.183318,0.011839,0.021918,0.009219
4,2.3406,2.309474,0.185151,0.015594,0.022466,0.010184
5,2.3153,2.283392,0.180568,0.019564,0.021096,0.008101
6,2.2907,2.261451,0.179652,0.023558,0.020822,0.00762
7,2.272,2.242266,0.179652,0.023554,0.020822,0.007615
8,2.2559,2.225931,0.179652,0.023554,0.020822,0.007615
9,2.2414,2.212002,0.183318,0.043567,0.021859,0.009518
10,2.2313,2.200544,0.186068,0.043581,0.022681,0.01089


[I 2025-03-26 12:09:24,045] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00032990252154684437, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2373,1.980906,0.376719,0.063971,0.078707,0.062669
2,1.8124,1.600922,0.496792,0.128641,0.143372,0.119495
3,1.4721,1.317108,0.592117,0.228008,0.20643,0.188454
4,1.2112,1.139023,0.660862,0.262885,0.266151,0.250367
5,1.0286,1.01596,0.694775,0.289128,0.293555,0.270516
6,0.8805,0.920769,0.71494,0.300442,0.314686,0.293866
7,0.7683,0.867769,0.710357,0.300823,0.313893,0.293022
8,0.6978,0.834058,0.728689,0.335504,0.345569,0.323006
9,0.6386,0.809185,0.729606,0.362323,0.352161,0.338269
10,0.5821,0.79742,0.742438,0.37691,0.374023,0.357869


[I 2025-03-26 12:10:41,789] Trial 70 finished with value: 0.401640914843656 and parameters: {'learning_rate': 0.00032990252154684437, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.5}. Best is trial 34 with value: 0.477880764070784.


Trial 71 with params: {'learning_rate': 0.0004445390873396189, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1848,1.878325,0.421632,0.068412,0.097934,0.075186
2,1.676,1.449114,0.540788,0.169242,0.173741,0.153708
3,1.2931,1.153247,0.651696,0.240204,0.254677,0.233986
4,1.0207,0.988816,0.68561,0.2716,0.287452,0.262075
5,0.8357,0.893034,0.715857,0.30162,0.32306,0.298388
6,0.702,0.821889,0.72594,0.36262,0.3464,0.329386
7,0.6036,0.798627,0.730522,0.373524,0.356347,0.340908
8,0.5436,0.766506,0.749771,0.405474,0.398709,0.38516
9,0.4908,0.741712,0.75802,0.451205,0.412948,0.410603
10,0.4415,0.740133,0.752521,0.431201,0.413497,0.402247


[I 2025-03-26 12:11:56,782] Trial 71 finished with value: 0.46526531790863423 and parameters: {'learning_rate': 0.0004445390873396189, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 72 with params: {'learning_rate': 0.0004701555596851597, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1743,1.8555,0.426214,0.063963,0.10073,0.075063
2,1.6475,1.420306,0.553621,0.199222,0.186531,0.167298
3,1.2573,1.126071,0.656279,0.239405,0.262665,0.24034
4,0.9853,0.962771,0.690192,0.273184,0.29084,0.266281
5,0.8024,0.875066,0.714024,0.299098,0.323795,0.29791
6,0.6752,0.810467,0.727773,0.381789,0.357154,0.344809
7,0.5786,0.791395,0.730522,0.365527,0.36454,0.347844
8,0.5203,0.758823,0.751604,0.414266,0.400541,0.389263
9,0.4689,0.738474,0.752521,0.445483,0.416157,0.411219
10,0.4191,0.729992,0.75527,0.440866,0.41789,0.410113


[I 2025-03-26 12:13:13,087] Trial 72 finished with value: 0.4505997422515422 and parameters: {'learning_rate': 0.0004701555596851597, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 34 with value: 0.477880764070784.


Trial 73 with params: {'learning_rate': 0.0004852295070146293, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1775,1.848979,0.426214,0.0637,0.10219,0.075204
2,1.6345,1.396772,0.562786,0.196219,0.199327,0.176839
3,1.2333,1.105771,0.659945,0.275916,0.269421,0.248171
4,0.9639,0.949042,0.698442,0.282625,0.300276,0.278741
5,0.785,0.865706,0.719523,0.328371,0.332944,0.308984
6,0.6591,0.802592,0.728689,0.389988,0.360991,0.34858
7,0.5639,0.777704,0.741522,0.397945,0.382143,0.370428
8,0.5042,0.75091,0.750687,0.422264,0.400601,0.388918
9,0.4503,0.733683,0.749771,0.440943,0.419698,0.416172
10,0.4028,0.722948,0.759853,0.474623,0.429946,0.430395


[I 2025-03-26 12:14:31,287] Trial 73 finished with value: 0.4657949340930646 and parameters: {'learning_rate': 0.0004852295070146293, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.5}. Best is trial 34 with value: 0.477880764070784.


Trial 74 with params: {'learning_rate': 0.0002575917610207219, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2716,2.053063,0.300642,0.072294,0.055984,0.045537
2,1.913,1.723951,0.463795,0.125216,0.127881,0.102174
3,1.6187,1.463332,0.533456,0.18076,0.161648,0.142447
4,1.3747,1.275077,0.621448,0.250028,0.228518,0.210659
5,1.1958,1.136341,0.670944,0.260826,0.270102,0.250972
6,1.0428,1.036266,0.693859,0.294675,0.291937,0.271529
7,0.9264,0.966612,0.701192,0.276396,0.295605,0.273066
8,0.8502,0.91608,0.71769,0.319812,0.320001,0.299636
9,0.7835,0.887306,0.722273,0.309405,0.324285,0.302633
10,0.7239,0.862409,0.72319,0.31452,0.330037,0.309015


[I 2025-03-26 12:15:19,851] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00044096491957423665, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1939,1.885785,0.417049,0.070369,0.095285,0.073019
2,1.6809,1.446623,0.555454,0.183432,0.188039,0.168338
3,1.2927,1.153184,0.650779,0.257233,0.255126,0.236317
4,1.0235,0.990414,0.693859,0.278319,0.300242,0.278692
5,0.8399,0.895744,0.716774,0.338765,0.327387,0.306424
6,0.7079,0.825803,0.72044,0.37142,0.344881,0.328923
7,0.6083,0.798863,0.733272,0.372401,0.364816,0.348202
8,0.5474,0.767046,0.754354,0.410789,0.396272,0.38107
9,0.492,0.749765,0.754354,0.432624,0.409791,0.400189
10,0.4426,0.739527,0.75527,0.438691,0.412802,0.404782


[I 2025-03-26 12:16:35,841] Trial 75 finished with value: 0.4469811002520372 and parameters: {'learning_rate': 0.00044096491957423665, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 34 with value: 0.477880764070784.


Trial 76 with params: {'learning_rate': 0.00037072626819241775, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2209,1.9456,0.390467,0.057022,0.08353,0.06238
2,1.7613,1.539838,0.51879,0.142585,0.155675,0.132827
3,1.4002,1.247558,0.629698,0.226317,0.233903,0.214977
4,1.1345,1.079582,0.67461,0.268614,0.277658,0.258238
5,0.9523,0.967116,0.705775,0.289766,0.305676,0.282191
6,0.8073,0.8743,0.72044,0.349812,0.327888,0.309857
7,0.6979,0.831281,0.71494,0.343042,0.327319,0.310476
8,0.6308,0.803076,0.739688,0.387486,0.367619,0.350473
9,0.5737,0.779466,0.740605,0.404847,0.380244,0.370108
10,0.5206,0.769038,0.745188,0.440791,0.394696,0.387005


[I 2025-03-26 12:17:51,485] Trial 76 finished with value: 0.4105720197763161 and parameters: {'learning_rate': 0.00037072626819241775, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 34 with value: 0.477880764070784.


Trial 77 with params: {'learning_rate': 0.00030927025673911044, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2406,1.994966,0.376719,0.066179,0.078447,0.063737
2,1.8357,1.630534,0.484876,0.133129,0.135314,0.109298
3,1.5085,1.353894,0.572869,0.22422,0.192125,0.175041
4,1.2516,1.171306,0.660862,0.263684,0.265932,0.24995
5,1.0697,1.043697,0.695692,0.268822,0.291886,0.267089


[I 2025-03-26 12:18:16,570] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0003782288255961332, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1942,1.916187,0.404216,0.05303,0.08695,0.062947
2,1.7353,1.518004,0.525206,0.157194,0.159814,0.139467
3,1.3816,1.237794,0.618698,0.224819,0.221268,0.200766
4,1.1225,1.073492,0.679193,0.257322,0.280804,0.25892
5,0.9433,0.960569,0.707608,0.292768,0.305806,0.282588


[I 2025-03-26 12:18:41,325] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 1.1513610346634454e-05, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4586,2.426998,0.049496,0.008331,0.027066,0.0068
2,2.4198,2.389886,0.170486,0.010151,0.021179,0.010271
3,2.3891,2.359105,0.188818,0.025331,0.024002,0.01214
4,2.3635,2.334886,0.185151,0.012621,0.022466,0.010003
5,2.3414,2.313814,0.184235,0.015019,0.022192,0.009766
6,2.321,2.295636,0.180568,0.016898,0.021096,0.008075
7,2.306,2.279735,0.179652,0.018558,0.020822,0.007599
8,2.2921,2.265926,0.179652,0.018558,0.020822,0.007599
9,2.2798,2.254118,0.179652,0.023554,0.020822,0.007615
10,2.2712,2.24471,0.179652,0.023554,0.020822,0.007615


[I 2025-03-26 12:19:30,703] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0004196887749054982, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1947,1.899409,0.412466,0.071398,0.093484,0.072336
2,1.7027,1.477894,0.536205,0.163876,0.1703,0.15002
3,1.3283,1.181965,0.647113,0.238582,0.250765,0.231377
4,1.0576,1.016126,0.684693,0.26766,0.286995,0.263666
5,0.872,0.914851,0.716774,0.317071,0.321308,0.299572
6,0.7335,0.837032,0.72319,0.340777,0.335972,0.315703
7,0.6324,0.808215,0.724106,0.364455,0.342961,0.327566
8,0.5705,0.778492,0.747938,0.397399,0.386035,0.37075
9,0.5163,0.754201,0.753437,0.438733,0.401241,0.395789
10,0.467,0.755929,0.747938,0.443145,0.398353,0.390346


[I 2025-03-26 12:20:21,188] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0003051276334935715, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2425,1.998784,0.374885,0.06641,0.077868,0.063481
2,1.8415,1.63753,0.483043,0.122271,0.1343,0.10773
3,1.5169,1.362218,0.571036,0.220756,0.191111,0.173552
4,1.2608,1.178632,0.658112,0.26044,0.260892,0.244031
5,1.0789,1.050051,0.695692,0.269107,0.291502,0.267137
6,0.9294,0.955273,0.706691,0.299427,0.304913,0.285382
7,0.8171,0.896798,0.708524,0.300792,0.306792,0.283562
8,0.7446,0.857282,0.727773,0.330706,0.335845,0.313065
9,0.6837,0.832626,0.724106,0.331982,0.336671,0.318016
10,0.6259,0.813794,0.742438,0.374016,0.364691,0.350127


[I 2025-03-26 12:21:11,636] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 4.2241048909514606e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4156,2.339908,0.182401,0.012815,0.021644,0.008869
2,2.3133,2.254847,0.179652,0.023548,0.020822,0.007605
3,2.241,2.180511,0.187901,0.063577,0.023157,0.011815
4,2.1732,2.115137,0.318973,0.072585,0.061195,0.055572
5,2.1189,2.05806,0.384051,0.080719,0.08076,0.065231


[I 2025-03-26 12:21:37,367] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0004960970701067248, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1735,1.840984,0.428964,0.062722,0.103414,0.07517
2,1.6242,1.386573,0.56462,0.195721,0.199984,0.176852
3,1.2203,1.094507,0.664528,0.276888,0.274851,0.253609
4,0.9494,0.938261,0.702108,0.296793,0.303333,0.2839
5,0.7724,0.860091,0.71769,0.321888,0.335645,0.30987
6,0.6518,0.800607,0.735105,0.409807,0.377628,0.370524
7,0.5561,0.778029,0.741522,0.383329,0.380153,0.366897
8,0.4989,0.749924,0.746104,0.420372,0.399479,0.386978
9,0.4426,0.731435,0.751604,0.452688,0.423436,0.420559
10,0.3954,0.723483,0.756187,0.481753,0.427137,0.429609


[I 2025-03-26 12:22:52,820] Trial 83 finished with value: 0.4673402544142266 and parameters: {'learning_rate': 0.0004960970701067248, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 34 with value: 0.477880764070784.


Trial 84 with params: {'learning_rate': 0.00034728821117664427, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.224,1.960494,0.390467,0.058968,0.08327,0.063423
2,1.7852,1.570951,0.509624,0.140158,0.150052,0.127636
3,1.437,1.284328,0.608616,0.228643,0.220382,0.202753
4,1.1746,1.112667,0.673694,0.261324,0.277108,0.257432
5,0.9938,0.995077,0.699358,0.289998,0.300404,0.277785


[I 2025-03-26 12:23:17,087] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 4.4166255288717016e-05, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4134,2.335785,0.181485,0.012668,0.02137,0.00845
2,2.3081,2.247884,0.179652,0.023548,0.020822,0.007605
3,2.2333,2.171139,0.195234,0.063604,0.025218,0.015205
4,2.163,2.103335,0.336389,0.070635,0.066405,0.05946
5,2.1068,2.044625,0.388634,0.079318,0.081905,0.064904
6,2.0472,1.993693,0.405133,0.055414,0.086622,0.064742
7,1.9985,1.948846,0.411549,0.092703,0.089932,0.067989
8,1.9595,1.909391,0.431714,0.090792,0.099468,0.077654
9,1.9225,1.875778,0.439047,0.087236,0.103874,0.080872
10,1.8946,1.848783,0.451879,0.105642,0.111302,0.088372


[I 2025-03-26 12:24:07,170] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.00048481023093695626, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1776,1.849294,0.425298,0.063474,0.101545,0.074792
2,1.6348,1.397211,0.563703,0.196798,0.199565,0.177244
3,1.2338,1.106173,0.659945,0.275916,0.269421,0.248171
4,0.9644,0.949381,0.698442,0.282625,0.300276,0.278741
5,0.7855,0.86606,0.719523,0.329761,0.332944,0.3094
6,0.6596,0.802889,0.728689,0.389988,0.360991,0.34858
7,0.5643,0.777709,0.740605,0.395826,0.381689,0.369773
8,0.5046,0.752066,0.748854,0.420728,0.399388,0.3874
9,0.451,0.732079,0.753437,0.446445,0.42462,0.422308
10,0.4026,0.723492,0.757104,0.472288,0.427002,0.428631


[I 2025-03-26 12:25:21,399] Trial 86 finished with value: 0.4778879458794155 and parameters: {'learning_rate': 0.00048481023093695626, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 87 with params: {'learning_rate': 0.0004385356626520977, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1948,1.887827,0.416132,0.070496,0.095182,0.073014
2,1.6834,1.449679,0.555454,0.184139,0.187613,0.168167
3,1.2962,1.155716,0.651696,0.256404,0.255614,0.236659
4,1.0268,0.992863,0.692942,0.277711,0.29929,0.278013
5,0.843,0.897802,0.716774,0.338766,0.327387,0.306585
6,0.7106,0.827064,0.721357,0.372158,0.34419,0.328964
7,0.6109,0.799885,0.732356,0.371815,0.364713,0.347756
8,0.55,0.768319,0.751604,0.408693,0.395611,0.379655
9,0.4945,0.750025,0.753437,0.432778,0.408861,0.399425
10,0.4451,0.740674,0.754354,0.438637,0.412348,0.404563


[I 2025-03-26 12:26:35,920] Trial 87 finished with value: 0.44731370533334447 and parameters: {'learning_rate': 0.0004385356626520977, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 88 with params: {'learning_rate': 0.000223991456731085, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2896,2.087601,0.219065,0.058389,0.031755,0.024005
2,1.9656,1.793663,0.447296,0.1023,0.113531,0.086959
3,1.6991,1.548022,0.514207,0.165438,0.153166,0.133618
4,1.4685,1.362709,0.572869,0.20109,0.190952,0.170549
5,1.2977,1.22402,0.651696,0.269407,0.253246,0.240595
6,1.1448,1.118508,0.676444,0.266764,0.273572,0.256279
7,1.0299,1.042669,0.692942,0.278009,0.289621,0.27002
8,0.9515,0.987197,0.703941,0.292117,0.302474,0.277975
9,0.8811,0.950016,0.710357,0.304869,0.311043,0.289813
10,0.8195,0.918177,0.718607,0.333005,0.323146,0.303606


[I 2025-03-26 12:27:25,868] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 6.961472074236449e-05, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.381,2.283038,0.176902,0.003538,0.02,0.006012
2,2.2404,2.160906,0.188818,0.043581,0.02349,0.012242
3,2.136,2.053872,0.374885,0.061083,0.077776,0.062172
4,2.0349,1.959246,0.407883,0.071903,0.087633,0.064025
5,1.9532,1.874346,0.437214,0.086495,0.102797,0.079488


[I 2025-03-26 12:27:51,221] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.00039680681549353884, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2108,1.92364,0.3978,0.074818,0.085978,0.063623
2,1.7302,1.5037,0.529789,0.170163,0.165481,0.146195
3,1.3579,1.207496,0.641613,0.22567,0.242872,0.220785
4,1.0902,1.043855,0.683776,0.271126,0.284644,0.264734
5,0.9068,0.937101,0.709441,0.299702,0.31169,0.289932
6,0.7656,0.852051,0.724106,0.357349,0.333478,0.316559
7,0.6601,0.815486,0.724106,0.354198,0.342654,0.326723
8,0.5951,0.786997,0.746104,0.392697,0.379415,0.362842
9,0.5393,0.76459,0.745188,0.408099,0.385756,0.37503
10,0.4883,0.758157,0.749771,0.439012,0.399474,0.391057


[I 2025-03-26 12:28:42,924] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.000398295002879009, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2102,1.922396,0.3978,0.074657,0.085978,0.063526
2,1.7284,1.501789,0.530706,0.17078,0.165969,0.146916
3,1.3556,1.205507,0.64253,0.225841,0.243825,0.221167
4,1.0879,1.042136,0.684693,0.271094,0.287144,0.265899
5,0.9045,0.93575,0.710357,0.300675,0.313119,0.291717
6,0.7636,0.851222,0.724106,0.357173,0.333478,0.316455
7,0.6583,0.814926,0.724106,0.354198,0.342654,0.326723
8,0.5934,0.78646,0.747021,0.393426,0.380749,0.363561
9,0.5376,0.763947,0.745188,0.408099,0.385756,0.37503
10,0.4867,0.757869,0.749771,0.458595,0.401843,0.395699


[I 2025-03-26 12:29:58,417] Trial 91 finished with value: 0.4409012067685775 and parameters: {'learning_rate': 0.000398295002879009, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 92 with params: {'learning_rate': 0.0003725588397131786, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2056,1.930469,0.398717,0.074848,0.085919,0.063965
2,1.7507,1.535696,0.523373,0.182488,0.163069,0.145734
3,1.3967,1.249168,0.624198,0.225629,0.228509,0.210051
4,1.1324,1.081626,0.676444,0.265003,0.279401,0.258036
5,0.9499,0.967071,0.705775,0.295893,0.308767,0.285933
6,0.8058,0.875793,0.71494,0.311923,0.319612,0.299071
7,0.6983,0.836887,0.707608,0.300796,0.31743,0.295689
8,0.6316,0.80808,0.736939,0.381389,0.367729,0.349559
9,0.5753,0.786281,0.738772,0.387298,0.37622,0.364253
10,0.5224,0.774482,0.746104,0.411036,0.399183,0.387511


[I 2025-03-26 12:30:50,527] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00035688475656907954, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2198,1.951599,0.391384,0.057058,0.083596,0.062824
2,1.773,1.557159,0.514207,0.138875,0.151987,0.129199
3,1.4205,1.269331,0.616865,0.226082,0.226033,0.208242
4,1.1574,1.100183,0.675527,0.263952,0.278231,0.259029
5,0.9769,0.985297,0.701192,0.291251,0.303499,0.28161
6,0.8307,0.890371,0.71494,0.296841,0.318107,0.297302
7,0.7214,0.844776,0.711274,0.323801,0.319277,0.299929
8,0.6523,0.815745,0.732356,0.34964,0.353648,0.332363
9,0.595,0.788543,0.734189,0.361013,0.363286,0.348401
10,0.5406,0.780087,0.746104,0.395684,0.382409,0.367305


[I 2025-03-26 12:32:06,065] Trial 93 finished with value: 0.40779667087983495 and parameters: {'learning_rate': 0.00035688475656907954, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 94 with params: {'learning_rate': 0.00036943749446526915, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2143,1.940046,0.394134,0.055306,0.08439,0.062492
2,1.7576,1.540025,0.517874,0.143199,0.154906,0.132861
3,1.4003,1.251015,0.630614,0.225066,0.233778,0.214418
4,1.1363,1.084738,0.672777,0.262129,0.277606,0.258263
5,0.9556,0.972099,0.701192,0.290197,0.303749,0.280814


[I 2025-03-26 12:32:32,013] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 5.047617393737974e-05, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4022,2.319473,0.179652,0.018551,0.020822,0.00759
2,2.2885,2.222953,0.178735,0.023545,0.020548,0.007089
3,2.2066,2.139323,0.24473,0.072895,0.03908,0.033813
4,2.1287,2.064645,0.378552,0.08124,0.079252,0.064616
5,2.0666,2.000153,0.404216,0.055335,0.086259,0.064519
6,1.9997,1.94267,0.412466,0.092625,0.090295,0.067965
7,1.9457,1.893328,0.430797,0.087415,0.099525,0.077493
8,1.9028,1.850276,0.446379,0.105678,0.107853,0.085434
9,1.8617,1.813975,0.456462,0.10252,0.115083,0.092808
10,1.8311,1.784945,0.472961,0.102637,0.12383,0.099567


[I 2025-03-26 12:33:25,356] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 2.6821272497630925e-05, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4325,2.37488,0.186068,0.032682,0.02318,0.011072
2,2.3574,2.311867,0.182401,0.01449,0.021644,0.008931
3,2.3063,2.260536,0.179652,0.023548,0.020822,0.007605
4,2.259,2.216236,0.182401,0.023561,0.021644,0.009083
5,2.2213,2.174765,0.2044,0.063648,0.027958,0.018728


[I 2025-03-26 12:33:50,491] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00015758755429273638, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3278,2.161618,0.176902,0.003538,0.02,0.006012
2,2.0724,1.937683,0.410632,0.092579,0.089481,0.066296
3,1.8687,1.734291,0.467461,0.101816,0.123947,0.098488
4,1.6771,1.570236,0.517874,0.16279,0.153707,0.130471
5,1.5325,1.441092,0.560953,0.227559,0.188788,0.171225
6,1.3959,1.339232,0.594867,0.241494,0.214289,0.199538
7,1.2898,1.257853,0.638863,0.270952,0.245168,0.234834
8,1.2129,1.195491,0.660862,0.262978,0.269676,0.25104
9,1.1394,1.14378,0.673694,0.265778,0.275391,0.255968
10,1.0792,1.105531,0.68286,0.266075,0.285302,0.262383


[I 2025-03-26 12:34:40,053] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 2.025662008519137e-05, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4467,2.39942,0.153071,0.009399,0.018519,0.00893
2,2.3841,2.343836,0.182401,0.01067,0.021644,0.008784
3,2.341,2.302176,0.179652,0.018558,0.020822,0.007599
4,2.303,2.267023,0.180568,0.019561,0.021096,0.008097
5,2.2713,2.231851,0.179652,0.023554,0.020822,0.007615
6,2.2397,2.204185,0.181485,0.043564,0.021311,0.008554
7,2.2149,2.179431,0.19615,0.063614,0.025492,0.015601
8,2.1941,2.157942,0.226398,0.079049,0.033977,0.028055
9,2.1757,2.139898,0.265811,0.073041,0.045215,0.041056
10,2.1615,2.124325,0.303391,0.073874,0.056196,0.051836


[I 2025-03-26 12:35:30,598] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 4.35374541141818e-05, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4192,2.341503,0.182401,0.013581,0.021644,0.008897
2,2.3134,2.2535,0.179652,0.023548,0.020822,0.007605
3,2.2385,2.176896,0.189734,0.063584,0.023705,0.012673
4,2.1685,2.109167,0.329056,0.071906,0.064365,0.057686
5,2.1122,2.050308,0.387718,0.079798,0.08169,0.065028


[I 2025-03-26 12:35:55,642] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 4.174411847798711e-05, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4211,2.345301,0.183318,0.012344,0.021918,0.009239
2,2.3182,2.259895,0.179652,0.023548,0.020822,0.007605
3,2.2456,2.185724,0.185151,0.043567,0.02243,0.010476
4,2.178,2.120303,0.306141,0.072309,0.057587,0.052029
5,2.1236,2.062937,0.382218,0.081679,0.080159,0.065361
6,2.0665,2.014195,0.394134,0.07627,0.083813,0.064221
7,2.0195,1.970546,0.407883,0.073666,0.087656,0.064875
8,1.9819,1.932516,0.419798,0.092361,0.09392,0.072601
9,1.9463,1.899841,0.431714,0.087609,0.100193,0.078021
10,1.9193,1.873568,0.442713,0.086189,0.106086,0.083148


[I 2025-03-26 12:36:45,591] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0004636837052468016, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1769,1.861312,0.425298,0.064297,0.100084,0.074891
2,1.655,1.428223,0.553621,0.198545,0.185964,0.166688
3,1.2665,1.132888,0.655362,0.239403,0.258618,0.236147
4,0.994,0.969335,0.686526,0.271558,0.289828,0.264434
5,0.8102,0.878822,0.713107,0.298225,0.319884,0.294464
6,0.6812,0.812722,0.725023,0.366093,0.345978,0.329193
7,0.5848,0.794231,0.730522,0.367487,0.364651,0.347979
8,0.526,0.760569,0.750687,0.415393,0.399895,0.389026
9,0.4744,0.74311,0.750687,0.445925,0.409183,0.404429
10,0.425,0.731924,0.75527,0.459258,0.424398,0.418881


[I 2025-03-26 12:38:01,259] Trial 101 finished with value: 0.4493441148240114 and parameters: {'learning_rate': 0.0004636837052468016, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 102 with params: {'learning_rate': 0.0004891719445432319, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.176,1.846061,0.426214,0.06298,0.10219,0.074838
2,1.6307,1.392909,0.563703,0.196423,0.199806,0.177427
3,1.2285,1.101701,0.659945,0.275657,0.269421,0.248116
4,0.9587,0.945359,0.699358,0.285805,0.300731,0.279652
5,0.7804,0.862477,0.719523,0.329727,0.334379,0.310198
6,0.6561,0.802678,0.730522,0.385881,0.362831,0.349378
7,0.5622,0.776587,0.741522,0.391466,0.386836,0.373492
8,0.5032,0.753054,0.749771,0.413992,0.397338,0.386882
9,0.4484,0.732541,0.752521,0.446409,0.42557,0.42265
10,0.4011,0.722022,0.75527,0.481838,0.427259,0.427822


[I 2025-03-26 12:39:20,144] Trial 102 finished with value: 0.47189610388824405 and parameters: {'learning_rate': 0.0004891719445432319, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 103 with params: {'learning_rate': 0.00045918198842206346, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.187,1.869466,0.421632,0.06606,0.097674,0.073796
2,1.6603,1.423766,0.557287,0.200138,0.190016,0.169848
3,1.2665,1.133207,0.653529,0.255737,0.259162,0.239526
4,0.998,0.972528,0.686526,0.258841,0.290981,0.2672
5,0.8164,0.88255,0.71769,0.328274,0.329717,0.307661
6,0.6866,0.816362,0.725023,0.378964,0.353075,0.341774
7,0.5894,0.79493,0.730522,0.395903,0.373006,0.359503
8,0.5292,0.761111,0.746104,0.404522,0.391166,0.376253
9,0.4748,0.744801,0.752521,0.459219,0.42001,0.415638
10,0.4258,0.731864,0.758937,0.470506,0.428372,0.425726


[I 2025-03-26 12:40:36,100] Trial 103 finished with value: 0.4588381698404095 and parameters: {'learning_rate': 0.00045918198842206346, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 104 with params: {'learning_rate': 0.000474007732838863, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1727,1.85207,0.427131,0.063929,0.101375,0.075348
2,1.6436,1.416094,0.554537,0.200855,0.189214,0.170852
3,1.2521,1.122142,0.655362,0.238048,0.26245,0.24001
4,0.9804,0.959264,0.692026,0.276243,0.293033,0.270772
5,0.7981,0.872584,0.714024,0.308421,0.323795,0.298661
6,0.6717,0.809296,0.729606,0.383738,0.3608,0.349922
7,0.575,0.789666,0.732356,0.364915,0.365385,0.348563
8,0.5168,0.757638,0.753437,0.41459,0.402184,0.390823
9,0.4653,0.738737,0.751604,0.442978,0.415142,0.409295
10,0.4161,0.727947,0.754354,0.448756,0.419912,0.413075


[I 2025-03-26 12:41:51,980] Trial 104 finished with value: 0.44564968938491534 and parameters: {'learning_rate': 0.000474007732838863, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 105 with params: {'learning_rate': 0.0004972156650098043, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1732,1.84022,0.428964,0.062722,0.103414,0.07517
2,1.6232,1.385588,0.565536,0.196303,0.200893,0.177913
3,1.219,1.09336,0.665445,0.282302,0.27694,0.256517
4,0.9479,0.937315,0.703025,0.29639,0.303696,0.283896
5,0.7712,0.858503,0.719523,0.331833,0.334936,0.309599
6,0.6508,0.801015,0.736022,0.41568,0.373541,0.366011
7,0.5549,0.771182,0.740605,0.386716,0.379435,0.365466
8,0.4959,0.746913,0.752521,0.427671,0.403068,0.391105
9,0.4421,0.726724,0.752521,0.443934,0.426338,0.42232
10,0.3935,0.718626,0.752521,0.467217,0.424547,0.424555


[I 2025-03-26 12:43:04,468] Trial 105 finished with value: 0.47187001129115386 and parameters: {'learning_rate': 0.0004972156650098043, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 106 with params: {'learning_rate': 0.00038830645075827786, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2141,1.930704,0.395967,0.055119,0.085117,0.06249
2,1.7401,1.515051,0.525206,0.166279,0.161683,0.141621
3,1.3714,1.220024,0.640697,0.225812,0.241801,0.220423
4,1.1042,1.054837,0.68286,0.272792,0.284117,0.264796
5,0.9211,0.946075,0.708524,0.297146,0.311202,0.289217
6,0.7785,0.858459,0.725023,0.352217,0.332872,0.314407
7,0.6715,0.819747,0.718607,0.341844,0.333597,0.317207
8,0.6059,0.791206,0.743355,0.386024,0.375867,0.358379
9,0.5498,0.768629,0.746104,0.401256,0.386027,0.372968
10,0.4981,0.760638,0.747021,0.43687,0.395346,0.386243


[I 2025-03-26 12:44:19,300] Trial 106 finished with value: 0.435704867137416 and parameters: {'learning_rate': 0.00038830645075827786, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 107 with params: {'learning_rate': 0.0003596738637418528, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2253,1.955141,0.389551,0.059026,0.083166,0.063025
2,1.775,1.555923,0.513291,0.139393,0.152102,0.129238
3,1.419,1.265341,0.615949,0.229215,0.225477,0.207387
4,1.1544,1.095427,0.673694,0.270519,0.276872,0.258395
5,0.9726,0.980529,0.700275,0.285668,0.301995,0.278059


[I 2025-03-26 12:44:43,445] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00028166078804503996, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2592,2.027096,0.339138,0.068508,0.067258,0.054969
2,1.8772,1.679015,0.477544,0.144346,0.134398,0.109746
3,1.566,1.409601,0.55637,0.199233,0.180638,0.160157
4,1.3141,1.222022,0.638863,0.248299,0.244106,0.227777
5,1.133,1.087815,0.68286,0.268938,0.282357,0.261071


[I 2025-03-26 12:45:08,028] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0004544605928817944, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1887,1.873598,0.419798,0.065794,0.096791,0.073268
2,1.6654,1.429323,0.55637,0.200387,0.189801,0.169785
3,1.273,1.138252,0.651696,0.25545,0.257846,0.238824
4,1.0045,0.97699,0.688359,0.259811,0.291539,0.267905
5,0.8224,0.885466,0.716774,0.328714,0.329502,0.307906
6,0.692,0.818932,0.72044,0.370516,0.34627,0.330961
7,0.5944,0.796663,0.730522,0.394328,0.367992,0.353833
8,0.5341,0.762329,0.748854,0.406071,0.393397,0.378197
9,0.4793,0.747453,0.748854,0.45184,0.417177,0.411766
10,0.4302,0.734211,0.759853,0.470422,0.428587,0.426068


[I 2025-03-26 12:46:22,832] Trial 109 finished with value: 0.45695793507780297 and parameters: {'learning_rate': 0.0004544605928817944, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 110 with params: {'learning_rate': 0.0002916371527822545, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2544,2.016776,0.35747,0.068399,0.072578,0.059968
2,1.8638,1.662706,0.479377,0.142534,0.135522,0.111132
3,1.5465,1.390233,0.560953,0.199187,0.183285,0.163084
4,1.2922,1.203424,0.648946,0.247526,0.252562,0.236067
5,1.1104,1.071599,0.68561,0.270157,0.284056,0.262505
6,0.96,0.975249,0.705775,0.298827,0.302227,0.28284
7,0.8448,0.912551,0.702108,0.273085,0.299974,0.276288
8,0.7711,0.870289,0.724106,0.336362,0.332909,0.310726
9,0.7086,0.846379,0.72319,0.327517,0.33049,0.309348
10,0.6503,0.826923,0.731439,0.366309,0.354272,0.339652


[I 2025-03-26 12:47:12,553] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.000495988193335961, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1641,1.834023,0.430797,0.062853,0.104237,0.075805
2,1.621,1.391271,0.562786,0.195956,0.199441,0.176049
3,1.2242,1.103717,0.662695,0.28679,0.271543,0.253132
4,0.957,0.946586,0.693859,0.275898,0.295542,0.27399
5,0.7777,0.862403,0.71769,0.33326,0.331743,0.307372
6,0.6537,0.802816,0.737855,0.407481,0.380737,0.371206
7,0.5575,0.780549,0.736939,0.366099,0.369918,0.353791
8,0.4985,0.749905,0.758937,0.428647,0.415308,0.402998
9,0.4463,0.724587,0.75802,0.455051,0.422005,0.418859
10,0.3991,0.717981,0.75802,0.453646,0.427168,0.421781


[I 2025-03-26 12:48:29,153] Trial 111 finished with value: 0.45431889686694843 and parameters: {'learning_rate': 0.000495988193335961, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 112 with params: {'learning_rate': 0.00043024518854995066, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.198,1.895304,0.410632,0.072801,0.091311,0.06888
2,1.6925,1.460381,0.553621,0.180466,0.1854,0.166757
3,1.3083,1.165365,0.651696,0.25817,0.25616,0.236707
4,1.0387,1.002445,0.689276,0.27626,0.296407,0.275115
5,0.8544,0.905106,0.71494,0.347781,0.325221,0.30476
6,0.7202,0.832273,0.721357,0.361922,0.337558,0.320444
7,0.6201,0.803621,0.732356,0.375632,0.362929,0.349362
8,0.559,0.772601,0.752521,0.388055,0.39283,0.375224
9,0.503,0.752688,0.751604,0.431149,0.399151,0.391878
10,0.4535,0.744209,0.750687,0.434002,0.40641,0.397535


[I 2025-03-26 12:49:42,912] Trial 112 finished with value: 0.44092158388558816 and parameters: {'learning_rate': 0.00043024518854995066, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 113 with params: {'learning_rate': 0.00010121831356866389, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3638,2.23874,0.176902,0.003538,0.02,0.006012
2,2.1771,2.073528,0.335472,0.069809,0.066359,0.057643
3,2.0349,1.933203,0.411549,0.071735,0.089138,0.064313
4,1.8969,1.80458,0.448213,0.097323,0.109279,0.085441
5,1.7861,1.694167,0.483043,0.103508,0.132189,0.10702


[I 2025-03-26 12:50:07,139] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00048028646482360107, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1793,1.852665,0.426214,0.064685,0.10176,0.075521
2,1.6392,1.401603,0.563703,0.197235,0.199588,0.177467
3,1.2394,1.110956,0.658112,0.257581,0.267504,0.245381
4,0.9704,0.953588,0.696609,0.278375,0.297109,0.274889
5,0.7909,0.869171,0.716774,0.326043,0.330335,0.307
6,0.6638,0.805541,0.72594,0.387242,0.35686,0.344961
7,0.5686,0.785307,0.736022,0.397495,0.381668,0.370297
8,0.5099,0.753015,0.748854,0.419037,0.400277,0.388391
9,0.4562,0.736353,0.752521,0.441093,0.422283,0.418307
10,0.4079,0.723446,0.75802,0.477107,0.427671,0.428606


[I 2025-03-26 12:51:20,619] Trial 114 finished with value: 0.46790808088535774 and parameters: {'learning_rate': 0.00048028646482360107, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 115 with params: {'learning_rate': 0.0003361903816895983, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2347,1.975406,0.378552,0.06263,0.079286,0.062384
2,1.8043,1.590955,0.502291,0.130722,0.146358,0.122827
3,1.4603,1.305451,0.5967,0.225858,0.210621,0.193758
4,1.1986,1.129425,0.666361,0.256895,0.268797,0.251517
5,1.0163,1.008003,0.693859,0.28859,0.294041,0.270746


[I 2025-03-26 12:51:44,683] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00046004707891416665, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1866,1.868715,0.421632,0.06606,0.097674,0.073796
2,1.6593,1.42269,0.558203,0.200036,0.190231,0.169812
3,1.2653,1.132292,0.655362,0.256618,0.261024,0.240773
4,0.9969,0.971867,0.687443,0.25898,0.291344,0.267462
5,0.8153,0.882098,0.71769,0.328229,0.329717,0.307566
6,0.6856,0.816001,0.725023,0.378964,0.353075,0.341774
7,0.5885,0.7947,0.731439,0.396535,0.373461,0.360071
8,0.5284,0.760839,0.746104,0.404522,0.391166,0.376253
9,0.4739,0.744277,0.753437,0.459863,0.420536,0.416227
10,0.425,0.731291,0.759853,0.472068,0.428736,0.426212


[I 2025-03-26 12:52:57,572] Trial 116 finished with value: 0.4593608610890367 and parameters: {'learning_rate': 0.00046004707891416665, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 117 with params: {'learning_rate': 0.0004828435266487486, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1783,1.850747,0.425298,0.06348,0.101545,0.074804
2,1.6367,1.399063,0.563703,0.196703,0.199565,0.177199
3,1.2362,1.108194,0.658112,0.255633,0.267504,0.245076
4,0.967,0.951128,0.697525,0.282299,0.29975,0.278319
5,0.7879,0.867286,0.719523,0.330212,0.332944,0.30917
6,0.6614,0.804026,0.72594,0.38814,0.358086,0.346198
7,0.5664,0.779221,0.741522,0.397287,0.382215,0.370838
8,0.5069,0.75468,0.748854,0.42064,0.399676,0.387837
9,0.4537,0.733125,0.752521,0.441479,0.422431,0.418587
10,0.4045,0.722375,0.75802,0.480707,0.427677,0.429611


[I 2025-03-26 12:54:11,509] Trial 117 finished with value: 0.4734758165909028 and parameters: {'learning_rate': 0.0004828435266487486, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 118 with params: {'learning_rate': 0.0004538493543253887, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.189,1.874167,0.419798,0.065794,0.096791,0.073268
2,1.6661,1.430116,0.55637,0.200864,0.189801,0.170049
3,1.2739,1.138897,0.651696,0.25545,0.257846,0.238824
4,1.0054,0.977608,0.689276,0.260498,0.292491,0.26861
5,0.8232,0.88593,0.716774,0.328762,0.329502,0.307929
6,0.6928,0.819225,0.72044,0.370516,0.34627,0.330961
7,0.595,0.796876,0.730522,0.394328,0.367992,0.353833
8,0.5347,0.76245,0.749771,0.406351,0.39376,0.37852
9,0.4799,0.747705,0.748854,0.451526,0.417177,0.411421
10,0.4308,0.734573,0.758937,0.469727,0.42692,0.425016


[I 2025-03-26 12:55:27,088] Trial 118 finished with value: 0.45172252995634765 and parameters: {'learning_rate': 0.0004538493543253887, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 119 with params: {'learning_rate': 1.0704036787379217e-05, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4598,2.429662,0.04308,0.009653,0.026341,0.006402
2,2.4234,2.395048,0.159487,0.00837,0.019244,0.008997
3,2.3945,2.365108,0.187901,0.020452,0.023728,0.011758
4,2.3698,2.341886,0.184235,0.015572,0.022452,0.010131
5,2.3485,2.322017,0.185151,0.014851,0.022466,0.010148
6,2.3293,2.304889,0.182401,0.01449,0.021644,0.008931
7,2.3152,2.289916,0.179652,0.018558,0.020822,0.007599
8,2.3021,2.276883,0.179652,0.018558,0.020822,0.007599
9,2.2904,2.265748,0.179652,0.018558,0.020822,0.007599
10,2.2821,2.256823,0.179652,0.023554,0.020822,0.007615


[I 2025-03-26 12:56:16,730] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 2.70102317495885e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4359,2.377661,0.185151,0.032541,0.022906,0.010803
2,2.3595,2.313644,0.180568,0.012463,0.021096,0.008016
3,2.3076,2.26161,0.179652,0.023548,0.020822,0.007605
4,2.2597,2.216617,0.181485,0.023558,0.02137,0.008605
5,2.2214,2.174619,0.20165,0.063638,0.027136,0.017753


[I 2025-03-26 12:56:41,034] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 1.5745418122329243e-05, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.452,2.412838,0.109074,0.009732,0.033271,0.0087
2,2.401,2.364999,0.185151,0.014581,0.022906,0.010746
3,2.3637,2.329768,0.181485,0.011921,0.02137,0.008436
4,2.3327,2.300641,0.184235,0.015895,0.022192,0.009804
5,2.3061,2.272509,0.180568,0.019564,0.021096,0.008101
6,2.28,2.249357,0.179652,0.023558,0.020822,0.00762
7,2.2601,2.229105,0.179652,0.023554,0.020822,0.007615
8,2.243,2.21177,0.180568,0.043558,0.021037,0.008045
9,2.2278,2.197138,0.186984,0.043584,0.022955,0.011324
10,2.2169,2.184819,0.195234,0.063611,0.02529,0.015081


[I 2025-03-26 12:57:30,477] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.000418584161567306, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2025,1.905607,0.4033,0.073531,0.088277,0.065749
2,1.7057,1.476166,0.540788,0.174127,0.172028,0.153064
3,1.3253,1.179695,0.651696,0.257309,0.253579,0.234173
4,1.0561,1.017848,0.68561,0.259739,0.287532,0.264159
5,0.8722,0.916413,0.713107,0.311512,0.319719,0.299166
6,0.7355,0.839979,0.72319,0.363084,0.338025,0.320886
7,0.6339,0.808804,0.727773,0.378891,0.350098,0.339506
8,0.5711,0.778918,0.752521,0.387307,0.392804,0.375039
9,0.5152,0.758153,0.747938,0.40797,0.389683,0.378848
10,0.4654,0.749421,0.749771,0.4388,0.406638,0.399811


[I 2025-03-26 12:58:20,483] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0004842140345505916, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1779,1.849738,0.425298,0.063474,0.101545,0.074792
2,1.6354,1.397776,0.563703,0.196703,0.199565,0.177199
3,1.2345,1.106752,0.659028,0.255866,0.267992,0.245459
4,0.9652,0.949871,0.698442,0.282674,0.300276,0.278727
5,0.7862,0.866455,0.719523,0.330212,0.332944,0.30917
6,0.6601,0.803141,0.726856,0.389018,0.359419,0.347325
7,0.5644,0.783099,0.739688,0.397866,0.381701,0.371
8,0.5056,0.751764,0.749771,0.41828,0.400492,0.387741
9,0.4519,0.734034,0.752521,0.445303,0.424516,0.421469
10,0.4041,0.723372,0.759853,0.471597,0.428053,0.429272


[I 2025-03-26 12:59:41,057] Trial 123 finished with value: 0.47078326689420136 and parameters: {'learning_rate': 0.0004842140345505916, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 124 with params: {'learning_rate': 0.0003444978864092439, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2313,1.96819,0.385885,0.061795,0.082009,0.063576
2,1.7938,1.578331,0.506874,0.137458,0.148889,0.125299
3,1.4453,1.290738,0.60495,0.226067,0.215515,0.197757
4,1.1826,1.117226,0.669111,0.261782,0.272704,0.254868
5,1.0007,0.998098,0.695692,0.286423,0.296231,0.272082


[I 2025-03-26 13:00:13,893] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0003258457570114764, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.239,1.984457,0.374885,0.064579,0.077868,0.062361
2,1.8177,1.607312,0.494042,0.126231,0.141511,0.117093
3,1.4798,1.324634,0.587534,0.22767,0.203934,0.186522
4,1.2194,1.145192,0.659945,0.26246,0.265365,0.249604
5,1.0366,1.021062,0.694775,0.28935,0.293555,0.2709
6,0.8882,0.925907,0.713107,0.299927,0.312893,0.291713
7,0.7757,0.871934,0.710357,0.300823,0.313893,0.293022
8,0.7049,0.837338,0.728689,0.34708,0.345569,0.323645
9,0.6455,0.812694,0.729606,0.365562,0.354423,0.340743
10,0.5887,0.799826,0.745188,0.378237,0.375056,0.359219


[I 2025-03-26 13:01:07,015] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.00040332152063429446, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2083,1.918232,0.40055,0.074532,0.087053,0.06488
2,1.7227,1.495339,0.532539,0.170517,0.16671,0.147771
3,1.348,1.198751,0.64528,0.266429,0.248117,0.228814
4,1.08,1.036193,0.681943,0.258771,0.285575,0.262273
5,0.8965,0.930834,0.711274,0.301516,0.313573,0.292473
6,0.7565,0.84814,0.727773,0.359272,0.335866,0.31866
7,0.6521,0.81299,0.725023,0.363018,0.345513,0.332344
8,0.5876,0.784478,0.747938,0.392885,0.381224,0.363668
9,0.5318,0.763706,0.747938,0.405793,0.389666,0.377624
10,0.4812,0.755469,0.748854,0.457829,0.402543,0.396076


[I 2025-03-26 13:01:56,244] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0004757376866892141, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1809,1.856098,0.424381,0.064607,0.10047,0.07485
2,1.6435,1.406111,0.56187,0.1968,0.196633,0.175416
3,1.245,1.115526,0.658112,0.257153,0.267504,0.245197
4,0.9762,0.957451,0.696609,0.278284,0.297655,0.275045
5,0.7961,0.871823,0.716774,0.324411,0.330335,0.306757
6,0.6683,0.80791,0.727773,0.380089,0.357313,0.345095
7,0.5727,0.785188,0.736939,0.39458,0.379581,0.367213
8,0.5134,0.759338,0.746104,0.422484,0.399313,0.387624
9,0.4598,0.736806,0.751604,0.441206,0.419357,0.414326
10,0.411,0.72301,0.75527,0.477052,0.42617,0.427166


[I 2025-03-26 13:03:16,944] Trial 127 finished with value: 0.47023782805672404 and parameters: {'learning_rate': 0.0004757376866892141, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 128 with params: {'learning_rate': 0.00040001163209461567, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 3.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2096,1.92093,0.40055,0.07467,0.087053,0.064972
2,1.7264,1.499497,0.530706,0.169612,0.165969,0.146871
3,1.353,1.203082,0.643446,0.246017,0.246325,0.225758
4,1.085,1.039918,0.68286,0.260677,0.285441,0.263147
5,0.9017,0.933989,0.710357,0.300679,0.313119,0.291789
6,0.7611,0.850001,0.725023,0.357537,0.334005,0.316888
7,0.6561,0.814133,0.72319,0.357899,0.343409,0.328337
8,0.5913,0.785754,0.746104,0.393159,0.380645,0.363364
9,0.5355,0.763915,0.746104,0.409947,0.386809,0.376662
10,0.4848,0.757353,0.749771,0.457396,0.401843,0.395471


[I 2025-03-26 13:04:32,318] Trial 128 finished with value: 0.4481808081500425 and parameters: {'learning_rate': 0.00040001163209461567, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 3.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 129 with params: {'learning_rate': 0.0004968336400777481, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1733,1.840518,0.428964,0.062722,0.103414,0.07517
2,1.6236,1.385929,0.56462,0.195633,0.199984,0.176809
3,1.2195,1.093803,0.665445,0.282065,0.276518,0.256219
4,0.9485,0.937681,0.702108,0.296723,0.303333,0.283826
5,0.7717,0.858738,0.72044,0.322614,0.33504,0.310034
6,0.6513,0.800977,0.736022,0.41588,0.373541,0.366093
7,0.5555,0.772267,0.741522,0.387173,0.380863,0.36753
8,0.4965,0.747359,0.751604,0.426863,0.403118,0.391107
9,0.4427,0.72723,0.753437,0.444287,0.425843,0.421699
10,0.3944,0.719177,0.754354,0.469881,0.426126,0.426968


[I 2025-03-26 13:05:48,144] Trial 129 finished with value: 0.4713200881499334 and parameters: {'learning_rate': 0.0004968336400777481, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 130 with params: {'learning_rate': 0.0004961264572494048, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1735,1.840952,0.428964,0.062722,0.103414,0.07517
2,1.6242,1.386488,0.56462,0.194155,0.199984,0.176636
3,1.2203,1.094471,0.664528,0.276888,0.274851,0.253609
4,0.9495,0.93839,0.703025,0.297744,0.303787,0.284664
5,0.7723,0.858599,0.718607,0.332603,0.335883,0.310268
6,0.6499,0.798703,0.736939,0.416839,0.373694,0.366858
7,0.5556,0.773601,0.741522,0.389706,0.386837,0.371735
8,0.4972,0.748331,0.751604,0.431829,0.40517,0.393725
9,0.4425,0.731629,0.752521,0.447899,0.430514,0.426107
10,0.3944,0.719961,0.757104,0.488754,0.429283,0.432529


[I 2025-03-26 13:07:05,699] Trial 130 finished with value: 0.46933531708628534 and parameters: {'learning_rate': 0.0004961264572494048, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 131 with params: {'learning_rate': 0.0004746453105341743, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1813,1.856909,0.424381,0.064611,0.10047,0.074857
2,1.6445,1.4071,0.560953,0.195674,0.195681,0.174341
3,1.2463,1.116611,0.658112,0.257483,0.267504,0.245376
4,0.9776,0.958442,0.696609,0.278748,0.297655,0.275239
5,0.7974,0.872522,0.71769,0.325102,0.330823,0.307466
6,0.6694,0.808473,0.727773,0.380679,0.357313,0.345377
7,0.5737,0.785764,0.736022,0.402136,0.378153,0.366201
8,0.5144,0.75993,0.745188,0.421728,0.39921,0.387108
9,0.4607,0.737343,0.751604,0.441206,0.419357,0.414326
10,0.412,0.723079,0.756187,0.477672,0.426657,0.427703


[I 2025-03-26 13:08:23,233] Trial 131 finished with value: 0.47082080220104283 and parameters: {'learning_rate': 0.0004746453105341743, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 3.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 132 with params: {'learning_rate': 0.0004546368061268981, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 3.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1807,1.869444,0.423465,0.066127,0.098794,0.074932
2,1.6652,1.438285,0.549038,0.196912,0.182228,0.163778
3,1.2792,1.142356,0.652612,0.240361,0.256435,0.234318
4,1.0066,0.978768,0.686526,0.271471,0.288104,0.262709
5,0.8221,0.885067,0.714024,0.299063,0.320099,0.295102
6,0.6906,0.816624,0.725023,0.367708,0.345978,0.329912
7,0.5937,0.797244,0.728689,0.363053,0.361625,0.344725
8,0.5346,0.763333,0.747021,0.412512,0.399687,0.388253
9,0.4824,0.745066,0.750687,0.428331,0.407578,0.401389
10,0.4326,0.734264,0.754354,0.438311,0.417913,0.40851


[I 2025-03-26 13:09:37,376] Trial 132 finished with value: 0.46453030960093045 and parameters: {'learning_rate': 0.0004546368061268981, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 133 with params: {'learning_rate': 0.0004343254498789213, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1965,1.891641,0.412466,0.069774,0.092601,0.070009
2,1.6881,1.455093,0.554537,0.181866,0.187158,0.167953
3,1.3023,1.160605,0.650779,0.257907,0.25525,0.236132
4,1.0328,0.99752,0.691109,0.277818,0.297225,0.276462
5,0.8486,0.90143,0.715857,0.347643,0.3269,0.305897
6,0.7154,0.829492,0.72044,0.35283,0.338738,0.321507
7,0.6155,0.801721,0.732356,0.375084,0.363808,0.349407
8,0.5546,0.770419,0.750687,0.388079,0.391659,0.374346
9,0.4989,0.750473,0.751604,0.435231,0.407458,0.40007
10,0.4493,0.74271,0.754354,0.436646,0.411012,0.401837


[I 2025-03-26 13:10:52,114] Trial 133 finished with value: 0.44049558966674146 and parameters: {'learning_rate': 0.0004343254498789213, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 3.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 134 with params: {'learning_rate': 9.499899727306372e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 5.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3627,2.24352,0.176902,0.003538,0.02,0.006012
2,2.186,2.087139,0.316224,0.071651,0.060585,0.054272
3,2.0516,1.954155,0.408799,0.052207,0.087914,0.063164
4,1.9215,1.832315,0.442713,0.10104,0.105226,0.081954
5,1.8165,1.726509,0.477544,0.103762,0.128544,0.104752


[I 2025-03-26 13:11:17,062] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0004991786459085929, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1628,1.831679,0.431714,0.062517,0.104882,0.075818
2,1.6179,1.388214,0.562786,0.195956,0.199441,0.176049
3,1.2207,1.101162,0.661778,0.284292,0.271952,0.252769
4,0.954,0.945199,0.692942,0.275408,0.295327,0.273672
5,0.7749,0.861178,0.71769,0.335458,0.332717,0.308968
6,0.6513,0.80177,0.737855,0.407699,0.380737,0.371327
7,0.555,0.778686,0.736022,0.36625,0.371413,0.354868
8,0.4957,0.748487,0.757104,0.427486,0.413393,0.400291
9,0.4426,0.726614,0.754354,0.455833,0.424649,0.420483
10,0.3965,0.716418,0.759853,0.490804,0.435406,0.434724


[I 2025-03-26 13:12:30,082] Trial 135 finished with value: 0.45523056288389796 and parameters: {'learning_rate': 0.0004991786459085929, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 136 with params: {'learning_rate': 0.00041484507008942537, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2039,1.908746,0.402383,0.073984,0.087913,0.065781
2,1.71,1.481056,0.537122,0.172061,0.169364,0.150432
3,1.3309,1.184385,0.652612,0.259296,0.253439,0.234285
4,1.062,1.022715,0.684693,0.260153,0.287168,0.263956
5,0.8782,0.919853,0.714024,0.315172,0.319957,0.300355
6,0.7407,0.842053,0.722273,0.35309,0.33269,0.314123
7,0.6385,0.809991,0.72594,0.374533,0.347431,0.335881
8,0.5752,0.780747,0.750687,0.388541,0.388795,0.371673
9,0.5193,0.759493,0.745188,0.405263,0.389436,0.377315
10,0.469,0.75119,0.749771,0.447987,0.401304,0.394225


[I 2025-03-26 13:13:20,077] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0004727216067798781, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.182,1.858387,0.424381,0.064738,0.10047,0.074935
2,1.6464,1.409054,0.560037,0.19728,0.193181,0.172396
3,1.2487,1.118658,0.657195,0.256069,0.266595,0.244521
4,0.9801,0.960335,0.692942,0.266915,0.294499,0.270918
5,0.7997,0.873928,0.715857,0.324431,0.330097,0.306612
6,0.6714,0.809518,0.726856,0.380124,0.355979,0.344106
7,0.5757,0.786982,0.735105,0.401675,0.376614,0.365113
8,0.5163,0.760925,0.744271,0.420899,0.398755,0.38661
9,0.4624,0.73871,0.751604,0.441772,0.419357,0.414563
10,0.4137,0.72394,0.757104,0.477557,0.427145,0.427883


[I 2025-03-26 13:14:33,799] Trial 137 finished with value: 0.47136743691502486 and parameters: {'learning_rate': 0.0004727216067798781, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 138 with params: {'learning_rate': 0.0004965729592855284, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1734,1.84065,0.428964,0.062722,0.103414,0.07517
2,1.6238,1.386068,0.56462,0.195633,0.199984,0.176809
3,1.2197,1.09404,0.665445,0.282065,0.276518,0.256219
4,0.9488,0.938014,0.702108,0.296793,0.303333,0.2839
5,0.7719,0.859703,0.718607,0.322112,0.335749,0.310131
6,0.65,0.800055,0.736939,0.414774,0.373468,0.36571
7,0.5554,0.775427,0.740605,0.387788,0.381963,0.369045
8,0.4974,0.749827,0.748854,0.424616,0.401173,0.389054
9,0.4425,0.730373,0.750687,0.455701,0.425194,0.422105
10,0.3947,0.717418,0.758937,0.483532,0.430109,0.431996


[I 2025-03-26 13:15:49,055] Trial 138 finished with value: 0.46867521275733326 and parameters: {'learning_rate': 0.0004965729592855284, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 139 with params: {'learning_rate': 0.00043865666805057785, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1948,1.887768,0.416132,0.070496,0.095182,0.073014
2,1.6834,1.449556,0.555454,0.184139,0.187613,0.168167
3,1.296,1.155631,0.651696,0.257457,0.255614,0.236702
4,1.0267,0.992806,0.693859,0.278319,0.300242,0.278692
5,0.8429,0.897704,0.716774,0.338521,0.327387,0.306282
6,0.7105,0.827054,0.721357,0.372158,0.34419,0.328964
7,0.6108,0.799805,0.732356,0.371815,0.364713,0.347756
8,0.55,0.768211,0.751604,0.408693,0.395611,0.379655
9,0.4945,0.750256,0.753437,0.430504,0.408861,0.399134
10,0.445,0.740452,0.753437,0.438244,0.41186,0.404109


[I 2025-03-26 13:16:39,058] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0003224275665351777, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2405,1.987523,0.373969,0.064933,0.077505,0.062292
2,1.8223,1.612869,0.492209,0.122387,0.139789,0.114878
3,1.4864,1.331138,0.584785,0.226077,0.20049,0.182777
4,1.2264,1.150571,0.661778,0.264255,0.265841,0.250296
5,1.0435,1.025578,0.694775,0.289472,0.293555,0.270825
6,0.8949,0.930437,0.710357,0.298029,0.308329,0.289271
7,0.7821,0.875536,0.707608,0.298467,0.310462,0.288531
8,0.7111,0.840071,0.728689,0.346588,0.343415,0.322254
9,0.6514,0.815769,0.730522,0.365922,0.354877,0.34115
10,0.5944,0.801714,0.745188,0.377709,0.370511,0.354618


[I 2025-03-26 13:17:29,006] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0004185746031446022, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2025,1.905607,0.4033,0.073531,0.088277,0.065749
2,1.7058,1.476161,0.540788,0.174127,0.172028,0.153064
3,1.3253,1.179683,0.649863,0.256913,0.252689,0.233266
4,1.056,1.01775,0.684693,0.259493,0.287168,0.263807
5,0.8721,0.916379,0.713107,0.313993,0.319719,0.299414
6,0.7354,0.839981,0.72319,0.363084,0.338025,0.320886
7,0.6339,0.808757,0.727773,0.378959,0.350098,0.339617
8,0.5711,0.778867,0.752521,0.387307,0.392804,0.375039
9,0.5151,0.758153,0.747021,0.407715,0.38958,0.378649
10,0.4653,0.749439,0.749771,0.4388,0.406638,0.399811


[I 2025-03-26 13:18:18,510] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0003206381591668457, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2412,1.98912,0.373969,0.064958,0.077505,0.062316
2,1.8247,1.615804,0.492209,0.122713,0.13955,0.114587
3,1.4899,1.334566,0.582951,0.225471,0.198769,0.181674
4,1.2301,1.153415,0.661778,0.264012,0.265841,0.250229
5,1.0472,1.028021,0.693859,0.285962,0.292127,0.268461
6,0.8984,0.932894,0.708524,0.29436,0.305948,0.286344
7,0.7856,0.877531,0.707608,0.299167,0.310462,0.288123
8,0.7144,0.841693,0.727773,0.345554,0.341748,0.320846
9,0.6547,0.817483,0.727773,0.346661,0.347877,0.332055
10,0.5975,0.80279,0.744271,0.377834,0.370147,0.354643


[I 2025-03-26 13:19:06,979] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0004859810790115975, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1772,1.848341,0.426214,0.063555,0.10219,0.075151
2,1.6337,1.395974,0.562786,0.196219,0.199327,0.176839
3,1.2324,1.104994,0.659945,0.275916,0.269421,0.248171
4,0.9629,0.948524,0.698442,0.282625,0.300276,0.278741
5,0.7842,0.865405,0.719523,0.329515,0.332944,0.309225
6,0.6585,0.802221,0.728689,0.389988,0.360991,0.34858
7,0.5632,0.777701,0.742438,0.398131,0.382381,0.37066
8,0.5035,0.750263,0.750687,0.422264,0.400601,0.388918
9,0.4496,0.733824,0.752521,0.444247,0.423521,0.420034
10,0.4024,0.722781,0.759853,0.472231,0.428995,0.429493


[I 2025-03-26 13:20:21,794] Trial 143 finished with value: 0.47233594283367625 and parameters: {'learning_rate': 0.0004859810790115975, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 86 with value: 0.4778879458794155.


Trial 144 with params: {'learning_rate': 0.0004372517403740796, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1953,1.88899,0.414299,0.070184,0.093891,0.071592
2,1.6848,1.451297,0.554537,0.184196,0.187158,0.16793
3,1.298,1.157188,0.651696,0.256883,0.255614,0.236429
4,1.0286,0.994267,0.692942,0.278341,0.29929,0.278242
5,0.8446,0.898899,0.716774,0.338846,0.327387,0.306617
6,0.712,0.827756,0.72044,0.352083,0.338738,0.321122
7,0.6122,0.800445,0.732356,0.372481,0.364713,0.348254
8,0.5515,0.768985,0.750687,0.388723,0.393111,0.375018
9,0.4958,0.749757,0.752521,0.432355,0.408407,0.399013
10,0.4464,0.74147,0.753437,0.437942,0.41186,0.40414


[I 2025-03-26 13:21:36,972] Trial 144 finished with value: 0.4483650033004464 and parameters: {'learning_rate': 0.0004372517403740796, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 145 with params: {'learning_rate': 0.00030443484511967825, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2428,1.999401,0.374885,0.066875,0.077868,0.063553
2,1.8424,1.638687,0.482126,0.122614,0.133845,0.107124
3,1.5183,1.363561,0.570119,0.221558,0.190342,0.173282
4,1.2623,1.17988,0.658112,0.26044,0.260892,0.244031
5,1.0805,1.05117,0.695692,0.269107,0.291502,0.267137


[I 2025-03-26 13:22:11,100] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0003630312598777649, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2239,1.952198,0.389551,0.057999,0.083166,0.062655
2,1.7708,1.550999,0.51604,0.139859,0.15378,0.130468
3,1.4132,1.259954,0.620532,0.22785,0.228142,0.209919
4,1.1483,1.09063,0.673694,0.269564,0.277295,0.258314
5,0.9664,0.976538,0.701192,0.286717,0.30245,0.278762
6,0.8206,0.88218,0.72044,0.323095,0.325401,0.305687
7,0.7105,0.837435,0.714024,0.32329,0.321016,0.301267
8,0.6426,0.808979,0.733272,0.383057,0.358566,0.340721
9,0.5852,0.783793,0.740605,0.407416,0.380244,0.371417
10,0.5314,0.772536,0.745188,0.424453,0.392684,0.384465


[I 2025-03-26 13:23:26,876] Trial 146 finished with value: 0.4089588189122477 and parameters: {'learning_rate': 0.0003630312598777649, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 147 with params: {'learning_rate': 0.00036786391393189155, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.222,1.948025,0.390467,0.057212,0.08353,0.062505
2,1.7648,1.543994,0.516957,0.1403,0.154235,0.131061
3,1.4051,1.25223,0.625115,0.228007,0.231271,0.212476
4,1.1397,1.083851,0.673694,0.269345,0.277295,0.258313
5,0.9576,0.970639,0.704858,0.289275,0.305312,0.281953


[I 2025-03-26 13:23:51,770] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00048236333478817866, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1785,1.851163,0.425298,0.063747,0.101545,0.074923
2,1.6372,1.399599,0.563703,0.196703,0.199565,0.177199
3,1.2368,1.108765,0.658112,0.257146,0.267504,0.245183
4,0.9676,0.951466,0.698442,0.283157,0.300205,0.279052
5,0.7884,0.867552,0.719523,0.33037,0.332944,0.309245
6,0.6618,0.804185,0.726856,0.388767,0.359039,0.346918
7,0.5669,0.779895,0.740605,0.396561,0.381689,0.370182
8,0.5074,0.754791,0.748854,0.42064,0.399676,0.387837
9,0.4542,0.733645,0.752521,0.441479,0.422431,0.418587
10,0.405,0.722666,0.75802,0.480707,0.427677,0.429611


[I 2025-03-26 13:25:08,004] Trial 148 finished with value: 0.47070410106521526 and parameters: {'learning_rate': 0.00048236333478817866, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 86 with value: 0.4778879458794155.


Trial 149 with params: {'learning_rate': 0.0003091947441664066, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.0}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2406,1.995046,0.376719,0.066179,0.078447,0.063737
2,1.8358,1.630691,0.484876,0.133129,0.135314,0.109298
3,1.5086,1.354026,0.571952,0.22418,0.191599,0.174036
4,1.2517,1.171419,0.660862,0.263684,0.265932,0.24995
5,1.0698,1.043775,0.695692,0.268822,0.291886,0.267089
6,0.9204,0.949116,0.708524,0.2998,0.306229,0.286202
7,0.8085,0.891648,0.708524,0.298573,0.307103,0.284081
8,0.7362,0.853041,0.72594,0.327903,0.335378,0.311706
9,0.6757,0.828287,0.724106,0.33066,0.338123,0.319351
10,0.6181,0.809933,0.742438,0.373351,0.364691,0.349815


[I 2025-03-26 13:25:59,883] Trial 149 pruned. 


In [22]:
print(best_trial_distill)

BestRun(run_id='86', objective=0.4778879458794155, hyperparameters={'learning_rate': 0.00048481023093695626, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.5}, run_summary=None)
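
The `BestRun` printed above carries the winning hyperparameters as a plain dictionary. The following is a minimal sketch of how such a result could be copied onto a training configuration before a final run. `BestRunStub`, `apply_hyperparameters`, and the `SimpleNamespace` config are hypothetical stand-ins for illustration only; the real `BestRun` and training-arguments objects, and whatever the notebook does next, may differ.

```python
from types import SimpleNamespace
from typing import NamedTuple, Optional


class BestRunStub(NamedTuple):
    """Hypothetical stand-in mirroring the fields printed above."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[dict] = None


def apply_hyperparameters(args, best_run):
    """Copy every searched hyperparameter onto the config object."""
    for name, value in best_run.hyperparameters.items():
        setattr(args, name, value)
    return args


best = BestRunStub(
    run_id="86",
    objective=0.4778879458794155,
    hyperparameters={
        "learning_rate": 0.00048481023093695626,
        "weight_decay": 0.003,
        "warmup_steps": 4,
        "lambda_param": 0.4,
        "temperature": 2.5,
    },
)

# Placeholder defaults; real training arguments would carry many more fields.
final_args = apply_hyperparameters(
    SimpleNamespace(learning_rate=5e-5, weight_decay=0.0, warmup_steps=0), best
)
print(final_args.learning_rate, final_args.temperature)
```

Copying by name keeps the distillation-specific entries (`lambda_param`, `temperature`) together with the optimizer settings, so one object describes the whole winning configuration.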


Recalculate the step counts to account for the changed dataset size. The number of epochs is not reduced, because the augmented dataset is still relatively small.

In [23]:
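data_length = len(train_aug)
steps_per_epoch = math.ceil(data_length / batch_size)
min_r = steps_per_epoch * 5           # pruner's minimum resource: 5 epochs, in steps
max_r = steps_per_epoch * num_epochs  # pruner's maximum resource: the full run, in steps
warm_up = math.ceil(data_length / batch_size / 10)  # upper bound for warmup steps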
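
To make the arithmetic in the cell above concrete, here is a small worked sketch. The dataset sizes, batch size, and epoch count are hypothetical values chosen only for readability, and `search_budget` is an illustrative helper, not part of the notebook. It also assumes one optimizer step per batch (no gradient accumulation, single device).

```python
import math


def search_budget(data_length, batch_size, num_epochs, min_epochs=5):
    """Return (min_r, max_r, warm_up) in optimizer steps, as in the cell above."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    min_r = steps_per_epoch * min_epochs
    max_r = steps_per_epoch * num_epochs
    warm_up = math.ceil(data_length / batch_size / 10)
    return min_r, max_r, warm_up


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
original = search_budget(data_length=5_000, batch_size=128, num_epochs=20)
augmented = search_budget(data_length=50_000, batch_size=128, num_epochs=20)
print(original)
print(augmented)
```

Because the pruner's resources are expressed in steps rather than epochs, a larger training set changes all three values even when the number of epochs stays the same, which is why the recalculation is needed before the next search.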

In [24]:
base.reset_seed()

## Search with standard training on the augmented dataset
Configuration of the individual training runs.

In [25]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bert-base-aug_fine_hp-search",
    logging_dir=f"~/logs/{DATASET}/bert-base-aug_fine_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [26]:
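def hp_space(trial):
    params = {
        # Learning rate sampled on a log scale between 1e-5 and 5e-4.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        # Weight decay on a grid from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Warmup steps between 0 and the precomputed upper bound.
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params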
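
The search space above mixes a log-scaled range with a stepped grid. The sketch below illustrates what those two kinds of range mean using plain uniform random draws. It is not how Optuna's TPE sampler chooses values, and `sample_log_uniform` and `sample_stepped` are hypothetical helpers written for this illustration.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw one of the grid points low, low + step, ..., up to high."""
    n_steps = round((high - low) / step)
    return low + step * rng.randint(0, n_steps)


rng = random.Random(42)
learning_rates = [sample_log_uniform(rng, 1e-5, 5e-4) for _ in range(10_000)]
weight_decays = {round(sample_stepped(rng, 0.0, 1e-2, 1e-3), 3) for _ in range(10_000)}

# On a log scale roughly half of the draws fall below the geometric mean.
geometric_mean = math.sqrt(1e-5 * 5e-4)
share_below = sum(lr < geometric_mean for lr in learning_rates) / len(learning_rates)
print(f"share below {geometric_mean:.2e}: {share_below:.2f}")
print(sorted(weight_decays))
```

The log scale gives small learning rates as much attention as large ones, which a linear range would not. Values such as `0.009000000000000001` in the trial logs are presumably ordinary floating-point artifacts of a stepped grid like this one.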

Optuna configuration.

In [27]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
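
For intuition about what the pruner settings imply, the sketch below lists the steps at which trials in each Hyperband bracket would be compared. It is a simplified reading of the schedule described in the Optuna documentation, not Optuna's implementation: `hyperband_rungs` is a hypothetical helper, the budget values are made up, and details such as `bootstrap_count` and the assignment of trials to brackets are not modeled.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Map each bracket index to the steps at which its trials are compared."""
    # Count brackets: one for every power of the reduction factor that still
    # fits between min_resource and max_resource.
    n_brackets = 0
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    rungs = {}
    for bracket in range(n_brackets):
        steps = []
        # Bracket b starts checking later: its first rung is min_resource * factor**b.
        step = min_resource * reduction_factor ** bracket
        while step <= max_resource:
            steps.append(step)
            step *= reduction_factor
        rungs[bracket] = steps
    return rungs


# Hypothetical budget: 100 steps per epoch, 20 epochs, first check after 5 epochs.
steps_per_epoch = 100
schedule = hyperband_rungs(5 * steps_per_epoch, 20 * steps_per_epoch, 2)
for bracket, steps in schedule.items():
    print(bracket, [s // steps_per_epoch for s in steps])
```

Under these assumed numbers the checkpoints fall at epochs 5, 10, and 20, which is in line with the trial logs in this notebook, where pruned trials stop after either 5 or 10 epochs.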



Trainer configuration for the individual training runs.

In [28]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_aug,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_Bert(),
)
  

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Search setup.

In [29]:
best_trial_normal_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Test-base-aug",
    n_trials=150
)

[I 2025-03-26 13:26:00,559] A new study created in memory with name: Test-base-aug


Trial 0 with params: {'learning_rate': 4.3284502212938785e-05, 'weight_decay': 0.01, 'warmup_steps': 39}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9327,2.235381,0.577452,0.251613,0.206702,0.192989
2,1.7244,1.548038,0.703025,0.320607,0.331331,0.310229
3,1.151,1.272127,0.746104,0.382888,0.401835,0.379462
4,0.8427,1.145172,0.76077,0.441398,0.439341,0.42326
5,0.6555,1.084483,0.766269,0.471065,0.463492,0.449217


[I 2025-03-26 13:27:30,154] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010401663679887307, 'weight_decay': 0.001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1754,1.397009,0.733272,0.357032,0.374699,0.351703
2,0.8164,1.074034,0.76352,0.451657,0.458918,0.445542
3,0.4327,0.997396,0.770852,0.494056,0.497951,0.481924
4,0.2563,1.012277,0.779102,0.618539,0.55654,0.568428
5,0.1631,1.018821,0.786434,0.677925,0.614088,0.627822
6,0.1116,1.052724,0.789184,0.688095,0.609935,0.630828
7,0.0797,1.079528,0.7956,0.705145,0.64448,0.656518
8,0.0612,1.10462,0.797434,0.783738,0.686365,0.706878
9,0.0496,1.13949,0.787351,0.771125,0.677404,0.700903
10,0.0399,1.186543,0.789184,0.773754,0.69785,0.712939


[I 2025-03-26 13:30:28,101] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.2551115172973821e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.517,3.196738,0.351054,0.055195,0.074407,0.051748
2,2.9383,2.738555,0.47846,0.105283,0.133664,0.111068
3,2.5231,2.398709,0.527039,0.172007,0.163069,0.14225
4,2.2075,2.148483,0.585701,0.250816,0.215852,0.199271
5,1.9583,1.956343,0.614115,0.260171,0.240123,0.2223


[I 2025-03-26 13:31:55,308] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00015958573588141273, 'weight_decay': 0.0, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8951,1.174138,0.759853,0.441487,0.4343,0.418386
2,0.5307,0.996867,0.780935,0.514725,0.508635,0.502272
3,0.2312,0.980664,0.7956,0.65196,0.609097,0.616692
4,0.1188,1.040272,0.793767,0.716676,0.629055,0.65456
5,0.0716,1.081217,0.7956,0.773172,0.698715,0.719975
6,0.0481,1.13047,0.781852,0.785667,0.657598,0.693858
7,0.0334,1.18707,0.7956,0.805077,0.701408,0.727515
8,0.0263,1.197125,0.794684,0.795629,0.739477,0.749265
9,0.0217,1.254264,0.791017,0.789459,0.710928,0.732169
10,0.0161,1.30247,0.788268,0.815834,0.709758,0.734385


[I 2025-03-26 13:34:50,033] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.00025959425503112657, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4162,1.032974,0.774519,0.44934,0.471039,0.449134
2,0.2748,1.018367,0.781852,0.671033,0.608093,0.620665
3,0.1018,1.094981,0.791934,0.766668,0.698511,0.714644
4,0.0515,1.129571,0.802933,0.811794,0.685051,0.724949
5,0.0316,1.220691,0.797434,0.77682,0.717004,0.728165
6,0.0227,1.198218,0.799267,0.801529,0.695447,0.726203
7,0.0157,1.314942,0.790101,0.78769,0.693135,0.718063
8,0.0136,1.313053,0.792851,0.787902,0.701706,0.72295
9,0.0102,1.375067,0.794684,0.79594,0.70636,0.730281
10,0.008,1.390883,0.796517,0.793276,0.723698,0.737487


[I 2025-03-26 13:37:46,131] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 2.049268011541735e-05, 'weight_decay': 0.003, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3346,2.885929,0.445463,0.109365,0.112952,0.096663
2,2.5234,2.282967,0.56187,0.206598,0.190136,0.173908
3,1.9948,1.900837,0.630614,0.294492,0.257746,0.24012
4,1.6379,1.649444,0.68561,0.335732,0.307292,0.292571
5,1.3817,1.479711,0.708524,0.338391,0.337635,0.316831
6,1.2006,1.364818,0.726856,0.379894,0.371914,0.353988
7,1.0605,1.284745,0.745188,0.430392,0.417392,0.404883
8,0.9615,1.231742,0.750687,0.426907,0.430992,0.411552
9,0.8831,1.193065,0.748854,0.423434,0.426507,0.406944
10,0.8185,1.165618,0.751604,0.438519,0.444075,0.426882


[I 2025-03-26 13:40:39,879] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 5.4182823195332406e-05, 'weight_decay': 0.003, 'warmup_steps': 33}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7644,2.00966,0.609533,0.26808,0.244097,0.226531
2,1.4677,1.374825,0.731439,0.377656,0.377592,0.358852
3,0.93,1.164127,0.759853,0.449244,0.43925,0.423922
4,0.6588,1.073958,0.767186,0.448756,0.462178,0.445439
5,0.4956,1.036264,0.773602,0.511268,0.492931,0.486726
6,0.3885,1.013636,0.771769,0.516378,0.502978,0.497515
7,0.3052,1.008563,0.769936,0.534964,0.510533,0.506712
8,0.2526,0.998516,0.780018,0.595344,0.542472,0.548879
9,0.214,1.010424,0.781852,0.599075,0.550816,0.559993
10,0.184,1.023824,0.782768,0.644041,0.580671,0.595894


[I 2025-03-26 13:43:34,918] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 1.7258215396625005e-05, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.399,3.00093,0.420715,0.096502,0.09951,0.084453
2,2.6784,2.446831,0.526123,0.183561,0.161139,0.141123
3,2.1857,2.075567,0.5967,0.281815,0.228563,0.212293
4,1.8383,1.817021,0.653529,0.314028,0.271264,0.255615
5,1.5795,1.63197,0.690192,0.332171,0.312699,0.296952
6,1.3918,1.499478,0.709441,0.360701,0.347438,0.331062
7,1.2441,1.403939,0.719523,0.389168,0.362614,0.348632
8,1.1368,1.338363,0.734189,0.426167,0.397922,0.384227
9,1.0518,1.28723,0.742438,0.423397,0.410106,0.398179
10,0.9823,1.252371,0.743355,0.420824,0.417765,0.402239


[I 2025-03-26 13:46:31,584] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 5.954553793888986e-05, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6533,1.894241,0.625115,0.289193,0.256988,0.242139
2,1.3488,1.305623,0.746104,0.404773,0.40166,0.383059
3,0.8395,1.125017,0.762603,0.456775,0.450526,0.435955
4,0.5861,1.05071,0.768103,0.480157,0.474433,0.464943
5,0.4318,1.016618,0.771769,0.497649,0.492911,0.484976
6,0.3302,0.999938,0.773602,0.554039,0.518719,0.51695
7,0.2552,1.005029,0.777269,0.593239,0.536098,0.540026
8,0.2087,0.998983,0.783685,0.641463,0.570038,0.585656
9,0.1759,1.017802,0.784601,0.645807,0.590989,0.602962
10,0.1497,1.036667,0.786434,0.670982,0.602204,0.619668


[I 2025-03-26 13:49:32,557] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 7.475992999956501e-05, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4544,1.672142,0.688359,0.326686,0.310781,0.29702
2,1.1159,1.182524,0.749771,0.441132,0.421358,0.409552
3,0.6572,1.062601,0.767186,0.467027,0.47259,0.45761
4,0.437,1.010823,0.769936,0.504221,0.49931,0.491929
5,0.3011,0.994802,0.780018,0.560047,0.52198,0.522992
6,0.2188,1.00551,0.785518,0.619803,0.573622,0.583387
7,0.1625,1.026962,0.790101,0.672568,0.604641,0.621858
8,0.1298,1.029195,0.789184,0.666401,0.610701,0.624257
9,0.1071,1.051578,0.785518,0.679173,0.609678,0.625774
10,0.0885,1.085384,0.789184,0.725531,0.64337,0.662894


[I 2025-03-26 13:52:29,935] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0004587604755149822, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0168,1.012189,0.778185,0.55406,0.56009,0.548193
2,0.1348,1.117707,0.786434,0.728469,0.691829,0.695316
3,0.0536,1.251234,0.784601,0.762463,0.712827,0.724393
4,0.029,1.334953,0.778185,0.800739,0.703991,0.732888
5,0.0196,1.347624,0.794684,0.802625,0.728297,0.744976
6,0.0155,1.38588,0.791934,0.783455,0.703191,0.725167
7,0.0107,1.390878,0.802016,0.799999,0.718383,0.740194
8,0.0084,1.504664,0.784601,0.782051,0.721691,0.734309
9,0.0072,1.527934,0.789184,0.771435,0.721912,0.731802
10,0.0055,1.567686,0.785518,0.781343,0.730292,0.740534


[I 2025-03-26 13:57:00,578] Trial 10 finished with value: 0.7298882081886409 and parameters: {'learning_rate': 0.0004587604755149822, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 10 with value: 0.7298882081886409.
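Finished trials can be pulled out of a log like this one with a small parser. The sketch below assumes the line format shown above (`Trial N finished with value: V and parameters: {...}.`) and skips pruned trials, which carry no objective value. The function name `parse_finished_trials` is introduced only for illustration.

```python
import ast
import re

# Matches the Optuna log message emitted when a trial runs to completion.
_FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([-+0-9.eE]+) and parameters: (\{.*?\})\."
)


def parse_finished_trials(log_text):
    """Return a list of (trial_number, objective_value, params) tuples found in log_text."""
    results = []
    for match in _FINISHED.finditer(log_text):
        number = int(match.group(1))
        value = float(match.group(2))
        # The parameters are printed as a Python dict literal, so literal_eval can read them.
        params = ast.literal_eval(match.group(3))
        results.append((number, value, params))
    return results
```

Taking `max(parse_finished_trials(text), key=lambda t: t[1])` then gives the best completed trial by objective value, which should agree with the `Best is trial ...` suffix of the last finished line.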


Trial 11 with params: {'learning_rate': 0.0004362378788055201, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0372,0.991951,0.787351,0.577837,0.567303,0.55779
2,0.1417,1.144146,0.786434,0.711295,0.675612,0.673705
3,0.054,1.223456,0.790101,0.773586,0.733392,0.738922
4,0.0303,1.27996,0.792851,0.82162,0.709299,0.742697
5,0.0202,1.307653,0.788268,0.800675,0.699426,0.731295
6,0.0149,1.405703,0.791934,0.811528,0.702021,0.735039
7,0.0116,1.47059,0.784601,0.783611,0.708122,0.730875
8,0.0093,1.440552,0.791934,0.789359,0.730667,0.744259
9,0.0072,1.524564,0.789184,0.781213,0.712014,0.731655
10,0.0052,1.537916,0.783685,0.772991,0.70867,0.727177


[I 2025-03-26 14:01:25,681] Trial 11 finished with value: 0.7453400168527454 and parameters: {'learning_rate': 0.0004362378788055201, 'weight_decay': 0.003, 'warmup_steps': 0}. Best is trial 11 with value: 0.7453400168527454.


Trial 12 with params: {'learning_rate': 0.00040699996899648717, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1067,0.977303,0.786434,0.579324,0.558282,0.554546
2,0.1547,1.08434,0.792851,0.716696,0.675535,0.677483
3,0.0574,1.192208,0.787351,0.771712,0.713076,0.724011
4,0.0312,1.303247,0.79835,0.820392,0.718114,0.748969
5,0.021,1.357278,0.789184,0.777776,0.712343,0.728852
6,0.0169,1.442685,0.776352,0.789747,0.69204,0.718071
7,0.0125,1.472583,0.781852,0.772709,0.708872,0.72021
8,0.01,1.467125,0.780018,0.768372,0.706474,0.72073
9,0.0077,1.498188,0.789184,0.80256,0.703776,0.733142
10,0.0061,1.572629,0.782768,0.790753,0.706266,0.729579


[I 2025-03-26 14:04:22,683] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.00047120889231092516, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0271,0.981177,0.791934,0.601391,0.57492,0.577561
2,0.1331,1.1509,0.785518,0.730753,0.687366,0.693307
3,0.0541,1.271895,0.783685,0.793765,0.734468,0.747231
4,0.0296,1.27489,0.786434,0.734646,0.693225,0.696667
5,0.0208,1.347593,0.786434,0.796573,0.716213,0.736628
6,0.0149,1.390026,0.784601,0.812316,0.699344,0.734196
7,0.0111,1.498425,0.781852,0.791656,0.697788,0.725346
8,0.0101,1.47769,0.771769,0.759835,0.699077,0.711866
9,0.0069,1.474432,0.785518,0.775771,0.712722,0.728005
10,0.0056,1.478139,0.790101,0.788474,0.715714,0.729547


[I 2025-03-26 14:08:47,341] Trial 13 finished with value: 0.7196875427337419 and parameters: {'learning_rate': 0.00047120889231092516, 'weight_decay': 0.0, 'warmup_steps': 7}. Best is trial 11 with value: 0.7453400168527454.


Trial 14 with params: {'learning_rate': 0.0003750707646511455, 'weight_decay': 0.008, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.204,0.992989,0.780018,0.51915,0.509568,0.498536
2,0.1698,1.096906,0.789184,0.752682,0.647747,0.677164
3,0.0623,1.195739,0.792851,0.774813,0.720966,0.732592
4,0.0322,1.317065,0.791017,0.799491,0.701431,0.722698
5,0.022,1.291203,0.794684,0.77765,0.708457,0.722845
6,0.0175,1.386321,0.789184,0.795969,0.716973,0.740547
7,0.0119,1.447145,0.787351,0.781812,0.694067,0.715738
8,0.01,1.499647,0.777269,0.755069,0.706517,0.715227
9,0.0081,1.526763,0.780018,0.769885,0.700089,0.719205
10,0.0057,1.553535,0.783685,0.779508,0.702941,0.724902


[I 2025-03-26 14:11:46,080] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0002578039701724928, 'weight_decay': 0.004, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4649,1.027478,0.772686,0.462171,0.472596,0.453131
2,0.2802,1.01675,0.781852,0.688893,0.62336,0.638521
3,0.1032,1.0833,0.790101,0.729723,0.651756,0.67277
4,0.0512,1.148334,0.802933,0.827016,0.706162,0.74362
5,0.0322,1.191835,0.802933,0.795326,0.724726,0.744479
6,0.0242,1.267987,0.792851,0.801355,0.687125,0.721982
7,0.0166,1.352166,0.793767,0.777953,0.69585,0.717998
8,0.0135,1.347995,0.794684,0.783425,0.703404,0.724698
9,0.0111,1.399341,0.793767,0.777546,0.722801,0.735665
10,0.0078,1.448406,0.794684,0.818903,0.715035,0.744147


[I 2025-03-26 14:16:08,410] Trial 15 finished with value: 0.7373352475492936 and parameters: {'learning_rate': 0.0002578039701724928, 'weight_decay': 0.004, 'warmup_steps': 28}. Best is trial 11 with value: 0.7453400168527454.


Trial 16 with params: {'learning_rate': 0.0001903565661716161, 'weight_decay': 0.003, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6993,1.096948,0.762603,0.436669,0.447257,0.426031
2,0.4193,0.977895,0.789184,0.624977,0.575207,0.581526
3,0.1685,1.021845,0.789184,0.734966,0.645806,0.669967
4,0.0844,1.083307,0.790101,0.712973,0.634078,0.656684
5,0.0508,1.125195,0.802933,0.80594,0.721449,0.746445
6,0.0353,1.200926,0.790101,0.802613,0.67181,0.707829
7,0.0243,1.246181,0.7956,0.81198,0.704629,0.733016
8,0.0204,1.24823,0.8011,0.808546,0.734469,0.748826
9,0.0171,1.273036,0.793767,0.796114,0.711343,0.731013
10,0.0119,1.321468,0.79835,0.811564,0.718583,0.740024


[I 2025-03-26 14:20:37,957] Trial 16 finished with value: 0.7497354203350519 and parameters: {'learning_rate': 0.0001903565661716161, 'weight_decay': 0.003, 'warmup_steps': 28}. Best is trial 16 with value: 0.7497354203350519.


Trial 17 with params: {'learning_rate': 0.0003148149268759786, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3204,1.005342,0.770852,0.48117,0.491518,0.474755
2,0.2135,1.065212,0.783685,0.720246,0.637771,0.658142
3,0.0778,1.147612,0.796517,0.786263,0.730591,0.742506
4,0.0387,1.205114,0.8011,0.828258,0.717883,0.750372
5,0.0257,1.240852,0.80385,0.80937,0.722871,0.746003
6,0.0185,1.321498,0.7956,0.820985,0.710781,0.742137
7,0.0132,1.367792,0.786434,0.795039,0.696083,0.722005
8,0.0103,1.43861,0.794684,0.799594,0.73315,0.748024
9,0.0087,1.47373,0.786434,0.801149,0.722395,0.743197
10,0.0061,1.463719,0.7956,0.802937,0.726457,0.744217


[I 2025-03-26 14:25:05,680] Trial 17 finished with value: 0.7580050071093316 and parameters: {'learning_rate': 0.0003148149268759786, 'weight_decay': 0.0, 'warmup_steps': 27}. Best is trial 17 with value: 0.7580050071093316.


Trial 18 with params: {'learning_rate': 0.00021317012046880978, 'weight_decay': 0.0, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6139,1.062475,0.766269,0.451242,0.453385,0.433896
2,0.3632,0.975847,0.788268,0.624841,0.583806,0.589896
3,0.1388,1.046318,0.793767,0.723678,0.645774,0.662452
4,0.0686,1.094177,0.79835,0.788287,0.677043,0.713105
5,0.0428,1.16575,0.793767,0.788735,0.714877,0.734674
6,0.0301,1.210843,0.787351,0.790247,0.685893,0.71519
7,0.0206,1.269847,0.7956,0.808013,0.697374,0.725944
8,0.0172,1.27979,0.796517,0.802325,0.727926,0.743951
9,0.0145,1.312556,0.79835,0.800806,0.725343,0.741858
10,0.0102,1.361781,0.794684,0.801182,0.717915,0.738121


[I 2025-03-26 14:29:37,179] Trial 18 finished with value: 0.7515829644349895 and parameters: {'learning_rate': 0.00021317012046880978, 'weight_decay': 0.0, 'warmup_steps': 30}. Best is trial 17 with value: 0.7580050071093316.


Trial 19 with params: {'learning_rate': 0.0003820528120429927, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1999,0.991758,0.775435,0.536317,0.517879,0.509648
2,0.1669,1.102166,0.788268,0.716461,0.644369,0.663451
3,0.0618,1.20052,0.793767,0.768549,0.722436,0.73002
4,0.0316,1.274727,0.791017,0.809326,0.69035,0.719521
5,0.0209,1.338713,0.788268,0.768698,0.709008,0.721768


[I 2025-03-26 14:31:04,828] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.000361812949512664, 'weight_decay': 0.004, 'warmup_steps': 43}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2678,1.000523,0.772686,0.526352,0.511373,0.502507
2,0.1826,1.065666,0.7956,0.74151,0.655441,0.677514
3,0.0669,1.131336,0.802933,0.801231,0.726302,0.739596
4,0.035,1.239411,0.791934,0.823397,0.704186,0.739662
5,0.0236,1.304322,0.79835,0.804602,0.723181,0.738929
6,0.0175,1.296278,0.8011,0.826712,0.708686,0.742163
7,0.0121,1.434993,0.787351,0.792951,0.698793,0.714824
8,0.0103,1.424732,0.791017,0.790857,0.723844,0.732272
9,0.0078,1.503668,0.791934,0.808244,0.73227,0.748684
10,0.0057,1.526368,0.789184,0.802857,0.721084,0.738989


[I 2025-03-26 14:35:31,225] Trial 20 finished with value: 0.7370887780689291 and parameters: {'learning_rate': 0.000361812949512664, 'weight_decay': 0.004, 'warmup_steps': 43}. Best is trial 17 with value: 0.7580050071093316.


Trial 21 with params: {'learning_rate': 0.00011026679403682653, 'weight_decay': 0.001, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1679,1.365226,0.739688,0.39373,0.385999,0.364381
2,0.7785,1.069919,0.761687,0.483172,0.468636,0.461949
3,0.4055,0.998044,0.776352,0.51129,0.510098,0.499486
4,0.2353,1.016451,0.780018,0.646376,0.567255,0.585562
5,0.1475,1.025737,0.784601,0.680811,0.614708,0.63001


[I 2025-03-26 14:36:59,100] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.00021395919910874566, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5923,1.06449,0.767186,0.428614,0.460625,0.435892
2,0.3592,0.992285,0.782768,0.616803,0.578763,0.581397
3,0.137,1.068351,0.791017,0.715481,0.644659,0.662727
4,0.0687,1.116451,0.794684,0.786813,0.670329,0.708278
5,0.0425,1.14169,0.802016,0.790859,0.715172,0.738254
6,0.0304,1.20351,0.793767,0.790571,0.690604,0.72098
7,0.0206,1.280643,0.792851,0.790239,0.685968,0.71512
8,0.0177,1.283796,0.794684,0.781728,0.713356,0.729808
9,0.0145,1.307708,0.8011,0.786277,0.714999,0.733
10,0.0101,1.361951,0.802016,0.813911,0.731742,0.752438


[I 2025-03-26 14:41:23,189] Trial 22 finished with value: 0.7569498805852943 and parameters: {'learning_rate': 0.00021395919910874566, 'weight_decay': 0.0, 'warmup_steps': 22}. Best is trial 17 with value: 0.7580050071093316.


Trial 23 with params: {'learning_rate': 0.00013556631398918, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9761,1.242712,0.748854,0.405633,0.409496,0.388975
2,0.6221,1.025029,0.771769,0.487094,0.487781,0.47809
3,0.2918,0.996974,0.781852,0.603113,0.56599,0.570564
4,0.1581,1.043222,0.781852,0.679024,0.600858,0.624626
5,0.0961,1.059476,0.786434,0.688594,0.617765,0.635399


[I 2025-03-26 14:42:51,342] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00045779559037543175, 'weight_decay': 0.001, 'warmup_steps': 40}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1221,1.001019,0.782768,0.563925,0.559342,0.550362
2,0.1352,1.13016,0.783685,0.721901,0.664846,0.675905
3,0.0526,1.221628,0.789184,0.763983,0.709411,0.716075
4,0.0293,1.302753,0.793767,0.777357,0.725442,0.734343
5,0.0204,1.339757,0.790101,0.780491,0.699378,0.723717
6,0.0147,1.34265,0.800183,0.816784,0.716026,0.743868
7,0.0092,1.461431,0.796517,0.789704,0.695651,0.721186
8,0.0088,1.518393,0.785518,0.784181,0.702108,0.722768
9,0.0064,1.635905,0.784601,0.769093,0.696124,0.713827
10,0.0049,1.563534,0.789184,0.80322,0.713689,0.734797


[I 2025-03-26 14:47:24,220] Trial 24 finished with value: 0.7268618251744781 and parameters: {'learning_rate': 0.00045779559037543175, 'weight_decay': 0.001, 'warmup_steps': 40}. Best is trial 17 with value: 0.7580050071093316.


Trial 25 with params: {'learning_rate': 0.00029316020225973684, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3572,1.013097,0.768103,0.4739,0.481209,0.464683
2,0.2328,1.049055,0.785518,0.74188,0.642819,0.669344
3,0.0841,1.099239,0.799267,0.758618,0.713146,0.723086
4,0.0424,1.162179,0.797434,0.800125,0.706135,0.733969
5,0.0272,1.247058,0.797434,0.776338,0.729331,0.739473
6,0.0198,1.29213,0.796517,0.82159,0.70683,0.741728
7,0.0132,1.371214,0.783685,0.783454,0.695446,0.714175
8,0.0115,1.377844,0.789184,0.777486,0.711204,0.724464
9,0.0097,1.41134,0.790101,0.77755,0.706367,0.723955
10,0.0069,1.44152,0.792851,0.793771,0.714029,0.732433


[I 2025-03-26 14:51:49,077] Trial 25 finished with value: 0.7432670059707221 and parameters: {'learning_rate': 0.00029316020225973684, 'weight_decay': 0.0, 'warmup_steps': 22}. Best is trial 17 with value: 0.7580050071093316.


Trial 26 with params: {'learning_rate': 0.00021183183333503693, 'weight_decay': 0.001, 'warmup_steps': 33}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6257,1.064072,0.766269,0.432558,0.451489,0.430218
2,0.3671,0.975822,0.786434,0.62205,0.580157,0.585901
3,0.1405,1.042242,0.794684,0.730978,0.649264,0.668624
4,0.0691,1.098554,0.7956,0.801375,0.688687,0.724733
5,0.0428,1.166963,0.791934,0.778802,0.712948,0.731318
6,0.0303,1.236592,0.788268,0.816931,0.690061,0.725432
7,0.021,1.288869,0.792851,0.807211,0.694807,0.724703
8,0.0174,1.297768,0.787351,0.797699,0.724346,0.738049
9,0.0148,1.313787,0.793767,0.781176,0.711893,0.728151
10,0.0104,1.358748,0.793767,0.805283,0.719624,0.739739


[I 2025-03-26 14:56:13,754] Trial 26 finished with value: 0.7492893231772806 and parameters: {'learning_rate': 0.00021183183333503693, 'weight_decay': 0.001, 'warmup_steps': 33}. Best is trial 17 with value: 0.7580050071093316.


Trial 27 with params: {'learning_rate': 3.392171417341792e-05, 'weight_decay': 0.001, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1007,2.474713,0.51604,0.164406,0.154103,0.133092
2,2.004,1.77764,0.676444,0.312077,0.299828,0.28341
3,1.416,1.434303,0.715857,0.369344,0.352275,0.334557
4,1.075,1.261288,0.745188,0.37698,0.407094,0.381066
5,0.8592,1.166448,0.759853,0.420335,0.43796,0.418647


[I 2025-03-26 14:57:41,941] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0004660178751295263, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0741,1.018013,0.773602,0.569529,0.559863,0.55121
2,0.1348,1.096953,0.785518,0.770966,0.688494,0.706298
3,0.0529,1.253013,0.786434,0.75299,0.723614,0.724736
4,0.0287,1.31674,0.792851,0.826807,0.70931,0.743332
5,0.0199,1.364364,0.790101,0.789409,0.711109,0.732129
6,0.0138,1.408139,0.7956,0.81548,0.705371,0.735882
7,0.01,1.59447,0.777269,0.779085,0.690162,0.715239
8,0.0083,1.581487,0.784601,0.768445,0.702788,0.717602
9,0.007,1.582771,0.787351,0.795781,0.713435,0.737677
10,0.0051,1.620097,0.791017,0.780053,0.727249,0.740082


[I 2025-03-26 15:02:17,632] Trial 28 finished with value: 0.7329138871656129 and parameters: {'learning_rate': 0.0004660178751295263, 'weight_decay': 0.001, 'warmup_steps': 23}. Best is trial 17 with value: 0.7580050071093316.


Trial 29 with params: {'learning_rate': 2.0641950878300647e-05, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.308,2.865723,0.447296,0.108563,0.113069,0.095446
2,2.5045,2.267059,0.572869,0.216079,0.200183,0.185259
3,1.9792,1.886768,0.630614,0.299019,0.259754,0.242226
4,1.624,1.636927,0.687443,0.333688,0.31005,0.293281
5,1.3686,1.468848,0.707608,0.33762,0.335254,0.315498
6,1.1885,1.355462,0.730522,0.385137,0.379966,0.36265
7,1.0492,1.276898,0.744271,0.388397,0.402319,0.381387
8,0.9511,1.224587,0.747938,0.386726,0.411815,0.385115
9,0.8734,1.186583,0.750687,0.410494,0.426566,0.406537
10,0.8093,1.159403,0.752521,0.419172,0.431577,0.411451


[I 2025-03-26 15:05:16,091] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 6.987985617740108e-05, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5354,1.742593,0.675527,0.315473,0.295664,0.281621
2,1.1858,1.214891,0.748854,0.407289,0.413791,0.394192
3,0.7091,1.077266,0.769019,0.448725,0.468826,0.444165
4,0.4799,1.022524,0.767186,0.471236,0.476512,0.467897
5,0.3381,0.994959,0.769936,0.556209,0.512408,0.511101
6,0.2494,0.99732,0.780018,0.616999,0.563026,0.575758
7,0.1874,1.015383,0.786434,0.648794,0.583776,0.600168
8,0.1507,1.015381,0.786434,0.665089,0.604939,0.619498
9,0.125,1.039915,0.785518,0.666166,0.600638,0.614792
10,0.1045,1.064718,0.785518,0.703862,0.621312,0.64182


[I 2025-03-26 15:08:12,632] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.00014924666987052023, 'weight_decay': 0.003, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.886,1.191772,0.75802,0.424148,0.427409,0.40833
2,0.5576,1.000096,0.776352,0.512692,0.508387,0.499622
3,0.2497,0.987286,0.789184,0.651371,0.597153,0.607417
4,0.1312,1.050511,0.789184,0.707745,0.625997,0.651737
5,0.0794,1.071304,0.792851,0.711789,0.649744,0.664648


[I 2025-03-26 15:09:40,602] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 6.345426898630038e-05, 'weight_decay': 0.007, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6472,1.855612,0.633364,0.290985,0.262393,0.246879
2,1.298,1.275791,0.747021,0.406549,0.40567,0.38291
3,0.7931,1.108657,0.767186,0.459571,0.462479,0.443625
4,0.5472,1.042066,0.770852,0.480383,0.479177,0.471436
5,0.3972,1.010589,0.770852,0.495499,0.497778,0.486828
6,0.2994,0.999449,0.777269,0.572397,0.531203,0.535702
7,0.2287,1.011061,0.780935,0.616364,0.559521,0.568063
8,0.1855,1.006341,0.788268,0.650559,0.586293,0.602378
9,0.1553,1.028409,0.787351,0.644232,0.592316,0.603085
10,0.131,1.04943,0.788268,0.692261,0.612784,0.631086


[I 2025-03-26 15:12:48,442] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00017430004274843807, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7667,1.126739,0.764436,0.413167,0.442648,0.417568
2,0.4669,0.985336,0.778185,0.595319,0.541703,0.545147
3,0.1952,1.008497,0.788268,0.709793,0.636971,0.654383
4,0.0992,1.069866,0.790101,0.717072,0.63096,0.656054
5,0.0596,1.104326,0.796517,0.796636,0.717823,0.74221
6,0.0405,1.187187,0.782768,0.779801,0.663537,0.697101
7,0.0286,1.239047,0.788268,0.800949,0.692596,0.722262
8,0.023,1.219584,0.799267,0.812345,0.72678,0.746152
9,0.019,1.263839,0.793767,0.800712,0.705104,0.732188
10,0.0136,1.306304,0.794684,0.816305,0.712985,0.737721


[I 2025-03-26 15:17:18,646] Trial 33 finished with value: 0.7528584883043424 and parameters: {'learning_rate': 0.00017430004274843807, 'weight_decay': 0.0, 'warmup_steps': 26}. Best is trial 17 with value: 0.7580050071093316.


Trial 34 with params: {'learning_rate': 0.00015787988695294925, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8428,1.165748,0.762603,0.426338,0.438101,0.420167
2,0.524,0.993953,0.776352,0.510956,0.508922,0.499078
3,0.2285,0.991997,0.788268,0.644318,0.596789,0.603871
4,0.1187,1.058587,0.791017,0.705606,0.625937,0.650848
5,0.0717,1.089444,0.794684,0.75873,0.688762,0.70923
6,0.0487,1.157826,0.782768,0.77045,0.660472,0.692302
7,0.0343,1.207285,0.791017,0.807708,0.694496,0.723009
8,0.0268,1.188141,0.806599,0.799156,0.737388,0.749069
9,0.0224,1.240221,0.789184,0.787788,0.683068,0.711845
10,0.0166,1.27769,0.794684,0.806872,0.710345,0.731232


[I 2025-03-26 15:21:45,802] Trial 34 finished with value: 0.7431685718388262 and parameters: {'learning_rate': 0.00015787988695294925, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 17 with value: 0.7580050071093316.


Trial 35 with params: {'learning_rate': 0.00022960780811284495, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5611,1.047609,0.765353,0.436304,0.459997,0.436472
2,0.3309,0.988089,0.791017,0.648417,0.595499,0.605383
3,0.1236,1.064147,0.791934,0.737404,0.662007,0.68191
4,0.0611,1.123247,0.794684,0.786428,0.675511,0.710648
5,0.0381,1.152096,0.799267,0.79752,0.720778,0.743422
6,0.0266,1.21124,0.794684,0.795212,0.690237,0.720518
7,0.0187,1.256299,0.792851,0.80026,0.705905,0.73031
8,0.0154,1.305656,0.790101,0.786047,0.715616,0.729304
9,0.0129,1.288958,0.800183,0.790716,0.719884,0.736235
10,0.0091,1.374259,0.793767,0.808462,0.722988,0.744764


[I 2025-03-26 15:26:16,590] Trial 35 finished with value: 0.7594440660912989 and parameters: {'learning_rate': 0.00022960780811284495, 'weight_decay': 0.0, 'warmup_steps': 32}. Best is trial 35 with value: 0.7594440660912989.


Trial 36 with params: {'learning_rate': 1.0625556226593494e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5564,3.277113,0.344638,0.041242,0.071194,0.049556
2,3.0488,2.866364,0.450046,0.107878,0.117442,0.10126
3,2.6763,2.55588,0.496792,0.128688,0.142019,0.117496
4,2.3858,2.315833,0.545371,0.215387,0.177761,0.160297
5,2.15,2.130701,0.590284,0.252087,0.219675,0.202746
6,1.9677,1.984557,0.608616,0.277066,0.236241,0.221381
7,1.8178,1.867865,0.630614,0.269416,0.252817,0.236675
8,1.7022,1.776535,0.662695,0.296337,0.278332,0.261177
9,1.6075,1.702992,0.67736,0.349868,0.297492,0.281517
10,1.5304,1.646366,0.683776,0.342437,0.303762,0.287951


[I 2025-03-26 15:29:14,102] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00015825915775592185, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8132,1.16678,0.762603,0.420051,0.438593,0.416749
2,0.5186,0.991947,0.776352,0.533309,0.515211,0.51013
3,0.2261,0.995811,0.791017,0.645175,0.602662,0.608594
4,0.1181,1.054857,0.791017,0.697514,0.621272,0.6443
5,0.0711,1.09304,0.79835,0.767121,0.690611,0.710609


[I 2025-03-26 15:30:42,600] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0003170124206449671, 'weight_decay': 0.0, 'warmup_steps': 34}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3326,1.011052,0.768103,0.477851,0.483923,0.469884
2,0.2135,1.061812,0.784601,0.740442,0.642259,0.66743
3,0.0789,1.16924,0.784601,0.74654,0.70876,0.710595
4,0.0414,1.213153,0.793767,0.810219,0.696029,0.720801
5,0.0261,1.275878,0.802016,0.803856,0.71578,0.739202
6,0.0196,1.327408,0.7956,0.800474,0.716923,0.742138
7,0.0134,1.430923,0.785518,0.784838,0.705203,0.723722
8,0.0105,1.45979,0.787351,0.789648,0.730571,0.742841
9,0.0094,1.491943,0.780935,0.804922,0.712877,0.737068
10,0.007,1.509696,0.785518,0.760814,0.719105,0.721953


[I 2025-03-26 15:33:36,526] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00013253735630179916, 'weight_decay': 0.009000000000000001, 'warmup_steps': 53}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0557,1.264311,0.741522,0.371309,0.392818,0.367256
2,0.6508,1.037162,0.765353,0.486087,0.489683,0.481139
3,0.3121,0.984352,0.788268,0.629428,0.57683,0.5865
4,0.1687,1.01791,0.787351,0.670489,0.60407,0.621558
5,0.1022,1.049518,0.790101,0.691023,0.622411,0.638495
6,0.0686,1.088089,0.790101,0.745014,0.645716,0.673217
7,0.0476,1.139142,0.796517,0.798913,0.687651,0.716713
8,0.0372,1.15283,0.80385,0.792086,0.710992,0.729412
9,0.0296,1.185607,0.79835,0.793364,0.702396,0.726269
10,0.023,1.234649,0.797434,0.808519,0.713928,0.734818


[I 2025-03-26 15:38:01,437] Trial 39 finished with value: 0.7404671524632447 and parameters: {'learning_rate': 0.00013253735630179916, 'weight_decay': 0.009000000000000001, 'warmup_steps': 53}. Best is trial 35 with value: 0.7594440660912989.


Trial 40 with params: {'learning_rate': 0.00012436274551017268, 'weight_decay': 0.0, 'warmup_steps': 37}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.075,1.292103,0.740605,0.381365,0.391683,0.368192
2,0.6894,1.04702,0.764436,0.492,0.479422,0.474726
3,0.3408,0.98915,0.781852,0.574796,0.541511,0.541973
4,0.1892,1.01983,0.785518,0.685082,0.603872,0.628382
5,0.116,1.035485,0.788268,0.690299,0.617528,0.635862


[I 2025-03-26 15:39:29,298] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0002216100860329021, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5745,1.054667,0.765353,0.431765,0.459001,0.436005
2,0.344,0.990796,0.787351,0.622569,0.579235,0.585162
3,0.1302,1.07257,0.788268,0.723161,0.645718,0.664601
4,0.0651,1.119638,0.792851,0.790062,0.672433,0.710665
5,0.0408,1.170836,0.796517,0.791366,0.711515,0.734233
6,0.0286,1.222312,0.791934,0.774312,0.679063,0.707403
7,0.0194,1.286571,0.7956,0.798173,0.684722,0.717438
8,0.0162,1.278331,0.796517,0.772696,0.718609,0.729551
9,0.0134,1.301972,0.804766,0.811447,0.719119,0.744972
10,0.0096,1.375505,0.796517,0.805204,0.722767,0.744102


[I 2025-03-26 15:43:54,361] Trial 41 finished with value: 0.7593151714771309 and parameters: {'learning_rate': 0.0002216100860329021, 'weight_decay': 0.0, 'warmup_steps': 26}. Best is trial 35 with value: 0.7594440660912989.


Trial 42 with params: {'learning_rate': 1.0600021319893152e-05, 'weight_decay': 0.005, 'warmup_steps': 49}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5726,3.289061,0.342805,0.042988,0.070592,0.050293
2,3.0597,2.875604,0.447296,0.108748,0.116003,0.100348
3,2.685,2.563314,0.494959,0.10942,0.141077,0.116378
4,2.3929,2.321894,0.541705,0.217391,0.173759,0.155802
5,2.1561,2.136004,0.590284,0.251093,0.219675,0.203444


[I 2025-03-26 15:45:24,222] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00024680468112840877, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.473,1.037048,0.773602,0.447927,0.474522,0.452579
2,0.2941,0.998802,0.789184,0.700334,0.621414,0.642356
3,0.1082,1.075449,0.794684,0.759272,0.694726,0.711261
4,0.0537,1.12176,0.800183,0.807596,0.703013,0.735085
5,0.0341,1.192193,0.805683,0.799816,0.724901,0.746347
6,0.0244,1.226431,0.799267,0.81891,0.706659,0.740091
7,0.0171,1.295973,0.794684,0.815621,0.706793,0.737959
8,0.0146,1.307988,0.797434,0.801059,0.725024,0.743949
9,0.0118,1.3746,0.7956,0.793341,0.722026,0.742294
10,0.0086,1.358015,0.802933,0.809978,0.724414,0.74713


[I 2025-03-26 15:49:51,369] Trial 43 finished with value: 0.7614573567738171 and parameters: {'learning_rate': 0.00024680468112840877, 'weight_decay': 0.0, 'warmup_steps': 18}. Best is trial 43 with value: 0.7614573567738171.


Trial 44 with params: {'learning_rate': 0.0002733656803199959, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4002,1.022626,0.773602,0.472187,0.4798,0.461996
2,0.2561,1.016799,0.793767,0.740003,0.639283,0.666293
3,0.0932,1.113124,0.790101,0.762688,0.682775,0.703596
4,0.0464,1.160813,0.79835,0.828941,0.694659,0.735955
5,0.0301,1.223989,0.802933,0.805993,0.720026,0.745111
6,0.0214,1.261537,0.8011,0.804877,0.708194,0.738842
7,0.0153,1.315923,0.791017,0.784836,0.705002,0.720892
8,0.013,1.375143,0.793767,0.793448,0.725041,0.740606
9,0.0103,1.42021,0.791934,0.804441,0.714473,0.738916
10,0.0078,1.432115,0.791934,0.800824,0.71302,0.736657


[I 2025-03-26 15:52:54,555] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0003707086290945264, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1931,1.000822,0.772686,0.516713,0.520263,0.50464
2,0.1733,1.070531,0.791017,0.737202,0.645712,0.669838
3,0.0638,1.166224,0.802016,0.787676,0.729472,0.740447
4,0.0321,1.263318,0.791934,0.818599,0.708221,0.737846
5,0.0235,1.306428,0.786434,0.789846,0.713454,0.734975
6,0.0171,1.39301,0.796517,0.82173,0.702895,0.736189
7,0.0121,1.426687,0.783685,0.768153,0.691606,0.711952
8,0.01,1.496403,0.782768,0.779654,0.724195,0.735942
9,0.0072,1.534069,0.786434,0.778366,0.709809,0.723898
10,0.0056,1.509087,0.788268,0.808507,0.707606,0.732605


[I 2025-03-26 15:55:57,648] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0001660129744875839, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7981,1.144022,0.762603,0.421055,0.440833,0.41963
2,0.4934,0.988858,0.777269,0.574017,0.534182,0.533921
3,0.2108,1.001307,0.789184,0.689344,0.615632,0.632048
4,0.1086,1.066251,0.791934,0.716145,0.63301,0.657508
5,0.065,1.100182,0.793767,0.761175,0.689078,0.709413


[I 2025-03-26 15:57:27,557] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00019528036767985562, 'weight_decay': 0.002, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6488,1.092727,0.762603,0.425906,0.451394,0.428972
2,0.4017,0.97989,0.789184,0.625102,0.582287,0.587253
3,0.1584,1.04202,0.785518,0.726734,0.645729,0.665554
4,0.0798,1.094353,0.797434,0.763084,0.658349,0.693458
5,0.0485,1.131351,0.799267,0.795781,0.723611,0.744625
6,0.0345,1.194931,0.786434,0.789509,0.695307,0.724269
7,0.024,1.251784,0.793767,0.80106,0.697108,0.724614
8,0.0198,1.268923,0.792851,0.803247,0.720643,0.738876
9,0.0161,1.291044,0.791017,0.790769,0.713938,0.732637
10,0.0119,1.354426,0.794684,0.800693,0.715339,0.734341


[I 2025-03-26 16:01:55,953] Trial 47 finished with value: 0.7549818038549151 and parameters: {'learning_rate': 0.00019528036767985562, 'weight_decay': 0.002, 'warmup_steps': 15}. Best is trial 43 with value: 0.7614573567738171.


Trial 48 with params: {'learning_rate': 0.00030654958421000896, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3488,1.016664,0.770852,0.474497,0.486248,0.471716
2,0.2228,1.054986,0.791017,0.752061,0.638129,0.669234
3,0.0816,1.115013,0.7956,0.760461,0.704854,0.714176
4,0.0421,1.217213,0.796517,0.814962,0.686896,0.723035
5,0.0266,1.286146,0.796517,0.792538,0.718235,0.729928
6,0.0196,1.291729,0.7956,0.801415,0.720886,0.741199
7,0.0133,1.362294,0.789184,0.763926,0.699307,0.709726
8,0.0105,1.387871,0.789184,0.793921,0.728443,0.74096
9,0.0091,1.444538,0.791017,0.801066,0.71491,0.734761
10,0.0063,1.453181,0.790101,0.792028,0.722828,0.728057


[I 2025-03-26 16:04:51,629] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.00032562008657011007, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2747,1.019665,0.772686,0.500323,0.499312,0.485844
2,0.2034,1.044708,0.793767,0.72642,0.655212,0.673581
3,0.0748,1.139545,0.792851,0.783305,0.731718,0.740069
4,0.0367,1.190928,0.796517,0.823246,0.713875,0.744602
5,0.0244,1.317345,0.789184,0.795814,0.7093,0.73112
6,0.0195,1.346438,0.7956,0.819467,0.717201,0.747505
7,0.0128,1.454784,0.788268,0.805598,0.701812,0.726987
8,0.0116,1.425999,0.791934,0.791692,0.702544,0.722505
9,0.009,1.445811,0.784601,0.7885,0.696898,0.723115
10,0.0062,1.520325,0.781852,0.795014,0.706633,0.729667


[I 2025-03-26 16:07:49,527] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0003186834743092994, 'weight_decay': 0.002, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3225,1.009114,0.770852,0.488718,0.497965,0.483624
2,0.2123,1.071272,0.781852,0.708324,0.624715,0.645763
3,0.0779,1.161519,0.7956,0.773156,0.720794,0.730318
4,0.0393,1.245341,0.7956,0.835394,0.699367,0.737985
5,0.0251,1.318093,0.79835,0.80364,0.710656,0.738389
6,0.019,1.337886,0.800183,0.813473,0.705326,0.739405
7,0.0121,1.43657,0.790101,0.810494,0.699847,0.729712
8,0.0099,1.437814,0.792851,0.799907,0.71925,0.737495
9,0.0088,1.498787,0.7956,0.786478,0.700265,0.722915
10,0.0062,1.540602,0.781852,0.775186,0.698954,0.718796


[I 2025-03-26 16:10:45,619] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00023281979342328887, 'weight_decay': 0.003, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.506,1.048246,0.771769,0.440287,0.469407,0.444667
2,0.3179,0.998708,0.784601,0.640047,0.599452,0.602449
3,0.1191,1.071617,0.791017,0.742618,0.674002,0.691091
4,0.059,1.105311,0.80385,0.811786,0.706784,0.740778
5,0.0373,1.144164,0.805683,0.797791,0.720475,0.743057
6,0.0262,1.207784,0.79835,0.809762,0.696381,0.730898
7,0.0179,1.248129,0.800183,0.796143,0.701113,0.726338
8,0.0153,1.274865,0.802933,0.79765,0.727202,0.746132
9,0.0122,1.354671,0.7956,0.778513,0.725358,0.737
10,0.009,1.368731,0.802016,0.808752,0.729427,0.749472


[I 2025-03-26 16:15:16,581] Trial 51 finished with value: 0.7476596443437336 and parameters: {'learning_rate': 0.00023281979342328887, 'weight_decay': 0.003, 'warmup_steps': 13}. Best is trial 43 with value: 0.7614573567738171.


Trial 52 with params: {'learning_rate': 0.00011668980669530862, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0634,1.322559,0.743355,0.411713,0.391492,0.375506
2,0.7257,1.048254,0.766269,0.452855,0.469392,0.449475
3,0.3664,0.999403,0.777269,0.599305,0.545654,0.547469
4,0.2083,1.026378,0.779102,0.637202,0.58085,0.595518
5,0.1294,1.040285,0.788268,0.699434,0.623789,0.640029


[I 2025-03-26 16:16:44,198] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 5.4372545807912146e-05, 'weight_decay': 0.003, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7363,1.988297,0.614115,0.260534,0.246468,0.229649
2,1.4511,1.365308,0.736022,0.383962,0.385368,0.367541
3,0.9217,1.158616,0.76077,0.447447,0.439613,0.424804
4,0.653,1.070519,0.768103,0.449544,0.463644,0.446669
5,0.4908,1.030979,0.772686,0.49994,0.491979,0.48513
6,0.3834,1.006745,0.770852,0.514926,0.498735,0.494568
7,0.3012,1.005039,0.770852,0.555955,0.513353,0.5117
8,0.2495,0.995978,0.780935,0.595226,0.544343,0.550632
9,0.2115,1.008551,0.780018,0.598474,0.549443,0.559012
10,0.1819,1.022222,0.780935,0.642919,0.570005,0.585533


[I 2025-03-26 16:19:39,472] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.00018505388513671345, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7018,1.108233,0.762603,0.432916,0.448099,0.425315
2,0.4318,0.981841,0.780018,0.598392,0.545543,0.549212
3,0.1758,1.027927,0.785518,0.704964,0.632475,0.650119
4,0.0888,1.085366,0.791017,0.724426,0.64641,0.671194
5,0.0532,1.124371,0.802933,0.798774,0.716904,0.740131
6,0.0374,1.194287,0.786434,0.778082,0.671009,0.701493
7,0.026,1.244809,0.792851,0.791481,0.692411,0.720315
8,0.0213,1.242593,0.797434,0.803828,0.732827,0.746154
9,0.018,1.261483,0.794684,0.795118,0.717299,0.736462
10,0.013,1.322766,0.794684,0.798977,0.711296,0.731602


[I 2025-03-26 16:22:39,725] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.00017151642114437876, 'weight_decay': 0.002, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.758,1.134736,0.76352,0.439367,0.451321,0.429653
2,0.4726,0.986848,0.775435,0.568731,0.522977,0.521236
3,0.1992,1.017571,0.784601,0.683481,0.618674,0.632181
4,0.1026,1.072778,0.788268,0.704131,0.620288,0.645542
5,0.0613,1.110647,0.7956,0.788009,0.703468,0.729743
6,0.0424,1.166988,0.781852,0.771373,0.67138,0.701301
7,0.0294,1.234042,0.791017,0.79369,0.693019,0.720342
8,0.0242,1.235919,0.7956,0.781922,0.733267,0.73852
9,0.0196,1.265089,0.790101,0.790694,0.711753,0.73288
10,0.0149,1.314796,0.791017,0.80797,0.705969,0.730167


[I 2025-03-26 16:25:36,136] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0004559917486250674, 'weight_decay': 0.008, 'warmup_steps': 47}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1433,0.98442,0.784601,0.562072,0.556493,0.549109
2,0.1379,1.113497,0.789184,0.730287,0.677563,0.687927
3,0.0528,1.186081,0.783685,0.764281,0.721945,0.72773
4,0.0285,1.287833,0.788268,0.808684,0.691623,0.722155
5,0.0199,1.399917,0.785518,0.764847,0.707379,0.718767


[I 2025-03-26 16:27:05,143] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00022026095138811904, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5768,1.056191,0.765353,0.423649,0.459001,0.432862
2,0.3467,0.991478,0.783685,0.619179,0.575994,0.581673
3,0.1313,1.073586,0.787351,0.709626,0.6408,0.657209
4,0.0655,1.124066,0.794684,0.792654,0.672375,0.712245
5,0.0407,1.181803,0.797434,0.790095,0.7151,0.735917
6,0.0293,1.209414,0.792851,0.774889,0.681868,0.709387
7,0.0198,1.297697,0.792851,0.790595,0.695149,0.720683
8,0.0167,1.291734,0.793767,0.784029,0.710981,0.729079
9,0.0137,1.303843,0.802016,0.787103,0.71526,0.733548
10,0.0096,1.342404,0.797434,0.800806,0.721474,0.74157


[I 2025-03-26 16:31:30,905] Trial 57 finished with value: 0.7553492696618385 and parameters: {'learning_rate': 0.00022026095138811904, 'weight_decay': 0.0, 'warmup_steps': 25}. Best is trial 43 with value: 0.7614573567738171.


Trial 58 with params: {'learning_rate': 0.00022838128102949855, 'weight_decay': 0.001, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5627,1.048589,0.765353,0.436533,0.461036,0.437074
2,0.3328,0.987411,0.791017,0.64816,0.59481,0.604872
3,0.1247,1.06492,0.792851,0.737778,0.662462,0.682332
4,0.0616,1.123302,0.793767,0.792143,0.68165,0.71581
5,0.0384,1.158282,0.799267,0.8001,0.718936,0.743375
6,0.027,1.207419,0.792851,0.785659,0.693283,0.717546
7,0.0188,1.271012,0.796517,0.816667,0.709997,0.738434
8,0.0152,1.310009,0.786434,0.793538,0.711828,0.730358
9,0.013,1.310345,0.794684,0.803729,0.707123,0.732107
10,0.0091,1.375498,0.794684,0.806032,0.714362,0.736703


[I 2025-03-26 16:35:54,998] Trial 58 finished with value: 0.7572175863370011 and parameters: {'learning_rate': 0.00022838128102949855, 'weight_decay': 0.001, 'warmup_steps': 31}. Best is trial 43 with value: 0.7614573567738171.


Trial 59 with params: {'learning_rate': 0.000488100307012158, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0063,0.997581,0.781852,0.57343,0.565288,0.560727
2,0.1312,1.103322,0.791017,0.696587,0.69103,0.677517
3,0.0533,1.28041,0.781852,0.737157,0.727947,0.715913
4,0.0283,1.385798,0.785518,0.813331,0.707312,0.733815
5,0.0206,1.366357,0.787351,0.790558,0.70521,0.728882
6,0.0146,1.393983,0.788268,0.81674,0.714724,0.745153
7,0.0105,1.549152,0.770852,0.752197,0.708204,0.714373
8,0.0093,1.519226,0.789184,0.771071,0.742154,0.741973
9,0.0066,1.62364,0.777269,0.772456,0.711596,0.725077
10,0.0055,1.699825,0.771769,0.767737,0.70138,0.716489


[I 2025-03-26 16:38:50,446] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00014559095280735742, 'weight_decay': 0.003, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9726,1.215368,0.746104,0.376941,0.399546,0.375155
2,0.5876,1.014102,0.770852,0.478613,0.48728,0.477455
3,0.2683,0.982206,0.790101,0.651684,0.589098,0.602177
4,0.1408,1.026834,0.791934,0.716049,0.622694,0.647812
5,0.0851,1.062654,0.793767,0.731383,0.651945,0.671658
6,0.0569,1.110788,0.787351,0.767143,0.658796,0.686476
7,0.0394,1.171508,0.7956,0.802946,0.695672,0.722702
8,0.0309,1.173003,0.799267,0.788141,0.70468,0.724447
9,0.025,1.213168,0.789184,0.798584,0.698719,0.724211
10,0.0189,1.26961,0.792851,0.813769,0.71085,0.735595


[I 2025-03-26 16:43:15,600] Trial 60 finished with value: 0.7474401137474939 and parameters: {'learning_rate': 0.00014559095280735742, 'weight_decay': 0.003, 'warmup_steps': 52}. Best is trial 43 with value: 0.7614573567738171.


Trial 61 with params: {'learning_rate': 0.0001710225097123558, 'weight_decay': 0.002, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7956,1.140007,0.764436,0.421676,0.446403,0.422737
2,0.4799,0.995792,0.774519,0.570105,0.520719,0.520593
3,0.2024,1.005992,0.791934,0.690954,0.628003,0.643292
4,0.1031,1.067135,0.789184,0.739201,0.630473,0.6642
5,0.0618,1.10565,0.791017,0.795296,0.705978,0.731977
6,0.042,1.164189,0.782768,0.786537,0.67978,0.711388
7,0.0288,1.218936,0.788268,0.79257,0.692081,0.717738
8,0.023,1.226201,0.7956,0.794861,0.740195,0.746469
9,0.0195,1.26979,0.791934,0.795732,0.717634,0.738632
10,0.0144,1.318711,0.797434,0.814021,0.735107,0.754491


[I 2025-03-26 16:47:45,739] Trial 61 finished with value: 0.7505329953914958 and parameters: {'learning_rate': 0.0001710225097123558, 'weight_decay': 0.002, 'warmup_steps': 32}. Best is trial 43 with value: 0.7614573567738171.


Trial 62 with params: {'learning_rate': 0.0001958626342476988, 'weight_decay': 0.0, 'warmup_steps': 34}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6899,1.090379,0.76077,0.431576,0.449274,0.42789
2,0.4071,0.986249,0.781852,0.596929,0.553621,0.557786
3,0.1615,1.040181,0.788268,0.740311,0.647475,0.670526
4,0.0804,1.098215,0.790101,0.761979,0.657565,0.692591
5,0.0488,1.145965,0.7956,0.79097,0.718941,0.740006
6,0.0338,1.223634,0.787351,0.794371,0.683504,0.713113
7,0.0232,1.254593,0.790101,0.794449,0.693766,0.71997
8,0.0188,1.262753,0.799267,0.803916,0.721393,0.739047
9,0.0159,1.286861,0.788268,0.778221,0.703967,0.723954
10,0.0117,1.341999,0.793767,0.813309,0.720652,0.744876


[I 2025-03-26 16:52:12,455] Trial 62 finished with value: 0.7459247513678526 and parameters: {'learning_rate': 0.0001958626342476988, 'weight_decay': 0.0, 'warmup_steps': 34}. Best is trial 43 with value: 0.7614573567738171.


Trial 63 with params: {'learning_rate': 0.0003618979921821318, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2333,0.987719,0.778185,0.539375,0.514457,0.507656
2,0.1779,1.055642,0.789184,0.71242,0.640391,0.660404
3,0.0657,1.136807,0.797434,0.794241,0.723805,0.743055
4,0.0331,1.217102,0.797434,0.826421,0.699802,0.736237
5,0.0238,1.282956,0.80385,0.815242,0.728261,0.753199
6,0.0174,1.322119,0.792851,0.813729,0.704269,0.736454
7,0.0122,1.410541,0.791017,0.812483,0.721667,0.743605
8,0.01,1.440624,0.786434,0.783643,0.713674,0.726752
9,0.0077,1.47336,0.792851,0.787556,0.734794,0.740014
10,0.0058,1.483653,0.791934,0.799592,0.725142,0.743011


[I 2025-03-26 16:56:37,530] Trial 63 finished with value: 0.7517262413608128 and parameters: {'learning_rate': 0.0003618979921821318, 'weight_decay': 0.0, 'warmup_steps': 29}. Best is trial 43 with value: 0.7614573567738171.


Trial 64 with params: {'learning_rate': 0.00018600237286757807, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7136,1.104619,0.761687,0.431412,0.446875,0.425033
2,0.4309,0.980051,0.786434,0.620949,0.56668,0.572319
3,0.1748,1.018417,0.789184,0.71283,0.636155,0.656489
4,0.0881,1.078526,0.791934,0.74603,0.642558,0.674761
5,0.0528,1.121779,0.802016,0.808254,0.721379,0.746869
6,0.0365,1.202155,0.786434,0.788366,0.670704,0.703439
7,0.0256,1.24235,0.791017,0.810791,0.701382,0.729506
8,0.0214,1.229892,0.8011,0.808271,0.726173,0.74438
9,0.0178,1.254128,0.7956,0.788614,0.702181,0.725641
10,0.0124,1.323011,0.7956,0.808617,0.714862,0.736134


[I 2025-03-26 17:01:02,146] Trial 64 finished with value: 0.743789217878992 and parameters: {'learning_rate': 0.00018600237286757807, 'weight_decay': 0.0, 'warmup_steps': 26}. Best is trial 43 with value: 0.7614573567738171.


Trial 65 with params: {'learning_rate': 3.80735517457004e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0307,2.364526,0.538955,0.214982,0.1686,0.151416
2,1.8736,1.664928,0.695692,0.31426,0.315696,0.297956
3,1.2879,1.350646,0.732356,0.380318,0.378613,0.358791
4,0.9613,1.201541,0.753437,0.399099,0.41586,0.393311
5,0.7589,1.121272,0.764436,0.47532,0.45425,0.44349


[I 2025-03-26 17:02:32,862] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0001936676378507846, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6807,1.090102,0.764436,0.437269,0.44814,0.426939
2,0.4098,0.974939,0.787351,0.62431,0.574303,0.580716
3,0.1633,1.024008,0.790101,0.734258,0.646007,0.669646
4,0.0815,1.087722,0.786434,0.750981,0.64333,0.677041
5,0.0494,1.138873,0.802933,0.808551,0.726979,0.751096
6,0.0344,1.198337,0.787351,0.795063,0.670015,0.703533
7,0.0237,1.251718,0.794684,0.814456,0.69997,0.730942
8,0.0199,1.22793,0.804766,0.818019,0.736631,0.75343
9,0.0164,1.263813,0.792851,0.788789,0.703458,0.725023
10,0.012,1.326154,0.800183,0.8184,0.718442,0.742689


[I 2025-03-26 17:06:57,961] Trial 66 finished with value: 0.7444848795918415 and parameters: {'learning_rate': 0.0001936676378507846, 'weight_decay': 0.0, 'warmup_steps': 26}. Best is trial 43 with value: 0.7614573567738171.


Trial 67 with params: {'learning_rate': 0.00020923283410242662, 'weight_decay': 0.002, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6215,1.066782,0.764436,0.4322,0.451281,0.429562
2,0.371,0.978614,0.789184,0.619435,0.576095,0.582264
3,0.1427,1.044595,0.793767,0.729305,0.648036,0.667227
4,0.071,1.102748,0.794684,0.786397,0.667431,0.705561
5,0.0436,1.168062,0.797434,0.789143,0.717432,0.736049
6,0.031,1.206475,0.790101,0.815223,0.696211,0.730586
7,0.0213,1.282636,0.793767,0.79789,0.707938,0.730476
8,0.0177,1.285828,0.793767,0.797336,0.72406,0.739482
9,0.0143,1.325891,0.791017,0.772886,0.702755,0.721096
10,0.0102,1.365301,0.793767,0.790115,0.723477,0.739531


[I 2025-03-26 17:11:23,605] Trial 67 finished with value: 0.7496323789255831 and parameters: {'learning_rate': 0.00020923283410242662, 'weight_decay': 0.002, 'warmup_steps': 27}. Best is trial 43 with value: 0.7614573567738171.


Trial 68 with params: {'learning_rate': 0.00024411971233489693, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4892,1.042042,0.773602,0.436505,0.469587,0.442206
2,0.3008,1.010998,0.782768,0.668408,0.60774,0.619628
3,0.111,1.087314,0.7956,0.749611,0.690904,0.704378
4,0.0554,1.139716,0.8011,0.817453,0.703846,0.738687
5,0.0349,1.178025,0.80385,0.794725,0.736174,0.751594
6,0.0251,1.250435,0.789184,0.800176,0.697429,0.727876
7,0.0159,1.318796,0.791934,0.793433,0.715901,0.735213
8,0.0141,1.314361,0.791934,0.79155,0.731612,0.741664
9,0.0114,1.378727,0.799267,0.802363,0.721325,0.743744
10,0.0084,1.40856,0.797434,0.809687,0.726097,0.748222


[I 2025-03-26 17:15:49,600] Trial 68 finished with value: 0.7569942667986648 and parameters: {'learning_rate': 0.00024411971233489693, 'weight_decay': 0.0, 'warmup_steps': 21}. Best is trial 43 with value: 0.7614573567738171.


Trial 69 with params: {'learning_rate': 0.0004057130166874689, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1487,0.996201,0.775435,0.539523,0.540735,0.527873
2,0.1538,1.089427,0.779102,0.70355,0.64456,0.657366
3,0.0584,1.221926,0.784601,0.741492,0.723536,0.716888
4,0.0309,1.228943,0.80385,0.814993,0.724105,0.747432
5,0.023,1.335163,0.8011,0.802727,0.730664,0.750033
6,0.0145,1.429819,0.787351,0.819378,0.703746,0.738803
7,0.0109,1.426396,0.792851,0.800772,0.728351,0.746821
8,0.0092,1.436127,0.791017,0.797217,0.723134,0.740948
9,0.0068,1.506298,0.787351,0.791771,0.705866,0.730354
10,0.0051,1.559281,0.791017,0.813305,0.70385,0.734544


[I 2025-03-26 17:18:45,350] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00021393900311971698, 'weight_decay': 0.0, 'warmup_steps': 43}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6412,1.068157,0.762603,0.414044,0.451102,0.423898
2,0.3655,0.99628,0.782768,0.599517,0.561671,0.565783
3,0.1408,1.040769,0.790101,0.733437,0.644221,0.668679
4,0.0688,1.111449,0.786434,0.777575,0.668909,0.701589
5,0.0418,1.149258,0.797434,0.788463,0.717311,0.738327
6,0.0294,1.218283,0.785518,0.782249,0.677563,0.708338
7,0.0197,1.261164,0.792851,0.782639,0.687207,0.712861
8,0.0165,1.290267,0.796517,0.816504,0.733198,0.751836
9,0.0137,1.294208,0.799267,0.795351,0.737061,0.752533
10,0.0104,1.363521,0.793767,0.811816,0.718841,0.743085


[I 2025-03-26 17:27:39,324] Trial 71 finished with value: 0.7511571494431414 and parameters: {'learning_rate': 0.00019324488581664128, 'weight_decay': 0.0, 'warmup_steps': 17}. Best is trial 43 with value: 0.7614573567738171.


Trial 72 with params: {'learning_rate': 1.0579248993606617e-05, 'weight_decay': 0.001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5614,3.281887,0.343721,0.041842,0.070831,0.049721
2,3.054,2.871706,0.449129,0.108133,0.116988,0.100976
3,2.682,2.561445,0.496792,0.129413,0.142019,0.117561
4,2.3917,2.321326,0.543538,0.21458,0.176747,0.159263
5,2.156,2.136175,0.590284,0.250898,0.219675,0.203355
6,1.9737,1.989987,0.607699,0.27623,0.234703,0.219357
7,1.8238,1.87315,0.630614,0.269416,0.252817,0.236675
8,1.708,1.781676,0.659945,0.295498,0.275904,0.259059
9,1.6133,1.707974,0.676444,0.329925,0.29644,0.280091
10,1.5361,1.651296,0.68286,0.343086,0.302334,0.286006


[I 2025-03-26 17:30:35,885] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 1.1597714681187563e-05, 'weight_decay': 0.01, 'warmup_steps': 46}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5471,3.243661,0.349221,0.060739,0.072978,0.050427
2,2.9987,2.806111,0.463795,0.108678,0.125013,0.106808
3,2.6021,2.477805,0.510541,0.145431,0.152448,0.130669
4,2.2965,2.231525,0.572869,0.223461,0.201348,0.186349
5,2.0527,2.041326,0.6022,0.279574,0.232028,0.215506


[I 2025-03-26 17:32:03,825] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.00021125465408003215, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.596,1.071731,0.769019,0.428034,0.461627,0.435575
2,0.3647,0.994984,0.781852,0.618957,0.573232,0.576906
3,0.1398,1.07254,0.788268,0.714904,0.644095,0.661766
4,0.0705,1.11072,0.800183,0.782517,0.674497,0.709223
5,0.0434,1.144529,0.802016,0.78822,0.728563,0.745866
6,0.0311,1.217567,0.788268,0.783398,0.690105,0.717511
7,0.0207,1.285059,0.791934,0.795947,0.691905,0.719427
8,0.0177,1.28082,0.796517,0.77995,0.723362,0.735475
9,0.014,1.350022,0.792851,0.781367,0.71629,0.72951
10,0.0106,1.375788,0.791934,0.787396,0.707224,0.72867


[I 2025-03-26 17:35:00,289] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0002693184568595071, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4289,1.022661,0.773602,0.477617,0.481045,0.464269
2,0.2638,1.021115,0.788268,0.72339,0.63382,0.656428
3,0.0963,1.100843,0.791017,0.750084,0.681985,0.69842
4,0.0487,1.155289,0.804766,0.821202,0.713824,0.745956
5,0.0305,1.226586,0.800183,0.794781,0.712688,0.736304
6,0.0221,1.248427,0.8011,0.795959,0.708294,0.73431
7,0.0148,1.353295,0.797434,0.791743,0.709977,0.730212
8,0.0129,1.350851,0.7956,0.788743,0.721909,0.736055
9,0.0098,1.400038,0.790101,0.786293,0.719201,0.73606
10,0.0072,1.440171,0.791017,0.804884,0.71443,0.740225


[I 2025-03-26 17:39:26,980] Trial 75 finished with value: 0.7389478819994442 and parameters: {'learning_rate': 0.0002693184568595071, 'weight_decay': 0.0, 'warmup_steps': 26}. Best is trial 43 with value: 0.7614573567738171.


Trial 76 with params: {'learning_rate': 8.607572187821745e-05, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3773,1.555238,0.698442,0.329498,0.326861,0.309861
2,0.9903,1.132702,0.76077,0.449641,0.447833,0.433766
3,0.5602,1.03674,0.768103,0.467125,0.483313,0.470271
4,0.358,0.999518,0.772686,0.538227,0.51001,0.506201
5,0.2376,0.990279,0.787351,0.644272,0.582676,0.59766
6,0.1678,1.01607,0.792851,0.671895,0.607601,0.622723
7,0.1218,1.039301,0.791934,0.692401,0.618631,0.637027
8,0.0959,1.046291,0.79835,0.717333,0.629914,0.652322
9,0.0781,1.078653,0.784601,0.703488,0.629274,0.647571
10,0.0636,1.118888,0.785518,0.72615,0.640917,0.662915


[I 2025-03-26 17:42:24,221] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0003326480917613857, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2906,0.996704,0.774519,0.493698,0.500971,0.487001
2,0.1994,1.066844,0.786434,0.736944,0.64359,0.669485
3,0.0728,1.166412,0.786434,0.771106,0.718604,0.730451
4,0.0391,1.233389,0.799267,0.831601,0.7155,0.749142
5,0.0242,1.278121,0.79835,0.810407,0.712495,0.735304
6,0.0184,1.285451,0.800183,0.80245,0.705629,0.734994
7,0.0131,1.386691,0.791934,0.818577,0.714366,0.740728
8,0.0103,1.455107,0.791017,0.800727,0.701464,0.725378
9,0.0089,1.473217,0.785518,0.809296,0.697267,0.731059
10,0.0058,1.442048,0.79835,0.797391,0.735529,0.745878


[I 2025-03-26 17:46:49,086] Trial 77 finished with value: 0.7478097189536502 and parameters: {'learning_rate': 0.0003326480917613857, 'weight_decay': 0.002, 'warmup_steps': 30}. Best is trial 43 with value: 0.7614573567738171.


Trial 78 with params: {'learning_rate': 4.2739403038429994e-05, 'weight_decay': 0.005, 'warmup_steps': 47}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9508,2.251523,0.567369,0.247147,0.195856,0.182189
2,1.7412,1.55941,0.706691,0.321914,0.332422,0.31154
3,1.1651,1.278785,0.742438,0.379272,0.398892,0.376209
4,0.8545,1.149507,0.758937,0.437526,0.43647,0.419364
5,0.6654,1.086558,0.765353,0.471736,0.463218,0.449263


[I 2025-03-26 17:48:17,216] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.00022156741883185956, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5696,1.055272,0.766269,0.426521,0.459911,0.435046
2,0.3437,0.996064,0.781852,0.616017,0.575253,0.579072
3,0.13,1.075054,0.786434,0.702617,0.63292,0.64951
4,0.065,1.122568,0.796517,0.789395,0.672628,0.709767
5,0.0403,1.169313,0.79835,0.793334,0.713313,0.737646
6,0.0288,1.222373,0.792851,0.800699,0.691253,0.723945
7,0.0196,1.30246,0.791017,0.79719,0.685633,0.718054
8,0.0169,1.286108,0.792851,0.774116,0.713292,0.726229
9,0.0139,1.318022,0.796517,0.785236,0.703225,0.724552
10,0.0099,1.352986,0.796517,0.805766,0.723623,0.744667


[I 2025-03-26 17:52:42,435] Trial 79 finished with value: 0.7430381833839068 and parameters: {'learning_rate': 0.00022156741883185956, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 43 with value: 0.7614573567738171.


Trial 80 with params: {'learning_rate': 0.0004957651934502081, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.041,1.008247,0.779102,0.56141,0.544298,0.540201
2,0.1277,1.145359,0.788268,0.712165,0.663426,0.670152
3,0.0491,1.273507,0.780018,0.742655,0.716762,0.717981
4,0.0298,1.288861,0.789184,0.810958,0.709322,0.740353
5,0.0202,1.438598,0.784601,0.783102,0.727469,0.739816
6,0.0146,1.433441,0.788268,0.813417,0.713738,0.744362
7,0.0107,1.525307,0.788268,0.779024,0.720999,0.732923
8,0.0087,1.524702,0.788268,0.797345,0.724365,0.743922
9,0.0071,1.603594,0.785518,0.809286,0.716208,0.742105
10,0.0052,1.670404,0.781852,0.796945,0.704121,0.732394


[I 2025-03-26 17:55:38,673] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 1.0855908291649989e-05, 'weight_decay': 0.0, 'warmup_steps': 41}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5627,3.274936,0.344638,0.041807,0.071194,0.049806
2,3.0419,2.856313,0.450962,0.107809,0.117657,0.101462
3,2.6622,2.540044,0.500458,0.128489,0.143822,0.11951
4,2.3668,2.297519,0.548121,0.218534,0.180964,0.163247
5,2.1285,2.110815,0.590284,0.251402,0.220059,0.202738
6,1.9445,1.963695,0.610449,0.272911,0.236983,0.220907
7,1.7937,1.846821,0.637947,0.267545,0.25741,0.24094
8,1.6775,1.755653,0.668194,0.338906,0.285125,0.270183
9,1.5826,1.682158,0.68011,0.337697,0.298674,0.282414
10,1.5053,1.625962,0.687443,0.329212,0.305764,0.289472


[I 2025-03-26 17:58:34,621] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0002891902282670203, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3313,1.025506,0.771769,0.495449,0.494917,0.482574
2,0.2383,1.041964,0.790101,0.716999,0.638235,0.659332
3,0.0875,1.116303,0.794684,0.792314,0.718596,0.736787
4,0.0447,1.186444,0.7956,0.804339,0.700265,0.729762
5,0.0288,1.288338,0.793767,0.808276,0.695346,0.729185
6,0.0205,1.350289,0.791017,0.817239,0.712209,0.743021
7,0.0141,1.405231,0.785518,0.805449,0.702453,0.730977
8,0.0126,1.360173,0.79835,0.796157,0.720997,0.737786
9,0.0095,1.429174,0.792851,0.799259,0.702365,0.725805
10,0.0068,1.464486,0.792851,0.789314,0.705806,0.72793


[I 2025-03-26 18:01:43,368] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00019832646378364438, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6462,1.08759,0.76352,0.42009,0.451658,0.425077
2,0.3957,0.985557,0.785518,0.614996,0.570496,0.576622
3,0.1558,1.047704,0.790101,0.725358,0.641344,0.662207
4,0.0785,1.10051,0.793767,0.776903,0.675292,0.70732
5,0.0476,1.141599,0.79835,0.781993,0.722487,0.738725
6,0.0339,1.19387,0.788268,0.786023,0.682282,0.713338
7,0.023,1.253866,0.797434,0.794253,0.694681,0.722309
8,0.0192,1.261803,0.7956,0.791386,0.72858,0.739497
9,0.0163,1.279509,0.787351,0.771067,0.703748,0.71853
10,0.0117,1.345941,0.7956,0.804146,0.71947,0.741243


[I 2025-03-26 18:06:15,070] Trial 83 finished with value: 0.7523746683231832 and parameters: {'learning_rate': 0.00019832646378364438, 'weight_decay': 0.003, 'warmup_steps': 19}. Best is trial 43 with value: 0.7614573567738171.


Trial 84 with params: {'learning_rate': 0.0003687217369305351, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1943,1.001749,0.774519,0.520167,0.525209,0.509704
2,0.1763,1.075462,0.782768,0.717784,0.632765,0.654865
3,0.0661,1.191773,0.786434,0.771108,0.721341,0.728532
4,0.0332,1.259847,0.797434,0.813975,0.71388,0.744963
5,0.0229,1.285054,0.788268,0.791925,0.702051,0.725034


[I 2025-03-26 18:07:43,348] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00018353869474219, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7374,1.110794,0.76352,0.42829,0.447077,0.424844
2,0.4402,0.988153,0.776352,0.597072,0.535694,0.540197
3,0.1801,1.021644,0.787351,0.712352,0.633354,0.653923
4,0.0905,1.08342,0.789184,0.756592,0.653522,0.686257
5,0.0544,1.110541,0.797434,0.804356,0.716099,0.741503
6,0.0373,1.180168,0.787351,0.779897,0.671064,0.704494
7,0.0259,1.231575,0.790101,0.790262,0.694845,0.719168
8,0.021,1.225517,0.799267,0.795614,0.719998,0.737175
9,0.0179,1.262912,0.793767,0.787995,0.715548,0.734881
10,0.0129,1.323153,0.79835,0.816041,0.730634,0.750846


[I 2025-03-26 18:12:11,211] Trial 85 finished with value: 0.752148995927677 and parameters: {'learning_rate': 0.00018353869474219, 'weight_decay': 0.0, 'warmup_steps': 32}. Best is trial 43 with value: 0.7614573567738171.


Trial 86 with params: {'learning_rate': 4.0534446710776905e-05, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9401,2.276172,0.565536,0.239845,0.193548,0.180334
2,1.7818,1.596093,0.704858,0.341156,0.324496,0.308052
3,1.2114,1.306585,0.741522,0.37861,0.3959,0.374655
4,0.8973,1.17142,0.756187,0.43921,0.42972,0.415943
5,0.704,1.101656,0.764436,0.477498,0.460128,0.448779


[I 2025-03-26 18:13:39,523] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00010223215028219842, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2182,1.412389,0.728689,0.351322,0.36591,0.341517
2,0.8346,1.08369,0.759853,0.426652,0.455373,0.431055
3,0.4459,1.006231,0.769936,0.480663,0.500983,0.483302
4,0.2667,1.010143,0.780018,0.62442,0.556927,0.570614
5,0.1703,1.016674,0.784601,0.637821,0.59582,0.602334
6,0.1167,1.048949,0.787351,0.686869,0.606801,0.628601
7,0.0834,1.071805,0.797434,0.735709,0.651548,0.672015
8,0.0642,1.097301,0.79835,0.81205,0.704523,0.729731
9,0.0523,1.134763,0.789184,0.78936,0.677722,0.708767
10,0.0421,1.178502,0.784601,0.780714,0.695805,0.718053


[I 2025-03-26 18:16:35,034] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 2.8421889789283416e-05, 'weight_decay': 0.007, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1732,2.621999,0.485793,0.103814,0.135388,0.111409
2,2.188,1.950622,0.617782,0.270578,0.245207,0.224745
3,1.6149,1.580114,0.703025,0.342244,0.323243,0.307764
4,1.2608,1.372948,0.72319,0.360303,0.363947,0.342604
5,1.0265,1.253776,0.743355,0.378037,0.400241,0.376715
6,0.8686,1.172712,0.753437,0.422683,0.431215,0.413143
7,0.7483,1.121256,0.76352,0.450157,0.446985,0.429996
8,0.6675,1.090278,0.768103,0.480686,0.46437,0.451802
9,0.6026,1.072729,0.769019,0.507844,0.474628,0.465409
10,0.5508,1.05711,0.768103,0.532428,0.479582,0.473011


[I 2025-03-26 18:19:30,990] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.00021236351538358482, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5959,1.067597,0.766269,0.425186,0.46023,0.433808
2,0.3624,0.992832,0.782768,0.617266,0.578889,0.581699
3,0.1386,1.068973,0.791934,0.716848,0.645023,0.663795
4,0.0697,1.111979,0.799267,0.790397,0.676502,0.715565
5,0.0431,1.15371,0.8011,0.782069,0.719234,0.737373
6,0.0307,1.210745,0.788268,0.79161,0.688782,0.719714
7,0.0209,1.282017,0.793767,0.79178,0.687726,0.716046
8,0.0179,1.279752,0.794684,0.78977,0.713825,0.733956
9,0.0141,1.291146,0.80385,0.791789,0.715631,0.73509
10,0.0105,1.351209,0.7956,0.788604,0.716034,0.734323


[I 2025-03-26 18:22:28,826] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.00021132069647997404, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5854,1.070158,0.769019,0.434313,0.462762,0.439449
2,0.3623,0.983667,0.786434,0.615037,0.589782,0.588977
3,0.1385,1.055049,0.789184,0.723892,0.64664,0.663764
4,0.0695,1.095825,0.797434,0.788719,0.679743,0.714759
5,0.043,1.133614,0.802016,0.794762,0.720254,0.743519
6,0.0307,1.192623,0.790101,0.814183,0.685922,0.724364
7,0.0215,1.246189,0.794684,0.811855,0.697994,0.730969
8,0.0175,1.260379,0.789184,0.795201,0.691847,0.719861
9,0.0136,1.302141,0.79835,0.792023,0.72815,0.744936
10,0.0103,1.340954,0.796517,0.798267,0.707658,0.731639


[I 2025-03-26 18:25:23,959] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00014745364610342952, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9084,1.201298,0.75802,0.417934,0.429086,0.411375
2,0.5674,1.006971,0.773602,0.479879,0.497004,0.482303
3,0.2556,0.996254,0.786434,0.649396,0.587373,0.599961
4,0.135,1.054873,0.784601,0.678785,0.601853,0.625677
5,0.0813,1.070136,0.791934,0.739475,0.667687,0.687225
6,0.0546,1.135798,0.783685,0.769878,0.674188,0.700557
7,0.0383,1.191172,0.789184,0.796442,0.702069,0.724041
8,0.0296,1.187198,0.79835,0.793522,0.72152,0.737511
9,0.0248,1.239887,0.790101,0.789217,0.7082,0.727407
10,0.0188,1.277359,0.790101,0.794827,0.726503,0.738724


[I 2025-03-26 18:29:48,288] Trial 91 finished with value: 0.7466199181329478 and parameters: {'learning_rate': 0.00014745364610342952, 'weight_decay': 0.0, 'warmup_steps': 27}. Best is trial 43 with value: 0.7614573567738171.


Trial 92 with params: {'learning_rate': 0.00022444623275857606, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.562,1.05207,0.765353,0.431297,0.458963,0.436147
2,0.3381,0.997316,0.783685,0.619119,0.582135,0.586841
3,0.1275,1.074886,0.789184,0.706757,0.635832,0.653387
4,0.0638,1.126029,0.791017,0.785767,0.665809,0.704074
5,0.0399,1.167208,0.799267,0.793218,0.712135,0.735282
6,0.0282,1.21999,0.794684,0.793201,0.696472,0.72393
7,0.0195,1.279549,0.791934,0.794703,0.685131,0.715371
8,0.0167,1.295902,0.7956,0.774358,0.715186,0.728624
9,0.0136,1.308391,0.79835,0.779756,0.703746,0.723105
10,0.0094,1.383204,0.791934,0.795439,0.72156,0.739554


[I 2025-03-26 18:34:15,134] Trial 92 finished with value: 0.7583362476463165 and parameters: {'learning_rate': 0.00022444623275857606, 'weight_decay': 0.0, 'warmup_steps': 25}. Best is trial 43 with value: 0.7614573567738171.


Trial 93 with params: {'learning_rate': 0.00016270738163122646, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7918,1.155212,0.764436,0.42165,0.446212,0.423039
2,0.5017,0.988433,0.777269,0.552059,0.523574,0.519528
3,0.2163,1.000687,0.788268,0.643187,0.600822,0.606811
4,0.1124,1.055753,0.793767,0.721605,0.62922,0.656688
5,0.0674,1.100757,0.792851,0.759817,0.691261,0.711033
6,0.0461,1.160984,0.783685,0.762853,0.655154,0.685183
7,0.0322,1.215389,0.788268,0.791559,0.691511,0.7187
8,0.0261,1.203449,0.8011,0.799624,0.735402,0.746821
9,0.0213,1.26196,0.787351,0.783219,0.710995,0.728369
10,0.0159,1.30125,0.791017,0.803846,0.709508,0.729784


[I 2025-03-26 18:37:12,522] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0003193050618049994, 'weight_decay': 0.0, 'warmup_steps': 35}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3301,1.010213,0.769936,0.481552,0.484538,0.47166
2,0.2123,1.066484,0.777269,0.732715,0.636249,0.659602
3,0.078,1.146333,0.792851,0.758637,0.709654,0.717869
4,0.0403,1.26039,0.793767,0.824985,0.693148,0.731901
5,0.0264,1.252647,0.800183,0.794083,0.720414,0.737742
6,0.0193,1.318067,0.7956,0.793712,0.70137,0.723125
7,0.0126,1.369838,0.784601,0.78537,0.695466,0.712464
8,0.0103,1.396246,0.79835,0.78548,0.711258,0.726682
9,0.0087,1.520243,0.790101,0.768005,0.717296,0.72595
10,0.0065,1.477016,0.7956,0.777715,0.72145,0.729892


[I 2025-03-26 18:40:12,983] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.00023822071579578048, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5075,1.045757,0.774519,0.441561,0.470436,0.445785
2,0.3112,1.009107,0.778185,0.661051,0.605479,0.614456
3,0.1161,1.087095,0.792851,0.738409,0.662238,0.682665
4,0.0579,1.142994,0.800183,0.82157,0.703239,0.73926
5,0.0362,1.169389,0.802016,0.800119,0.717486,0.740659
6,0.0263,1.234681,0.7956,0.810956,0.705715,0.735852
7,0.0169,1.323992,0.792851,0.798914,0.697958,0.726027
8,0.0145,1.334612,0.7956,0.80159,0.726031,0.74245
9,0.0115,1.356112,0.794684,0.789714,0.718634,0.737559
10,0.0088,1.403568,0.797434,0.808858,0.734844,0.754175


[I 2025-03-26 18:44:38,474] Trial 95 finished with value: 0.7569455197553512 and parameters: {'learning_rate': 0.00023822071579578048, 'weight_decay': 0.0, 'warmup_steps': 21}. Best is trial 43 with value: 0.7614573567738171.


Trial 96 with params: {'learning_rate': 0.00033673304519787884, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2683,1.003683,0.774519,0.484569,0.497134,0.481564
2,0.1941,1.078478,0.784601,0.717641,0.646522,0.659485
3,0.071,1.17705,0.786434,0.783061,0.732527,0.739119
4,0.0363,1.241735,0.788268,0.80133,0.720349,0.736548
5,0.0242,1.341781,0.793767,0.812752,0.698764,0.731664
6,0.018,1.330974,0.792851,0.817194,0.69835,0.73228
7,0.0128,1.40275,0.791017,0.791199,0.706895,0.727156
8,0.0105,1.440652,0.792851,0.801089,0.706124,0.7329
9,0.009,1.508214,0.792851,0.808186,0.721853,0.744907
10,0.0062,1.496335,0.791934,0.793951,0.726066,0.741226


[I 2025-03-26 18:49:06,115] Trial 96 finished with value: 0.7481376438758849 and parameters: {'learning_rate': 0.00033673304519787884, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 43 with value: 0.7614573567738171.


Trial 97 with params: {'learning_rate': 0.00012531422646113865, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0331,1.284317,0.742438,0.386843,0.39433,0.373657
2,0.6766,1.038145,0.769936,0.475554,0.482008,0.471117
3,0.3304,0.989601,0.782768,0.601192,0.560089,0.562319
4,0.1829,1.030524,0.780935,0.667784,0.58889,0.611772
5,0.1122,1.037813,0.790101,0.686698,0.619709,0.635871
6,0.0759,1.098817,0.784601,0.726445,0.631832,0.655168
7,0.0535,1.131553,0.79835,0.77057,0.668599,0.693921
8,0.0413,1.147055,0.796517,0.801762,0.716693,0.736443
9,0.0342,1.180617,0.792851,0.783714,0.689026,0.71456
10,0.0264,1.238666,0.791017,0.793148,0.707268,0.724018


[I 2025-03-26 18:52:02,972] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00013299797880802797, 'weight_decay': 0.002, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0142,1.254882,0.744271,0.385128,0.395805,0.374482
2,0.6416,1.035298,0.766269,0.475993,0.480928,0.471836
3,0.3066,0.988472,0.788268,0.603761,0.569229,0.573054
4,0.1662,1.028276,0.785518,0.674348,0.601324,0.622444
5,0.1012,1.04521,0.789184,0.686195,0.619806,0.635663
6,0.0678,1.112764,0.786434,0.76175,0.658579,0.686031
7,0.0475,1.159552,0.797434,0.804898,0.696553,0.7229
8,0.0367,1.168892,0.7956,0.786572,0.702298,0.72086
9,0.0297,1.201563,0.790101,0.798089,0.697473,0.723268
10,0.0228,1.252131,0.792851,0.805459,0.709774,0.731768


[I 2025-03-26 18:55:00,570] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0002852207883885828, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3703,1.01867,0.773602,0.478306,0.483149,0.466719
2,0.2408,1.028592,0.787351,0.735762,0.637079,0.662222
3,0.087,1.127788,0.785518,0.759275,0.697272,0.710976
4,0.044,1.185486,0.797434,0.825537,0.695661,0.73588
5,0.0283,1.21319,0.79835,0.796125,0.733746,0.74758
6,0.0204,1.280735,0.80385,0.817908,0.712345,0.74307
7,0.0147,1.375015,0.791017,0.788551,0.702514,0.722507
8,0.0121,1.399865,0.794684,0.791533,0.712669,0.730331
9,0.0101,1.43329,0.7956,0.807669,0.721074,0.743726
10,0.0078,1.465977,0.793767,0.793808,0.717189,0.734604


[I 2025-03-26 18:57:56,813] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00027722459123952225, 'weight_decay': 0.002, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4052,1.019569,0.772686,0.477397,0.482835,0.46538
2,0.2533,1.03152,0.786434,0.720029,0.633332,0.654135
3,0.0933,1.10334,0.791017,0.747428,0.692007,0.702084
4,0.047,1.185938,0.796517,0.824902,0.697278,0.735459
5,0.0292,1.224787,0.79835,0.779386,0.715859,0.731521
6,0.0213,1.280092,0.794684,0.815585,0.701589,0.733916
7,0.0148,1.359021,0.788268,0.804027,0.691831,0.719821
8,0.0122,1.330823,0.799267,0.791962,0.732905,0.747222
9,0.01,1.387956,0.796517,0.808934,0.725041,0.747478
10,0.0073,1.421648,0.800183,0.804502,0.739271,0.754951


[I 2025-03-26 19:02:24,317] Trial 100 finished with value: 0.7492466028265045 and parameters: {'learning_rate': 0.00027722459123952225, 'weight_decay': 0.002, 'warmup_steps': 25}. Best is trial 43 with value: 0.7614573567738171.


Trial 101 with params: {'learning_rate': 0.00021102178947558206, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6032,1.067744,0.767186,0.424366,0.460625,0.434155
2,0.3656,0.990098,0.781852,0.611705,0.572096,0.575548
3,0.1401,1.066132,0.789184,0.71659,0.643392,0.661557
4,0.0702,1.113204,0.797434,0.791641,0.674936,0.713488
5,0.0434,1.158735,0.797434,0.79135,0.710063,0.733348
6,0.0309,1.19916,0.790101,0.802749,0.688778,0.722556
7,0.021,1.284387,0.793767,0.787472,0.693848,0.719795
8,0.0179,1.283825,0.794684,0.787632,0.712062,0.730182
9,0.0148,1.306771,0.797434,0.779994,0.712585,0.72836
10,0.0101,1.365645,0.79835,0.799469,0.723418,0.741086


[I 2025-03-26 19:06:51,609] Trial 101 finished with value: 0.7550164025005508 and parameters: {'learning_rate': 0.00021102178947558206, 'weight_decay': 0.0, 'warmup_steps': 22}. Best is trial 43 with value: 0.7614573567738171.


Trial 102 with params: {'learning_rate': 0.0001949924979053275, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6776,1.088435,0.76352,0.437112,0.447925,0.426862
2,0.4067,0.973796,0.790101,0.622945,0.574859,0.581077
3,0.1617,1.025031,0.790101,0.729752,0.646007,0.666665
4,0.0806,1.088683,0.791017,0.762169,0.656761,0.687849
5,0.0488,1.136006,0.802016,0.804322,0.720009,0.744608
6,0.0338,1.205321,0.789184,0.787723,0.682053,0.710915
7,0.0236,1.254955,0.797434,0.811677,0.706333,0.734449
8,0.0199,1.230188,0.802933,0.813846,0.736153,0.752081
9,0.0166,1.273028,0.797434,0.786883,0.715113,0.732874
10,0.0118,1.326388,0.797434,0.811111,0.718835,0.739571


[I 2025-03-26 19:11:17,791] Trial 102 finished with value: 0.7444258680569698 and parameters: {'learning_rate': 0.0001949924979053275, 'weight_decay': 0.0, 'warmup_steps': 27}. Best is trial 43 with value: 0.7614573567738171.


Trial 103 with params: {'learning_rate': 0.00023123369442023856, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5342,1.047065,0.769936,0.44512,0.464633,0.44045
2,0.3243,1.007515,0.779102,0.607487,0.578908,0.579541
3,0.1218,1.075992,0.791934,0.711672,0.646838,0.662708
4,0.0611,1.117502,0.794684,0.777942,0.668954,0.703364
5,0.0381,1.176484,0.799267,0.807797,0.712545,0.740703
6,0.0274,1.230822,0.789184,0.788031,0.689111,0.718779
7,0.0182,1.314367,0.787351,0.771317,0.682724,0.705759
8,0.0159,1.31187,0.789184,0.784367,0.710119,0.728802
9,0.0126,1.340078,0.796517,0.790144,0.725008,0.74101
10,0.0093,1.388225,0.796517,0.812286,0.725093,0.748942


[I 2025-03-26 19:15:42,903] Trial 103 finished with value: 0.7550644983354465 and parameters: {'learning_rate': 0.00023123369442023856, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 43 with value: 0.7614573567738171.


Trial 104 with params: {'learning_rate': 0.00025269094715993617, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4683,1.034592,0.773602,0.470724,0.477329,0.458086
2,0.2874,1.010739,0.780018,0.669529,0.604861,0.618666
3,0.1055,1.101912,0.794684,0.753634,0.689278,0.705994
4,0.0525,1.148898,0.799267,0.820677,0.707714,0.743016
5,0.0334,1.191628,0.79835,0.798511,0.718536,0.740236
6,0.024,1.236551,0.79835,0.805105,0.70949,0.736025
7,0.016,1.330288,0.791934,0.794211,0.694347,0.720859
8,0.0138,1.344695,0.79835,0.80196,0.739353,0.753373
9,0.0114,1.414139,0.792851,0.793177,0.715352,0.737543
10,0.0079,1.41937,0.799267,0.808544,0.736518,0.754706


[I 2025-03-26 19:20:09,331] Trial 104 finished with value: 0.7483373707957668 and parameters: {'learning_rate': 0.00025269094715993617, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 43 with value: 0.7614573567738171.


Trial 105 with params: {'learning_rate': 0.0002523073998035621, 'weight_decay': 0.001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4853,1.030514,0.769019,0.435887,0.464364,0.442512
2,0.2898,1.009915,0.782768,0.664135,0.612745,0.623546
3,0.1061,1.072136,0.796517,0.7443,0.65648,0.679715
4,0.0529,1.135736,0.804766,0.819261,0.70416,0.740661
5,0.0338,1.182982,0.80385,0.801037,0.725432,0.744485
6,0.0242,1.232134,0.797434,0.813685,0.696499,0.728508
7,0.0168,1.31786,0.787351,0.776591,0.693542,0.714704
8,0.0138,1.315716,0.792851,0.7923,0.731964,0.743606
9,0.012,1.360814,0.79835,0.795809,0.725628,0.744137
10,0.0084,1.421061,0.794684,0.800002,0.736472,0.74958


[I 2025-03-26 19:24:33,645] Trial 105 finished with value: 0.7556358967145542 and parameters: {'learning_rate': 0.0002523073998035621, 'weight_decay': 0.001, 'warmup_steps': 30}. Best is trial 43 with value: 0.7614573567738171.


Trial 106 with params: {'learning_rate': 0.0002586995153481563, 'weight_decay': 0.0, 'warmup_steps': 35}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4789,1.028582,0.765353,0.437095,0.461697,0.439673
2,0.2818,1.012196,0.788268,0.67449,0.616803,0.629665
3,0.1029,1.099101,0.793767,0.734044,0.656894,0.677598
4,0.0508,1.143148,0.793767,0.822195,0.704279,0.741297
5,0.0328,1.176076,0.8011,0.804944,0.715739,0.739068
6,0.0234,1.25394,0.79835,0.78748,0.70413,0.72672
7,0.0166,1.32875,0.791934,0.80516,0.707951,0.732896
8,0.0139,1.340167,0.790101,0.798432,0.707669,0.730254
9,0.0115,1.384876,0.794684,0.800536,0.714079,0.734952
10,0.0082,1.438387,0.789184,0.809227,0.714438,0.739326


[I 2025-03-26 19:28:59,473] Trial 106 finished with value: 0.7466024206307607 and parameters: {'learning_rate': 0.0002586995153481563, 'weight_decay': 0.0, 'warmup_steps': 35}. Best is trial 43 with value: 0.7614573567738171.


Trial 107 with params: {'learning_rate': 0.0004979241829976273, 'weight_decay': 0.002, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0596,1.027521,0.774519,0.525383,0.555018,0.532989
2,0.125,1.107579,0.785518,0.771993,0.70053,0.71792
3,0.0488,1.219461,0.796517,0.789689,0.711444,0.731116
4,0.0269,1.283285,0.794684,0.827269,0.711185,0.74472
5,0.0203,1.368419,0.791017,0.81229,0.72993,0.755618
6,0.014,1.414745,0.797434,0.818398,0.725881,0.748737
7,0.0111,1.493519,0.793767,0.813255,0.721259,0.747572
8,0.0086,1.510804,0.793767,0.797647,0.732132,0.749101
9,0.0071,1.578563,0.789184,0.809411,0.715268,0.744413
10,0.0057,1.618742,0.787351,0.775912,0.725212,0.736676


[I 2025-03-26 19:33:24,427] Trial 107 finished with value: 0.7358429845934819 and parameters: {'learning_rate': 0.0004979241829976273, 'weight_decay': 0.002, 'warmup_steps': 32}. Best is trial 43 with value: 0.7614573567738171.


Trial 108 with params: {'learning_rate': 0.00019711131424837047, 'weight_decay': 0.003, 'warmup_steps': 37}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6919,1.091588,0.762603,0.422127,0.447463,0.422442
2,0.405,0.991763,0.780018,0.596737,0.5504,0.556853
3,0.1608,1.045939,0.789184,0.736961,0.645095,0.668209
4,0.0795,1.10066,0.786434,0.788314,0.669935,0.706508
5,0.0483,1.153428,0.794684,0.803378,0.723953,0.745876
6,0.0331,1.232995,0.783685,0.783431,0.684474,0.711826
7,0.0227,1.265639,0.788268,0.793656,0.692168,0.719326
8,0.0184,1.273315,0.7956,0.800734,0.716527,0.734252
9,0.0158,1.2974,0.794684,0.801318,0.728556,0.748063
10,0.0114,1.35657,0.792851,0.818836,0.720257,0.745729


[I 2025-03-26 19:37:49,723] Trial 108 finished with value: 0.7409389116974938 and parameters: {'learning_rate': 0.00019711131424837047, 'weight_decay': 0.003, 'warmup_steps': 37}. Best is trial 43 with value: 0.7614573567738171.


Trial 109 with params: {'learning_rate': 0.0001762451893065269, 'weight_decay': 0.001, 'warmup_steps': 33}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7731,1.129173,0.76352,0.423652,0.448459,0.423885
2,0.4632,0.994804,0.773602,0.572639,0.524468,0.526472
3,0.193,1.01354,0.791934,0.693672,0.630639,0.646338
4,0.0977,1.068483,0.788268,0.764286,0.649746,0.685766
5,0.0584,1.11638,0.791934,0.797335,0.707733,0.733925
6,0.0396,1.170464,0.785518,0.789674,0.686707,0.716058
7,0.0273,1.220538,0.788268,0.791381,0.6919,0.716892
8,0.022,1.230603,0.796517,0.794215,0.737939,0.745056
9,0.0185,1.270713,0.791934,0.79609,0.705827,0.730774
10,0.0135,1.333149,0.794684,0.803619,0.721858,0.741093


[I 2025-03-26 19:42:15,423] Trial 109 finished with value: 0.7502924084340395 and parameters: {'learning_rate': 0.0001762451893065269, 'weight_decay': 0.001, 'warmup_steps': 33}. Best is trial 43 with value: 0.7614573567738171.


Trial 110 with params: {'learning_rate': 0.00022169023375838217, 'weight_decay': 0.001, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5808,1.05345,0.765353,0.436574,0.458528,0.43569
2,0.3448,0.983482,0.789184,0.623907,0.585103,0.591404
3,0.1304,1.057857,0.792851,0.74903,0.664942,0.685096
4,0.0646,1.114785,0.792851,0.788026,0.675956,0.712382
5,0.0401,1.167208,0.797434,0.798966,0.720106,0.74318
6,0.0286,1.211372,0.792851,0.79003,0.689707,0.719618
7,0.0192,1.265686,0.793767,0.815485,0.70862,0.737385
8,0.0159,1.294907,0.792851,0.784589,0.725351,0.736633
9,0.0134,1.297171,0.8011,0.790765,0.712557,0.732949
10,0.0092,1.354915,0.797434,0.815207,0.724153,0.747436


[I 2025-03-26 19:46:42,489] Trial 110 finished with value: 0.7565047850668336 and parameters: {'learning_rate': 0.00022169023375838217, 'weight_decay': 0.001, 'warmup_steps': 29}. Best is trial 43 with value: 0.7614573567738171.


Trial 111 with params: {'learning_rate': 0.00028611167841864334, 'weight_decay': 0.002, 'warmup_steps': 37}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4116,1.01286,0.777269,0.478265,0.483913,0.468052
2,0.2463,1.041615,0.784601,0.727839,0.641362,0.660777
3,0.0901,1.110381,0.789184,0.766811,0.697269,0.713629
4,0.0453,1.17658,0.79835,0.824894,0.709503,0.742982
5,0.0283,1.221083,0.796517,0.792959,0.710765,0.734915
6,0.0207,1.269257,0.791934,0.81503,0.705739,0.735845
7,0.0143,1.350842,0.7956,0.796673,0.716012,0.733533
8,0.012,1.346182,0.794684,0.808073,0.726243,0.744912
9,0.0099,1.385339,0.791017,0.785699,0.732554,0.743724
10,0.0071,1.460881,0.800183,0.826334,0.725022,0.752496


[I 2025-03-26 19:51:07,481] Trial 111 finished with value: 0.7506670070074365 and parameters: {'learning_rate': 0.00028611167841864334, 'weight_decay': 0.002, 'warmup_steps': 37}. Best is trial 43 with value: 0.7614573567738171.


Trial 112 with params: {'learning_rate': 0.00016484828255446262, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8216,1.153078,0.764436,0.423322,0.443886,0.422828
2,0.5009,0.995392,0.774519,0.564934,0.518968,0.515558
3,0.2144,0.998614,0.790101,0.653403,0.60786,0.615647
4,0.1101,1.063095,0.787351,0.708539,0.619561,0.647791
5,0.0661,1.099372,0.788268,0.786036,0.702245,0.727006


[I 2025-03-26 19:52:35,880] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00024227916778208096, 'weight_decay': 0.001, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5131,1.036932,0.768103,0.437919,0.462787,0.440375
2,0.3064,1.002504,0.788268,0.669194,0.606,0.620049
3,0.1135,1.069964,0.791017,0.73526,0.655183,0.676707
4,0.0566,1.1276,0.800183,0.795978,0.681306,0.718972
5,0.035,1.199812,0.7956,0.78955,0.710435,0.73405
6,0.0255,1.227617,0.796517,0.807696,0.707009,0.733955
7,0.0176,1.291624,0.792851,0.790174,0.69908,0.723487
8,0.0148,1.309078,0.794684,0.804526,0.733065,0.749259
9,0.0123,1.32245,0.79835,0.801145,0.705788,0.731755
10,0.0084,1.39918,0.7956,0.813822,0.723284,0.747895


[I 2025-03-26 19:57:09,140] Trial 113 finished with value: 0.7503720250041747 and parameters: {'learning_rate': 0.00024227916778208096, 'weight_decay': 0.001, 'warmup_steps': 29}. Best is trial 43 with value: 0.7614573567738171.


Trial 114 with params: {'learning_rate': 0.00026433630023306123, 'weight_decay': 0.001, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4444,1.024348,0.774519,0.465258,0.47345,0.45513
2,0.2707,1.010772,0.783685,0.705507,0.626082,0.644744
3,0.0984,1.108685,0.789184,0.744786,0.677593,0.693051
4,0.05,1.174203,0.80385,0.82373,0.709022,0.744406
5,0.0319,1.190183,0.802016,0.792044,0.714325,0.738191
6,0.0225,1.315335,0.790101,0.805071,0.699219,0.730201
7,0.0154,1.372543,0.785518,0.793613,0.692438,0.720178
8,0.0135,1.327056,0.794684,0.806554,0.712231,0.7364
9,0.011,1.408985,0.794684,0.790479,0.723351,0.741359
10,0.0079,1.452065,0.794684,0.802059,0.718343,0.741103


[I 2025-03-26 20:01:34,990] Trial 114 finished with value: 0.7487784671980193 and parameters: {'learning_rate': 0.00026433630023306123, 'weight_decay': 0.001, 'warmup_steps': 27}. Best is trial 43 with value: 0.7614573567738171.


Trial 115 with params: {'learning_rate': 0.00028898255544382984, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3924,1.010311,0.772686,0.476416,0.475922,0.461695
2,0.2412,1.039173,0.785518,0.739518,0.634981,0.661415
3,0.0885,1.108041,0.796517,0.762051,0.695581,0.712166
4,0.0444,1.188445,0.788268,0.809121,0.689622,0.725302
5,0.0285,1.21563,0.802016,0.801242,0.713768,0.738484
6,0.0209,1.302737,0.800183,0.797112,0.704298,0.731236
7,0.0146,1.368104,0.785518,0.761803,0.70878,0.718385
8,0.0123,1.383414,0.796517,0.799028,0.724442,0.741534
9,0.0097,1.438217,0.789184,0.795581,0.7112,0.732392
10,0.0077,1.443222,0.7956,0.823715,0.724199,0.750957


[I 2025-03-26 20:06:14,461] Trial 115 finished with value: 0.75241362335037 and parameters: {'learning_rate': 0.00028898255544382984, 'weight_decay': 0.0, 'warmup_steps': 32}. Best is trial 43 with value: 0.7614573567738171.


Trial 116 with params: {'learning_rate': 0.00028128286124663276, 'weight_decay': 0.0, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3736,1.018324,0.776352,0.475502,0.489062,0.469252
2,0.2466,1.030546,0.790101,0.737343,0.646261,0.670203
3,0.0904,1.114647,0.787351,0.752826,0.691975,0.705786
4,0.0451,1.136188,0.79835,0.819077,0.693028,0.731874
5,0.0292,1.19873,0.7956,0.792437,0.722916,0.737953
6,0.0212,1.276953,0.796517,0.806349,0.699944,0.734388
7,0.0146,1.391791,0.791017,0.795247,0.720319,0.736224
8,0.0119,1.409644,0.788268,0.800195,0.723447,0.740875
9,0.0101,1.447427,0.785518,0.809159,0.715628,0.740746
10,0.007,1.46412,0.790101,0.801407,0.722755,0.742808


[I 2025-03-26 20:10:51,282] Trial 116 finished with value: 0.7412799608499024 and parameters: {'learning_rate': 0.00028128286124663276, 'weight_decay': 0.0, 'warmup_steps': 16}. Best is trial 43 with value: 0.7614573567738171.


Trial 117 with params: {'learning_rate': 0.00015138066752833573, 'weight_decay': 0.001, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8883,1.189333,0.76077,0.419697,0.432476,0.413805
2,0.551,1.002602,0.773602,0.47891,0.497474,0.482107
3,0.2451,0.995226,0.788268,0.651949,0.588641,0.601902
4,0.1285,1.053976,0.781852,0.693662,0.610699,0.635464
5,0.077,1.077703,0.790101,0.740246,0.6698,0.68971
6,0.0519,1.138995,0.782768,0.777531,0.676097,0.705326
7,0.0363,1.196288,0.790101,0.796391,0.702284,0.723936
8,0.0284,1.193086,0.79835,0.796454,0.737615,0.748613
9,0.0239,1.242984,0.788268,0.781597,0.718201,0.733528
10,0.018,1.277342,0.789184,0.783002,0.726301,0.735972


[I 2025-03-26 20:15:28,391] Trial 117 finished with value: 0.7475914714331914 and parameters: {'learning_rate': 0.00015138066752833573, 'weight_decay': 0.001, 'warmup_steps': 28}. Best is trial 43 with value: 0.7614573567738171.


Trial 118 with params: {'learning_rate': 0.00012755064561990304, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0396,1.274897,0.745188,0.391265,0.397057,0.377321
2,0.6672,1.039499,0.766269,0.473863,0.477813,0.469147
3,0.3248,0.988809,0.785518,0.603029,0.567833,0.57213
4,0.1787,1.027778,0.784601,0.673177,0.600444,0.621322
5,0.1091,1.043621,0.787351,0.684726,0.618717,0.634645
6,0.0735,1.102923,0.786434,0.750004,0.651449,0.678025
7,0.0517,1.140301,0.796517,0.797234,0.689186,0.716743
8,0.04,1.156135,0.79835,0.799122,0.717053,0.735514
9,0.0325,1.194294,0.789184,0.785404,0.689057,0.714231
10,0.0251,1.239861,0.7956,0.806006,0.713102,0.733672


[I 2025-03-26 20:18:28,172] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00026286043324977326, 'weight_decay': 0.002, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4463,1.02508,0.775435,0.471121,0.475661,0.458287
2,0.2729,1.00673,0.789184,0.711732,0.630895,0.650308
3,0.0994,1.099363,0.789184,0.752684,0.679703,0.697748
4,0.05,1.162507,0.802016,0.821796,0.711345,0.744579
5,0.0314,1.178853,0.806599,0.793067,0.723458,0.743802
6,0.023,1.285358,0.800183,0.795657,0.714388,0.737996
7,0.0155,1.346769,0.793767,0.789178,0.697532,0.722873
8,0.0133,1.332444,0.793767,0.807398,0.720035,0.744216
9,0.0109,1.414648,0.791934,0.78589,0.707161,0.726639
10,0.0076,1.442106,0.79835,0.80628,0.72368,0.746272


[I 2025-03-26 20:22:57,712] Trial 119 finished with value: 0.750452832364819 and parameters: {'learning_rate': 0.00026286043324977326, 'weight_decay': 0.002, 'warmup_steps': 26}. Best is trial 43 with value: 0.7614573567738171.


Trial 120 with params: {'learning_rate': 0.0002472976711967465, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4883,1.034767,0.771769,0.439435,0.468471,0.444568
2,0.2965,1.011748,0.779102,0.656657,0.601354,0.611862
3,0.1093,1.082567,0.793767,0.737518,0.65537,0.677813
4,0.0546,1.141569,0.807516,0.807449,0.691519,0.726752
5,0.0345,1.186851,0.800183,0.791263,0.723271,0.742238
6,0.0253,1.270477,0.791934,0.793517,0.692903,0.723824
7,0.0171,1.339865,0.786434,0.777273,0.67898,0.705669
8,0.0144,1.309631,0.7956,0.805617,0.724831,0.747272
9,0.012,1.358372,0.79835,0.796026,0.725243,0.744338
10,0.0084,1.411381,0.799267,0.812646,0.716707,0.744346


[I 2025-03-26 20:27:28,419] Trial 120 finished with value: 0.7497509051832413 and parameters: {'learning_rate': 0.0002472976711967465, 'weight_decay': 0.0, 'warmup_steps': 25}. Best is trial 43 with value: 0.7614573567738171.


Trial 121 with params: {'learning_rate': 0.00021059484737596867, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6025,1.0692,0.765353,0.424646,0.459742,0.433335
2,0.3663,0.991789,0.783685,0.61748,0.579127,0.581944
3,0.1405,1.067339,0.791934,0.720787,0.645023,0.665031
4,0.0706,1.111359,0.8011,0.790106,0.67949,0.717078
5,0.0438,1.154084,0.8011,0.791601,0.714624,0.73852
6,0.031,1.203931,0.792851,0.792643,0.69133,0.720892
7,0.021,1.267773,0.791934,0.78788,0.683981,0.712789
8,0.0177,1.28995,0.791017,0.784406,0.709581,0.727767
9,0.0147,1.308882,0.79835,0.789114,0.716211,0.733966
10,0.0106,1.359456,0.796517,0.808619,0.72269,0.743013


[I 2025-03-26 20:31:58,840] Trial 121 finished with value: 0.7589326278735906 and parameters: {'learning_rate': 0.00021059484737596867, 'weight_decay': 0.0, 'warmup_steps': 21}. Best is trial 43 with value: 0.7614573567738171.


Trial 122 with params: {'learning_rate': 0.00018250113902552832, 'weight_decay': 0.0, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7153,1.111533,0.759853,0.430717,0.446445,0.423179
2,0.4391,0.981934,0.779102,0.598082,0.543195,0.546038
3,0.1798,1.023135,0.786434,0.702731,0.630021,0.649546
4,0.091,1.08452,0.792851,0.723936,0.64741,0.671568
5,0.0547,1.122177,0.79835,0.804233,0.715315,0.741202
6,0.038,1.193614,0.786434,0.772069,0.670201,0.700748
7,0.0267,1.249801,0.790101,0.782824,0.692612,0.714339
8,0.0219,1.24184,0.8011,0.808535,0.73341,0.748841
9,0.0181,1.268911,0.793767,0.798154,0.711494,0.733313
10,0.0133,1.323109,0.7956,0.805738,0.714032,0.736368


[I 2025-03-26 20:36:27,434] Trial 122 finished with value: 0.7523571752469066 and parameters: {'learning_rate': 0.00018250113902552832, 'weight_decay': 0.0, 'warmup_steps': 20}. Best is trial 43 with value: 0.7614573567738171.


Trial 123 with params: {'learning_rate': 0.00011876356812732051, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0797,1.314955,0.742438,0.387211,0.393414,0.371505
2,0.7159,1.048954,0.766269,0.470034,0.472388,0.463097
3,0.3587,0.990562,0.781852,0.605961,0.546943,0.552123
4,0.2027,1.025412,0.779102,0.645977,0.577241,0.595646
5,0.1254,1.032408,0.787351,0.683075,0.618528,0.633897


[I 2025-03-26 20:37:59,865] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0003830353687624567, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1871,1.009573,0.777269,0.552442,0.531274,0.523628
2,0.1667,1.081845,0.793767,0.719877,0.648694,0.666361
3,0.0613,1.231619,0.790101,0.773495,0.737622,0.738406
4,0.0328,1.22584,0.794684,0.800667,0.714841,0.735379
5,0.0223,1.292222,0.7956,0.787202,0.727031,0.740053
6,0.0167,1.328178,0.802933,0.837746,0.723968,0.758859
7,0.0113,1.460276,0.790101,0.787043,0.725642,0.74117
8,0.0096,1.460432,0.780935,0.77832,0.717181,0.73401
9,0.0074,1.543519,0.788268,0.808021,0.710518,0.740884
10,0.0059,1.564367,0.780935,0.789831,0.719152,0.740143


[I 2025-03-26 20:42:29,021] Trial 124 finished with value: 0.7417938715066441 and parameters: {'learning_rate': 0.0003830353687624567, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 43 with value: 0.7614573567738171.


Trial 125 with params: {'learning_rate': 0.00034992243489958674, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2238,1.004777,0.775435,0.533918,0.516155,0.504875
2,0.1872,1.07981,0.780018,0.720376,0.642368,0.660648
3,0.0707,1.130233,0.792851,0.781201,0.728818,0.738493
4,0.035,1.246857,0.793767,0.823274,0.704497,0.737995
5,0.0236,1.287273,0.791934,0.793744,0.696299,0.722507


[I 2025-03-26 20:44:02,251] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.00019099471169581992, 'weight_decay': 0.007, 'warmup_steps': 35}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7124,1.10122,0.76077,0.417703,0.446482,0.420515
2,0.4207,0.990862,0.779102,0.593859,0.545648,0.551487
3,0.1695,1.036288,0.791017,0.737056,0.6444,0.669513
4,0.0842,1.090235,0.788268,0.778909,0.665595,0.702019
5,0.0507,1.139208,0.799267,0.804486,0.719196,0.742995
6,0.0351,1.20629,0.780935,0.781071,0.673584,0.704775
7,0.024,1.23769,0.793767,0.794905,0.713888,0.734385
8,0.0195,1.24857,0.79835,0.790371,0.711729,0.728855
9,0.0165,1.280712,0.792851,0.794425,0.725255,0.743446
10,0.0121,1.36275,0.794684,0.81519,0.727569,0.74894


[I 2025-03-26 20:48:38,283] Trial 126 finished with value: 0.7414787201603622 and parameters: {'learning_rate': 0.00019099471169581992, 'weight_decay': 0.007, 'warmup_steps': 35}. Best is trial 43 with value: 0.7614573567738171.


Trial 127 with params: {'learning_rate': 0.00013712317084921553, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9585,1.235224,0.744271,0.39151,0.398902,0.379131
2,0.6136,1.018923,0.768103,0.479713,0.485152,0.475149
3,0.2862,0.988159,0.785518,0.629696,0.579354,0.585974
4,0.1537,1.036074,0.788268,0.693557,0.616699,0.639188
5,0.0935,1.050406,0.789184,0.688578,0.619298,0.636395


[I 2025-03-26 20:50:09,317] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.00029317115299699333, 'weight_decay': 0.0, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3772,1.014413,0.770852,0.473801,0.485412,0.470065
2,0.2371,1.046101,0.787351,0.739585,0.637117,0.662915
3,0.0865,1.126031,0.793767,0.748175,0.706604,0.711588
4,0.0443,1.166564,0.800183,0.824475,0.706529,0.741706
5,0.0275,1.257734,0.796517,0.799118,0.712133,0.73777
6,0.021,1.292385,0.791934,0.779807,0.702296,0.722854
7,0.0139,1.381695,0.791934,0.779215,0.722862,0.734279
8,0.0114,1.38084,0.792851,0.792911,0.721456,0.737316
9,0.0092,1.433458,0.785518,0.794034,0.720123,0.738441
10,0.0071,1.446218,0.789184,0.79188,0.714903,0.733758


[I 2025-03-26 20:53:08,350] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.00028605110574087016, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.377,1.01535,0.768103,0.475582,0.482142,0.464849
2,0.2424,1.042764,0.786434,0.746782,0.642863,0.669893
3,0.0878,1.103939,0.7956,0.763245,0.69083,0.706867
4,0.0448,1.170905,0.8011,0.817256,0.71101,0.743336
5,0.0278,1.234377,0.79835,0.799015,0.715497,0.738176
6,0.02,1.263358,0.8011,0.810751,0.72879,0.749373
7,0.0137,1.370972,0.788268,0.776783,0.699584,0.716272
8,0.0114,1.371852,0.785518,0.768706,0.720358,0.72672
9,0.0092,1.445267,0.791017,0.802037,0.710564,0.734818
10,0.0071,1.443235,0.7956,0.801252,0.728541,0.744373


[I 2025-03-26 20:57:42,145] Trial 129 finished with value: 0.7504907032096793 and parameters: {'learning_rate': 0.00028605110574087016, 'weight_decay': 0.002, 'warmup_steps': 23}. Best is trial 43 with value: 0.7614573567738171.


Trial 130 with params: {'learning_rate': 0.00010571983924941356, 'weight_decay': 0.01, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2153,1.39791,0.731439,0.364706,0.372166,0.350219
2,0.8152,1.07839,0.76077,0.427667,0.45609,0.434607
3,0.432,1.002864,0.770852,0.47283,0.495929,0.479447
4,0.2558,1.013784,0.778185,0.607845,0.546774,0.556726
5,0.1619,1.019009,0.785518,0.681802,0.615357,0.631274
6,0.1104,1.058596,0.784601,0.691344,0.60546,0.628322
7,0.0779,1.079119,0.7956,0.716831,0.640164,0.659149
8,0.0604,1.100953,0.800183,0.80449,0.704333,0.729594
9,0.0489,1.140247,0.7956,0.796206,0.690392,0.719902
10,0.0391,1.182472,0.791934,0.792793,0.704462,0.72491


[I 2025-03-26 21:00:45,805] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.000186818886898241, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7009,1.102498,0.762603,0.423792,0.447951,0.424231
2,0.4269,0.977954,0.780935,0.601072,0.55005,0.553851
3,0.1728,1.025464,0.785518,0.704562,0.631955,0.650776
4,0.087,1.085198,0.790101,0.76214,0.651706,0.68516
5,0.0524,1.125716,0.799267,0.802491,0.717242,0.741668
6,0.0367,1.195038,0.789184,0.810518,0.692821,0.726487
7,0.0257,1.256318,0.788268,0.799341,0.688946,0.718798
8,0.0209,1.239148,0.805683,0.814548,0.734799,0.752445
9,0.0173,1.265257,0.794684,0.789323,0.709664,0.729317
10,0.0128,1.330151,0.800183,0.812916,0.725603,0.745206


[I 2025-03-26 21:05:19,567] Trial 131 finished with value: 0.756810667448188 and parameters: {'learning_rate': 0.000186818886898241, 'weight_decay': 0.0, 'warmup_steps': 22}. Best is trial 43 with value: 0.7614573567738171.


Trial 132 with params: {'learning_rate': 0.00012236202286405423, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0245,1.293612,0.742438,0.381322,0.39303,0.370708
2,0.6892,1.037555,0.769936,0.484512,0.48267,0.474831
3,0.3408,0.990782,0.781852,0.59895,0.560212,0.560492
4,0.1903,1.030706,0.782768,0.645515,0.57838,0.596411
5,0.117,1.035384,0.791934,0.68612,0.618809,0.634593
6,0.0794,1.096472,0.789184,0.724871,0.63369,0.656328
7,0.0563,1.131468,0.79835,0.77082,0.671136,0.695424
8,0.0434,1.149211,0.7956,0.801149,0.71645,0.734763
9,0.0356,1.180332,0.790101,0.786725,0.699821,0.722111
10,0.0279,1.234927,0.791017,0.782288,0.706092,0.720978


[I 2025-03-26 21:08:22,840] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0001583018986052076, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8291,1.164547,0.76077,0.423509,0.438298,0.418641
2,0.5197,0.992997,0.776352,0.533608,0.515234,0.510121
3,0.2265,0.99524,0.789184,0.645586,0.597742,0.605889
4,0.1181,1.056115,0.791017,0.705994,0.62135,0.647905
5,0.071,1.086919,0.793767,0.754338,0.681763,0.700686


[I 2025-03-26 21:09:55,201] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.000246707626667932, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.488,1.036367,0.770852,0.439455,0.468368,0.444451
2,0.2975,1.013068,0.780018,0.663563,0.605819,0.616526
3,0.1099,1.082511,0.793767,0.74589,0.665371,0.685581
4,0.0549,1.127784,0.802933,0.82262,0.709242,0.74347
5,0.0347,1.181495,0.802933,0.793929,0.717006,0.739665
6,0.0248,1.250523,0.797434,0.815299,0.707647,0.739656
7,0.016,1.328836,0.793767,0.790153,0.682584,0.713398
8,0.0141,1.320774,0.7956,0.794791,0.735505,0.748008
9,0.0119,1.347179,0.799267,0.799321,0.715623,0.737558
10,0.0084,1.401607,0.802933,0.817185,0.718764,0.746535


[I 2025-03-26 21:14:40,092] Trial 134 finished with value: 0.7457965746717565 and parameters: {'learning_rate': 0.000246707626667932, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 43 with value: 0.7614573567738171.


Trial 135 with params: {'learning_rate': 8.338465745809332e-05, 'weight_decay': 0.006, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3974,1.580758,0.696609,0.33482,0.325545,0.309035
2,1.0174,1.142907,0.758937,0.44813,0.443655,0.429263
3,0.581,1.04202,0.768103,0.467776,0.483718,0.469975
4,0.3751,1.001436,0.771769,0.539532,0.507827,0.503649
5,0.2509,0.990145,0.787351,0.623104,0.57278,0.583887


[I 2025-03-26 21:16:09,714] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.00017198531425921562, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7757,1.131487,0.76352,0.413033,0.442284,0.417473
2,0.4744,0.986329,0.778185,0.595577,0.541703,0.545274
3,0.1995,1.004754,0.787351,0.690234,0.6232,0.637726
4,0.1018,1.068647,0.791934,0.721785,0.632165,0.659724


[W 2025-03-26 21:17:28,414] Trial 136 failed with parameters: {'learning_rate': 0.00017198531425921562, 'weight_decay': 0.0, 'warmup_steps': 25} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 250, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2553, in _inner_training_loop
    and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
KeyboardInterrupt
[W 2025-03-26 21:17:28,416] Trial 136 failed with value None.


KeyboardInterrupt: 

In [None]:
print(best_trial_normal_aug)

In [30]:
base.reset_seed()

## Search with distillation on the augmented dataset
Configuration of the individual training runs.

In [31]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bert-distill-aug_fine_hp-search",
    logging_dir=f"~/logs/{DATASET}/bert-distill-aug_fine_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [32]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
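
The role of `lambda_param` and `temperature` can be illustrated with a minimal, dependency-free sketch of a common distillation objective. It is an assumption about the general technique, not the implementation inside `base.DistilTrainer`, which is not shown in this notebook. In this sketch the soft-target term is the KL divergence between the temperature-softened teacher and student distributions, scaled by the squared temperature, and `lambda_param` weights it against the hard-label cross-entropy. The helper names `softmax` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed objective: (1 - lambda) * CE(student, label) + lambda * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy, computed at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )
    soft = kl * temperature ** 2
    return (1 - lambda_param) * hard + lambda_param * soft


# Hypothetical logits for a 3-class example; values are made up for illustration.
example = distillation_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], label=0,
                            lambda_param=0.6, temperature=2.0)
print(example)
```

Under this assumed formulation, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's soft targets only, which is why both endpoints are included in the search range.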

Optuna configuration.

In [33]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
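
To make the effect of `min_resource`, `max_resource` and `reduction_factor` more concrete, the following sketch enumerates the resource levels (rungs) at which each Hyperband bracket would consider pruning. It follows the schedule described in the Optuna documentation as I understand it: the number of brackets is floor(log_eta(max_resource / min_resource)) + 1, and bracket `b` checks trials at `min_resource * eta ** (b + rung)`. The function name `hyperband_rungs` and the numeric values used below are hypothetical. The actual `min_r` and `max_r` are defined elsewhere in the notebook, and the unit of a resource depends on what the trainer reports to Optuna.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, for each bracket, the resource levels at which pruning is considered."""
    # Count brackets with integer arithmetic to avoid floating-point log issues.
    n_brackets = 0
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            level = min_resource * reduction_factor ** (bracket_id + rung)
            if level > max_resource:
                break
            rungs.append(level)
            rung += 1
        brackets.append(rungs)
    return brackets


# Hypothetical values, chosen only to illustrate the schedule.
for bracket_id, rungs in enumerate(hyperband_rungs(1, 8, 2)):
    print(f"bracket {bracket_id}: prune checks at resource levels {rungs}")
```

Later brackets start pruning at higher resource levels, so some trials are given more training before they can be stopped.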



Configuration of the distillation trainer for the individual training runs.

In [34]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_aug,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_Bert(),
)
  

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-2_H-128_A-2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Search setup.

In [35]:
best_trial_distill_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Test-Distill-aug",
    n_trials=150
)

[I 2025-03-26 21:17:39,670] A new study created in memory with name: Test-Distill-aug


Trial 0 with params: {'learning_rate': 4.3284502212938785e-05, 'weight_decay': 0.01, 'warmup_steps': 39, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8672,1.424818,0.568286,0.233504,0.194461,0.180431
2,1.1139,1.00638,0.703941,0.327727,0.31011,0.289055
3,0.7704,0.844506,0.72594,0.32957,0.344225,0.319838
4,0.5919,0.769759,0.747021,0.409864,0.39576,0.382859
5,0.4839,0.725176,0.754354,0.467335,0.429483,0.422567


[I 2025-03-26 21:19:14,028] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 1.8408992080552506e-05, 'weight_decay': 0.0, 'warmup_steps': 46, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1553,1.886796,0.433547,0.111172,0.107404,0.091547
2,1.6761,1.524248,0.534372,0.185234,0.164608,0.144515
3,1.3587,1.292609,0.611366,0.276662,0.232775,0.220938
4,1.1409,1.132281,0.67736,0.271022,0.282613,0.263858
5,0.982,1.02103,0.687443,0.256528,0.289775,0.263951
6,0.8698,0.946549,0.708524,0.30559,0.314129,0.293213
7,0.7833,0.893389,0.716774,0.319496,0.324012,0.299977
8,0.7227,0.857051,0.724106,0.325415,0.336216,0.31299
9,0.6749,0.830223,0.726856,0.327737,0.341308,0.317423
10,0.6357,0.812256,0.730522,0.357735,0.352436,0.332309


[I 2025-03-26 21:22:22,935] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.0838581269344744e-05, 'weight_decay': 0.01, 'warmup_steps': 44, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2615,2.079233,0.346471,0.070799,0.072089,0.050804
2,1.9382,1.815149,0.460128,0.107051,0.122784,0.104818
3,1.6976,1.617504,0.502291,0.138476,0.147729,0.124519
4,1.5133,1.468852,0.546288,0.225718,0.173858,0.155669
5,1.3667,1.354156,0.588451,0.24975,0.211437,0.196148


[I 2025-03-26 21:23:56,917] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 2.049268011541735e-05, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1195,1.833186,0.455545,0.107661,0.118924,0.100757
2,1.6092,1.457748,0.558203,0.214635,0.182249,0.164731
3,1.282,1.223375,0.647113,0.271988,0.260727,0.247697
4,1.0625,1.067356,0.68286,0.257785,0.284352,0.26041
5,0.9067,0.964824,0.703025,0.288267,0.306032,0.281832
6,0.7992,0.898095,0.71769,0.314864,0.326693,0.30445
7,0.7165,0.851239,0.725023,0.327778,0.339074,0.316077
8,0.66,0.819497,0.728689,0.346056,0.34864,0.328033
9,0.6152,0.797187,0.730522,0.353584,0.356958,0.337361
10,0.5782,0.782196,0.740605,0.389777,0.377902,0.361128


[I 2025-03-26 21:26:58,809] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.00010952662748632558, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3794,0.898934,0.72044,0.331071,0.329918,0.305141
2,0.5559,0.720243,0.761687,0.47491,0.446025,0.439547
3,0.3358,0.662921,0.777269,0.556938,0.512069,0.510193
4,0.2301,0.649787,0.790101,0.635974,0.555602,0.573167
5,0.1747,0.645456,0.797434,0.667146,0.615607,0.624179
6,0.1446,0.638353,0.800183,0.706646,0.627185,0.648862
7,0.1264,0.638341,0.7956,0.731939,0.640181,0.664322
8,0.1151,0.638088,0.7956,0.729741,0.65122,0.671329
9,0.1069,0.642829,0.796517,0.77111,0.672112,0.700855
10,0.1005,0.654328,0.793767,0.753872,0.673648,0.695856


[I 2025-03-26 21:30:01,814] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0002157696745589684, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0523,0.725032,0.762603,0.429626,0.42534,0.411861
2,0.3054,0.643991,0.7956,0.609284,0.567188,0.574325
3,0.1672,0.649918,0.796517,0.696634,0.629973,0.645928
4,0.1208,0.662573,0.796517,0.752054,0.634963,0.670027
5,0.1016,0.657606,0.796517,0.76529,0.674646,0.700381
6,0.0911,0.649584,0.79835,0.770861,0.668097,0.699744
7,0.084,0.665144,0.794684,0.809048,0.707134,0.738059
8,0.0806,0.66179,0.79835,0.80699,0.716651,0.742393
9,0.0774,0.665049,0.794684,0.806166,0.720945,0.745294
10,0.0745,0.673218,0.7956,0.808677,0.724367,0.747917


[I 2025-03-26 21:34:43,816] Trial 5 finished with value: 0.7549412251809023 and parameters: {'learning_rate': 0.0002157696745589684, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 6 with params: {'learning_rate': 0.00010769622478263136, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3711,0.899925,0.71769,0.323206,0.325772,0.299958
2,0.5603,0.723385,0.762603,0.449081,0.437775,0.42745
3,0.3394,0.668963,0.772686,0.510455,0.500577,0.492147
4,0.2336,0.652519,0.786434,0.640527,0.551436,0.573084
5,0.1781,0.64648,0.796517,0.666451,0.614243,0.622952
6,0.1473,0.642453,0.799267,0.697238,0.626364,0.645427
7,0.1282,0.637582,0.800183,0.691093,0.631636,0.647149
8,0.1165,0.635386,0.7956,0.720272,0.653018,0.669684
9,0.1081,0.646516,0.794684,0.727004,0.652117,0.67189
10,0.1013,0.652863,0.796517,0.723398,0.655973,0.672694


[I 2025-03-26 21:37:47,279] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.000236288641842364, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9752,0.715762,0.76077,0.44479,0.444696,0.431947
2,0.2763,0.649771,0.792851,0.603783,0.576167,0.578968
3,0.1547,0.654674,0.792851,0.688429,0.631922,0.647301
4,0.1141,0.660714,0.7956,0.750861,0.63067,0.667248
5,0.0967,0.660894,0.794684,0.740629,0.684009,0.698154
6,0.0877,0.653076,0.799267,0.784906,0.703466,0.728963
7,0.0818,0.657942,0.797434,0.805487,0.719958,0.746528
8,0.0787,0.663643,0.797434,0.813434,0.714118,0.743161
9,0.0754,0.661124,0.799267,0.808726,0.721465,0.747188
10,0.0729,0.676176,0.794684,0.812715,0.719091,0.748202


[I 2025-03-26 21:42:27,862] Trial 7 finished with value: 0.7492883805070879 and parameters: {'learning_rate': 0.000236288641842364, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 8 with params: {'learning_rate': 1.6119044727609182e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1681,1.930833,0.416132,0.097131,0.096793,0.080935
2,1.7406,1.595704,0.511457,0.18,0.152533,0.130495
3,1.444,1.373219,0.584785,0.229412,0.210122,0.193738
4,1.2345,1.214399,0.652612,0.27995,0.259541,0.246907
5,1.0768,1.097992,0.67736,0.25451,0.280483,0.257643
6,0.9616,1.01481,0.690192,0.276179,0.291802,0.266993
7,0.8715,0.954319,0.710357,0.313156,0.31082,0.288659
8,0.8067,0.912523,0.713107,0.323665,0.320588,0.296829
9,0.7557,0.880194,0.71769,0.316843,0.324989,0.301288
10,0.7142,0.858133,0.72319,0.323863,0.334697,0.311005


[I 2025-03-26 21:45:31,002] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.00013353819088790598, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.289,0.834961,0.732356,0.3785,0.361061,0.342909
2,0.4721,0.688384,0.770852,0.501207,0.473144,0.472248
3,0.2725,0.651342,0.786434,0.576381,0.530581,0.534847
4,0.186,0.653136,0.793767,0.67379,0.594159,0.615987
5,0.1452,0.652714,0.7956,0.685631,0.626076,0.64169
6,0.1232,0.644991,0.79835,0.727943,0.635756,0.662961
7,0.1087,0.642806,0.79835,0.744081,0.655631,0.680075
8,0.1003,0.647454,0.796517,0.73784,0.65804,0.682253
9,0.0942,0.648155,0.797434,0.787243,0.679687,0.712008
10,0.0893,0.661976,0.7956,0.780949,0.689758,0.716625


[I 2025-03-26 21:50:13,989] Trial 9 finished with value: 0.7245488761821971 and parameters: {'learning_rate': 0.00013353819088790598, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 10 with params: {'learning_rate': 0.0003740714100285732, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8298,0.674561,0.776352,0.502379,0.49406,0.489083
2,0.1901,0.666642,0.786434,0.698037,0.630011,0.641412
3,0.1151,0.65299,0.7956,0.744329,0.674087,0.692775
4,0.0925,0.668328,0.802016,0.796945,0.6966,0.725567
5,0.0822,0.675508,0.796517,0.808964,0.701099,0.73307
6,0.0765,0.674247,0.791934,0.8046,0.70822,0.738564
7,0.0731,0.681904,0.796517,0.809079,0.716277,0.744052
8,0.0714,0.686214,0.791934,0.789033,0.693392,0.722878
9,0.0696,0.695215,0.794684,0.794165,0.714599,0.736073
10,0.0677,0.694528,0.786434,0.8081,0.705329,0.734897


[I 2025-03-26 21:54:41,736] Trial 10 finished with value: 0.7428912800689917 and parameters: {'learning_rate': 0.0003740714100285732, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 11 with params: {'learning_rate': 0.00026589184366630346, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9197,0.702581,0.769019,0.460778,0.454911,0.440924
2,0.2498,0.641957,0.796517,0.640488,0.59009,0.601189
3,0.1419,0.650434,0.792851,0.715019,0.634091,0.657371
4,0.1063,0.658249,0.79835,0.772746,0.654844,0.690135
5,0.0924,0.652306,0.79835,0.808792,0.713001,0.743771
6,0.0841,0.649106,0.80385,0.823599,0.715375,0.752109
7,0.0786,0.672719,0.7956,0.818406,0.71677,0.7495
8,0.0761,0.663368,0.802016,0.82035,0.720876,0.753428
9,0.0733,0.670742,0.792851,0.806162,0.713787,0.739557
10,0.0711,0.677706,0.793767,0.823857,0.720509,0.753216


[I 2025-03-26 21:59:08,734] Trial 11 finished with value: 0.745820604275408 and parameters: {'learning_rate': 0.00026589184366630346, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 12 with params: {'learning_rate': 0.0002657573253284101, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9485,0.707429,0.768103,0.466409,0.45352,0.44056
2,0.2506,0.642582,0.797434,0.635446,0.592819,0.599644
3,0.1412,0.655641,0.791017,0.698247,0.633112,0.650078
4,0.1064,0.661206,0.799267,0.776469,0.679395,0.709467
5,0.0921,0.667944,0.792851,0.79891,0.702999,0.733889
6,0.0842,0.659482,0.800183,0.798785,0.704385,0.73536
7,0.0786,0.663006,0.788268,0.796789,0.706567,0.732659
8,0.0762,0.661184,0.796517,0.817124,0.709817,0.740541
9,0.0737,0.670784,0.791934,0.799635,0.703904,0.731258
10,0.0711,0.681305,0.794684,0.813515,0.723044,0.749373


[I 2025-03-26 22:03:34,272] Trial 12 finished with value: 0.7352923247120513 and parameters: {'learning_rate': 0.0002657573253284101, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 5 with value: 0.7549412251809023.


Trial 13 with params: {'learning_rate': 0.000329847374420809, 'weight_decay': 0.008, 'warmup_steps': 23, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8656,0.694612,0.772686,0.496184,0.490303,0.482927
2,0.2088,0.659241,0.788268,0.69865,0.628051,0.640141
3,0.1228,0.662466,0.7956,0.770017,0.680392,0.705737
4,0.0967,0.660278,0.79835,0.799012,0.681191,0.716874
5,0.0854,0.668515,0.800183,0.810635,0.714976,0.745907
6,0.079,0.681543,0.797434,0.788807,0.719433,0.74133
7,0.0747,0.684377,0.796517,0.790349,0.700093,0.72621
8,0.0726,0.692038,0.790101,0.801522,0.706677,0.734887
9,0.0708,0.692351,0.786434,0.81209,0.710353,0.740543
10,0.0687,0.699949,0.788268,0.800691,0.718612,0.741309


[I 2025-03-26 22:06:31,681] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00038226914326652676, 'weight_decay': 0.0, 'warmup_steps': 41, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.836,0.679528,0.771769,0.520415,0.496123,0.495901
2,0.187,0.651844,0.796517,0.715983,0.637795,0.655297
3,0.114,0.688554,0.784601,0.774427,0.688089,0.710344
4,0.0923,0.676134,0.793767,0.780673,0.698287,0.72206
5,0.082,0.669607,0.791934,0.804321,0.687196,0.722628
6,0.0769,0.682974,0.796517,0.82454,0.723074,0.750084
7,0.0736,0.679375,0.794684,0.817323,0.699105,0.731374
8,0.0709,0.691009,0.789184,0.797973,0.691562,0.720369
9,0.0693,0.688469,0.800183,0.816652,0.708992,0.737882
10,0.0679,0.70101,0.791934,0.816918,0.717586,0.744788


[I 2025-03-26 22:10:56,728] Trial 14 finished with value: 0.7474553171642878 and parameters: {'learning_rate': 0.00038226914326652676, 'weight_decay': 0.0, 'warmup_steps': 41, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 5 with value: 0.7549412251809023.


Trial 15 with params: {'learning_rate': 4.805219737775734e-05, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7859,1.343651,0.601283,0.264645,0.223177,0.211525
2,1.0295,0.948364,0.710357,0.320335,0.317366,0.292277
3,0.7036,0.809216,0.732356,0.384246,0.358505,0.340039
4,0.5382,0.745135,0.754354,0.461153,0.42751,0.419528
5,0.4371,0.706978,0.770852,0.523105,0.46778,0.46239
6,0.3698,0.689927,0.773602,0.493623,0.472738,0.467338
7,0.317,0.66941,0.781852,0.532157,0.503107,0.501958
8,0.2799,0.657162,0.783685,0.56718,0.511064,0.514434
9,0.253,0.655007,0.780935,0.576221,0.517075,0.519591
10,0.2317,0.649977,0.785518,0.576403,0.523613,0.525697


[I 2025-03-26 22:13:53,393] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00010034827545605993, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.414,0.933546,0.712191,0.318662,0.316984,0.294214
2,0.5957,0.733735,0.758937,0.44669,0.428132,0.418613
3,0.3668,0.668949,0.775435,0.51198,0.500155,0.494431
4,0.2532,0.651152,0.789184,0.639858,0.546256,0.565635
5,0.1915,0.643072,0.794684,0.643855,0.591222,0.601088


[I 2025-03-26 22:15:22,105] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00044246075223732244, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7362,0.651002,0.780935,0.586207,0.521286,0.52869
2,0.1677,0.668431,0.787351,0.702076,0.635704,0.650732
3,0.1067,0.666396,0.793767,0.748153,0.677292,0.696389
4,0.0879,0.661817,0.800183,0.823196,0.72911,0.75738
5,0.0803,0.707514,0.780018,0.790483,0.696888,0.721143
6,0.0751,0.705944,0.792851,0.812546,0.708048,0.73837
7,0.0722,0.708032,0.786434,0.813621,0.712841,0.743934
8,0.0704,0.695116,0.790101,0.805551,0.7132,0.741398
9,0.0683,0.711574,0.789184,0.820581,0.716059,0.746186
10,0.0669,0.719645,0.785518,0.805716,0.717903,0.744842


[I 2025-03-26 22:19:46,980] Trial 17 finished with value: 0.7593272693182541 and parameters: {'learning_rate': 0.00044246075223732244, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 17 with value: 0.7593272693182541.


Trial 18 with params: {'learning_rate': 0.0003114789869713292, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8671,0.690117,0.784601,0.534213,0.50721,0.508347
2,0.2188,0.647092,0.791017,0.681202,0.607646,0.627089
3,0.1274,0.660736,0.7956,0.746077,0.674508,0.693828
4,0.0988,0.678635,0.790101,0.813886,0.701766,0.735175
5,0.0872,0.664142,0.796517,0.815417,0.705985,0.738267
6,0.0801,0.658535,0.805683,0.821685,0.726803,0.755879
7,0.0757,0.674482,0.793767,0.821178,0.710439,0.74834
8,0.0734,0.663738,0.79835,0.809551,0.716256,0.743201
9,0.0716,0.685485,0.794684,0.806681,0.724124,0.747261
10,0.0695,0.677126,0.800183,0.827924,0.714085,0.749853


[I 2025-03-26 22:24:12,679] Trial 18 finished with value: 0.7524786463434122 and parameters: {'learning_rate': 0.0003114789869713292, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 17 with value: 0.7593272693182541.


Trial 19 with params: {'learning_rate': 0.00018763871193579055, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0931,0.747857,0.758937,0.439814,0.419272,0.410522
2,0.3462,0.6478,0.791017,0.580684,0.532083,0.535614
3,0.191,0.647088,0.791934,0.645107,0.593926,0.60384
4,0.1346,0.651783,0.7956,0.723802,0.624181,0.653998
5,0.1109,0.663951,0.786434,0.742287,0.653561,0.680354
6,0.0977,0.652682,0.794684,0.76918,0.654337,0.689424
7,0.0887,0.65509,0.796517,0.812564,0.71364,0.742524
8,0.0846,0.660331,0.797434,0.813336,0.720306,0.745705
9,0.0806,0.660575,0.796517,0.800011,0.708809,0.734551
10,0.0775,0.671691,0.793767,0.811852,0.712549,0.740864


[I 2025-03-26 22:28:35,777] Trial 19 finished with value: 0.7600185253294757 and parameters: {'learning_rate': 0.00018763871193579055, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 20 with params: {'learning_rate': 0.00011894522730480247, 'weight_decay': 0.006, 'warmup_steps': 23, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3434,0.870033,0.725023,0.338397,0.341511,0.316263
2,0.5197,0.706234,0.765353,0.487846,0.456846,0.455017
3,0.3076,0.659132,0.783685,0.578365,0.529688,0.533118
4,0.2104,0.652627,0.788268,0.651846,0.562075,0.584481
5,0.1615,0.648671,0.796517,0.66663,0.61974,0.629401
6,0.1351,0.643262,0.796517,0.729568,0.635067,0.662668
7,0.1184,0.638753,0.799267,0.740353,0.643731,0.671196
8,0.1085,0.640603,0.796517,0.729278,0.652077,0.673268
9,0.101,0.645869,0.793767,0.735482,0.649859,0.672604
10,0.0953,0.657568,0.792851,0.750552,0.673609,0.695912


[I 2025-03-26 22:31:30,648] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00021034558437245743, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0436,0.727383,0.76352,0.442097,0.432245,0.420891
2,0.3103,0.64923,0.790101,0.607359,0.559058,0.566679
3,0.1704,0.648252,0.794684,0.71229,0.637015,0.655701
4,0.1232,0.652755,0.792851,0.748429,0.632113,0.668577
5,0.1031,0.661333,0.792851,0.756098,0.662941,0.691572
6,0.0925,0.658759,0.79835,0.793824,0.677176,0.713755
7,0.0851,0.660307,0.793767,0.806093,0.702993,0.734074
8,0.0814,0.661722,0.8011,0.818127,0.714872,0.745248
9,0.078,0.661655,0.799267,0.810157,0.710916,0.738914
10,0.0751,0.679697,0.7956,0.813789,0.707716,0.740332


[I 2025-03-26 22:36:01,739] Trial 21 finished with value: 0.7558249197907113 and parameters: {'learning_rate': 0.00021034558437245743, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 22 with params: {'learning_rate': 0.00036966704825076467, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7946,0.68015,0.779102,0.496131,0.499811,0.492054
2,0.1907,0.664932,0.791017,0.703535,0.630601,0.645275
3,0.1159,0.658276,0.796517,0.747757,0.681439,0.698657
4,0.0936,0.666409,0.800183,0.795601,0.69862,0.728369
5,0.0826,0.676988,0.793767,0.799607,0.7063,0.734164
6,0.0778,0.679546,0.796517,0.803826,0.718576,0.744381
7,0.0738,0.685131,0.792851,0.809153,0.699579,0.735031
8,0.0715,0.684717,0.791934,0.811697,0.728665,0.754425
9,0.0701,0.685774,0.792851,0.8049,0.719397,0.745287
10,0.0683,0.698638,0.790101,0.805696,0.713257,0.742195


[I 2025-03-26 22:40:30,504] Trial 22 finished with value: 0.7486478503420895 and parameters: {'learning_rate': 0.00036966704825076467, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 23 with params: {'learning_rate': 0.0002112380926140512, 'weight_decay': 0.01, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0592,0.728525,0.76077,0.430299,0.423833,0.410903
2,0.3118,0.64747,0.7956,0.611484,0.568606,0.576237
3,0.1708,0.649829,0.797434,0.704073,0.630513,0.648707
4,0.1231,0.665848,0.7956,0.727718,0.624514,0.655095
5,0.103,0.662114,0.793767,0.776666,0.67464,0.70517
6,0.0921,0.651099,0.797434,0.768622,0.667734,0.698881
7,0.0846,0.663929,0.794684,0.808221,0.705925,0.73669
8,0.081,0.665898,0.800183,0.812511,0.716305,0.74487
9,0.0778,0.663836,0.7956,0.805968,0.711032,0.738442
10,0.0751,0.675487,0.7956,0.805963,0.723432,0.746781


[I 2025-03-26 22:45:01,953] Trial 23 finished with value: 0.7519826693593987 and parameters: {'learning_rate': 0.0002112380926140512, 'weight_decay': 0.01, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 24 with params: {'learning_rate': 0.00011615859910711042, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3336,0.874633,0.72319,0.32687,0.335474,0.310969
2,0.5276,0.709148,0.76352,0.494946,0.45168,0.449517
3,0.3142,0.656365,0.790101,0.588513,0.537491,0.54351
4,0.2152,0.647123,0.790101,0.631933,0.555288,0.572289
5,0.1648,0.645612,0.797434,0.662675,0.621259,0.628119
6,0.1376,0.640107,0.797434,0.736064,0.638308,0.667341
7,0.1204,0.635177,0.8011,0.743328,0.658005,0.681068
8,0.1101,0.636906,0.796517,0.755575,0.672582,0.696113
9,0.1026,0.643409,0.7956,0.766302,0.670699,0.698379
10,0.0966,0.653778,0.791934,0.748273,0.672356,0.693951


[I 2025-03-26 22:47:59,702] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0003596274888695727, 'weight_decay': 0.004, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8051,0.676196,0.785518,0.515256,0.515998,0.506792
2,0.195,0.636602,0.800183,0.702527,0.641626,0.654482
3,0.1178,0.654661,0.793767,0.747357,0.675122,0.695898
4,0.0938,0.655611,0.797434,0.788393,0.685982,0.717842
5,0.0834,0.668861,0.805683,0.83322,0.719463,0.754988
6,0.078,0.667958,0.793767,0.821258,0.710351,0.746818
7,0.074,0.675318,0.791934,0.824706,0.705519,0.742906
8,0.0718,0.67261,0.793767,0.806979,0.715629,0.741662
9,0.0698,0.693686,0.792851,0.806974,0.724856,0.750545
10,0.0686,0.683582,0.794684,0.821655,0.722318,0.754881


[I 2025-03-26 22:52:28,704] Trial 25 finished with value: 0.7426835668181938 and parameters: {'learning_rate': 0.0003596274888695727, 'weight_decay': 0.004, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 19 with value: 0.7600185253294757.


Trial 26 with params: {'learning_rate': 5.542464595560726e-05, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7324,1.261327,0.626948,0.282626,0.247458,0.236524
2,0.9361,0.888421,0.72319,0.343556,0.336293,0.31412
3,0.6255,0.77265,0.747021,0.411964,0.397728,0.385247
4,0.4744,0.721467,0.761687,0.472433,0.451237,0.443811
5,0.3795,0.687285,0.776352,0.502705,0.483123,0.480541
6,0.3146,0.669917,0.777269,0.514792,0.490225,0.490054
7,0.2668,0.654693,0.784601,0.561301,0.51401,0.518244
8,0.2351,0.646553,0.790101,0.609205,0.543073,0.553446
9,0.212,0.645854,0.793767,0.620874,0.554138,0.564543
10,0.1939,0.643955,0.799267,0.652629,0.58523,0.601819


[I 2025-03-26 22:55:27,391] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00027302085104871567, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9046,0.703146,0.768103,0.47861,0.455535,0.443459
2,0.2431,0.651096,0.794684,0.645104,0.590696,0.601118
3,0.139,0.65447,0.790101,0.698107,0.633992,0.650168
4,0.105,0.661917,0.804766,0.787027,0.669846,0.706181
5,0.0912,0.663175,0.797434,0.81296,0.70242,0.73897
6,0.0833,0.670641,0.800183,0.812135,0.714426,0.747007
7,0.0783,0.673128,0.789184,0.795907,0.699889,0.731113
8,0.0755,0.654787,0.799267,0.822937,0.718691,0.750714
9,0.0732,0.667802,0.7956,0.803574,0.724246,0.746771
10,0.071,0.671061,0.797434,0.817497,0.722953,0.752447


[I 2025-03-26 22:59:56,388] Trial 27 finished with value: 0.7421789819990695 and parameters: {'learning_rate': 0.00027302085104871567, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 28 with params: {'learning_rate': 0.00036741108261561275, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8013,0.677133,0.776352,0.498969,0.499456,0.492616
2,0.1912,0.659738,0.791017,0.696858,0.619263,0.638235
3,0.1162,0.657667,0.791934,0.740581,0.662484,0.682021
4,0.0931,0.667461,0.794684,0.819829,0.701902,0.738352
5,0.0828,0.672836,0.794684,0.836663,0.713678,0.752788
6,0.0776,0.68686,0.789184,0.792899,0.691859,0.7233
7,0.0735,0.677232,0.790101,0.81555,0.720519,0.751213
8,0.0718,0.685723,0.794684,0.806966,0.736862,0.759999
9,0.0702,0.683288,0.796517,0.801313,0.73765,0.75756
10,0.0682,0.675948,0.797434,0.820043,0.716266,0.750105


[I 2025-03-26 23:04:29,934] Trial 28 finished with value: 0.7589415855310878 and parameters: {'learning_rate': 0.00036741108261561275, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 19 with value: 0.7600185253294757.


Trial 29 with params: {'learning_rate': 0.0001489003810410246, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2033,0.800334,0.736022,0.414419,0.373694,0.358812
2,0.426,0.673343,0.775435,0.522796,0.489779,0.491306
3,0.2411,0.64381,0.790101,0.606413,0.557855,0.563971
4,0.1661,0.652343,0.7956,0.688174,0.607859,0.628647
5,0.1322,0.658318,0.791934,0.684024,0.623761,0.638156
6,0.113,0.644476,0.799267,0.751865,0.659429,0.688
7,0.1009,0.646532,0.79835,0.77413,0.668677,0.700251
8,0.0939,0.65338,0.797434,0.779528,0.689624,0.716248
9,0.0887,0.654366,0.7956,0.768709,0.688764,0.71046
10,0.0845,0.67077,0.785518,0.762183,0.681587,0.705886


[I 2025-03-26 23:07:27,275] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0002970354897818153, 'weight_decay': 0.008, 'warmup_steps': 22, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9053,0.699111,0.769936,0.499933,0.47122,0.468171
2,0.2273,0.653273,0.7956,0.686509,0.615193,0.631174
3,0.1311,0.660651,0.787351,0.698791,0.632322,0.649791
4,0.1011,0.678342,0.788268,0.772685,0.674129,0.702085
5,0.0886,0.666308,0.797434,0.814802,0.698963,0.73489
6,0.0813,0.671173,0.793767,0.805965,0.703604,0.73606
7,0.0766,0.682925,0.787351,0.781696,0.688795,0.718411
8,0.0742,0.673123,0.793767,0.800347,0.713202,0.741251
9,0.0719,0.668903,0.796517,0.81895,0.71634,0.748549
10,0.07,0.689072,0.791934,0.813915,0.719659,0.749528


[I 2025-03-26 23:11:57,325] Trial 30 finished with value: 0.7451001856233275 and parameters: {'learning_rate': 0.0002970354897818153, 'weight_decay': 0.008, 'warmup_steps': 22, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 31 with params: {'learning_rate': 0.0004253205164729637, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7504,0.668114,0.771769,0.526444,0.503091,0.496948
2,0.1722,0.682315,0.781852,0.705652,0.638729,0.655685
3,0.1088,0.686785,0.790101,0.748592,0.678774,0.698611
4,0.0892,0.664468,0.797434,0.806991,0.720365,0.746885
5,0.0807,0.690673,0.793767,0.819557,0.728401,0.754206
6,0.0759,0.672626,0.8011,0.808442,0.739944,0.759421
7,0.0723,0.689227,0.794684,0.798915,0.733234,0.749484
8,0.07,0.692025,0.790101,0.792558,0.726979,0.743676
9,0.0689,0.698522,0.79835,0.790474,0.738205,0.752223
10,0.0674,0.694882,0.796517,0.790293,0.729318,0.744714


[I 2025-03-26 23:14:55,757] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00010832473126308801, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3776,0.9002,0.721357,0.343738,0.330424,0.307672
2,0.5588,0.72217,0.758937,0.450264,0.436046,0.42525
3,0.3385,0.664726,0.781852,0.560809,0.515349,0.51384
4,0.2324,0.650052,0.785518,0.635792,0.55105,0.569908
5,0.1767,0.644564,0.796517,0.664282,0.615119,0.62314
6,0.1464,0.640286,0.799267,0.707607,0.627286,0.649707
7,0.1277,0.638296,0.800183,0.722192,0.641884,0.66292
8,0.1162,0.636671,0.79835,0.732199,0.652305,0.673572
9,0.1078,0.644588,0.796517,0.759438,0.672112,0.696346
10,0.1011,0.65347,0.7956,0.748959,0.674804,0.695355


[I 2025-03-26 23:17:53,740] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00037173515654081735, 'weight_decay': 0.006, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8081,0.678153,0.781852,0.502106,0.503779,0.494757
2,0.1906,0.660481,0.793767,0.693521,0.642357,0.652978
3,0.1162,0.651732,0.796517,0.742267,0.660355,0.685391
4,0.093,0.671037,0.791017,0.799498,0.702129,0.731546
5,0.0831,0.676677,0.790101,0.804124,0.690653,0.724932
6,0.0774,0.677056,0.790101,0.812108,0.705363,0.740713
7,0.0736,0.673293,0.794684,0.810423,0.712676,0.742676
8,0.0716,0.670544,0.794684,0.81734,0.718731,0.750689
9,0.0701,0.699324,0.792851,0.815967,0.710128,0.74295
10,0.068,0.691228,0.791017,0.82009,0.710863,0.745326


[I 2025-03-26 23:22:23,306] Trial 33 finished with value: 0.7413270866084656 and parameters: {'learning_rate': 0.00037173515654081735, 'weight_decay': 0.006, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 19 with value: 0.7600185253294757.


Trial 34 with params: {'learning_rate': 1.5205336589627063e-05, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1921,1.961166,0.401467,0.098368,0.089825,0.072463
2,1.7769,1.633149,0.502291,0.13776,0.147415,0.124073
3,1.4861,1.411949,0.580202,0.237275,0.20692,0.192076
4,1.2782,1.25328,0.624198,0.277163,0.241053,0.230137
5,1.1204,1.134856,0.672777,0.25915,0.276275,0.255996
6,1.0038,1.048402,0.68561,0.256095,0.28532,0.260142
7,0.9121,0.984796,0.696609,0.269928,0.298718,0.273678
8,0.8455,0.940257,0.708524,0.323371,0.315027,0.294612
9,0.7929,0.90551,0.713107,0.322617,0.321189,0.297072
10,0.7503,0.881427,0.71769,0.318022,0.326292,0.303246


[I 2025-03-26 23:25:21,209] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 6.087267598950881e-05, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.672,1.200107,0.655362,0.282972,0.26355,0.252334
2,0.8739,0.854799,0.722273,0.355417,0.34484,0.325922
3,0.5787,0.75243,0.75527,0.443038,0.41877,0.406547
4,0.4356,0.706276,0.770852,0.528063,0.477964,0.477185
5,0.3426,0.671215,0.780018,0.522079,0.498519,0.49448


[I 2025-03-26 23:27:23,066] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0002599795583855664, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9458,0.710302,0.767186,0.471748,0.452345,0.439543
2,0.255,0.646478,0.796517,0.635328,0.593114,0.599856
3,0.144,0.652069,0.793767,0.70301,0.64604,0.658926
4,0.1082,0.661572,0.797434,0.750507,0.646102,0.67616
5,0.0933,0.663557,0.793767,0.796175,0.713258,0.739336
6,0.0852,0.656498,0.79835,0.818304,0.709416,0.744265
7,0.0796,0.667694,0.793767,0.801932,0.705475,0.736822
8,0.0764,0.67051,0.799267,0.814497,0.713995,0.744087
9,0.0741,0.683006,0.792851,0.800471,0.710647,0.737542
10,0.0717,0.68269,0.797434,0.817392,0.710671,0.741838


[I 2025-03-26 23:31:52,007] Trial 36 finished with value: 0.7483363102569595 and parameters: {'learning_rate': 0.0002599795583855664, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 19 with value: 0.7600185253294757.


Trial 37 with params: {'learning_rate': 1.0728159166824396e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2504,2.073695,0.349221,0.070858,0.073143,0.051965
2,1.9351,1.814251,0.459212,0.106445,0.12252,0.104
3,1.6986,1.6196,0.502291,0.138324,0.147729,0.124465
4,1.5168,1.472792,0.546288,0.226418,0.17382,0.156091
5,1.372,1.359288,0.587534,0.251077,0.210949,0.195738


[I 2025-03-26 23:33:20,988] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 2.4269221144679105e-05, 'weight_decay': 0.005, 'warmup_steps': 35, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0747,1.751875,0.480293,0.105902,0.133662,0.110845
2,1.5055,1.354619,0.595784,0.26929,0.217419,0.202569
3,1.1625,1.117745,0.679193,0.271906,0.283509,0.262373
4,0.9424,0.975678,0.703025,0.297432,0.310316,0.28807
5,0.7941,0.889312,0.716774,0.322179,0.324102,0.301155


[I 2025-03-26 23:34:49,867] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0003962293213782471, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7967,0.679678,0.778185,0.502948,0.502401,0.494795
2,0.1816,0.663835,0.794684,0.708402,0.646877,0.659692
3,0.1122,0.671654,0.791934,0.768524,0.702468,0.719009
4,0.0912,0.655425,0.79835,0.789589,0.684903,0.716544
5,0.0818,0.660173,0.799267,0.812534,0.713136,0.742081
6,0.0767,0.674599,0.796517,0.815552,0.716809,0.749036
7,0.0736,0.672335,0.79835,0.799991,0.705689,0.734261
8,0.0711,0.671901,0.796517,0.799659,0.710232,0.737582
9,0.0694,0.695778,0.790101,0.792392,0.716238,0.735209
10,0.0675,0.686443,0.788268,0.791848,0.71436,0.737009


[I 2025-03-26 23:39:19,027] Trial 39 finished with value: 0.7447416821710183 and parameters: {'learning_rate': 0.0003962293213782471, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 40 with params: {'learning_rate': 8.788434110215489e-05, 'weight_decay': 0.001, 'warmup_steps': 49, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5384,1.010105,0.699358,0.304788,0.304749,0.284802
2,0.6719,0.758142,0.748854,0.418533,0.406561,0.394858
3,0.4228,0.695719,0.770852,0.504402,0.483787,0.480263
4,0.2998,0.668679,0.780018,0.565306,0.507386,0.517081
5,0.2265,0.648239,0.793767,0.648972,0.565386,0.582369


[I 2025-03-26 23:40:48,689] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.00022121801488502837, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0382,0.721113,0.762603,0.432607,0.428929,0.417962
2,0.2985,0.646878,0.794684,0.610878,0.573757,0.578235
3,0.1636,0.651481,0.79835,0.699838,0.628626,0.645752
4,0.1186,0.664587,0.793767,0.755093,0.632649,0.669968
5,0.1001,0.660312,0.799267,0.77867,0.688789,0.715589
6,0.09,0.651949,0.797434,0.759145,0.668525,0.697813
7,0.0832,0.666428,0.791934,0.810921,0.712183,0.741811
8,0.0799,0.665278,0.802016,0.820285,0.716839,0.748092
9,0.0769,0.664222,0.794684,0.812827,0.720623,0.747384
10,0.0741,0.676155,0.7956,0.811912,0.724695,0.749748


[I 2025-03-26 23:45:15,463] Trial 41 finished with value: 0.7451705705566783 and parameters: {'learning_rate': 0.00022121801488502837, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 42 with params: {'learning_rate': 0.00044248034786486986, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7183,0.685821,0.778185,0.582094,0.522056,0.52764
2,0.1671,0.66955,0.785518,0.673595,0.634073,0.637807
3,0.1063,0.6745,0.7956,0.771757,0.701094,0.720334
4,0.0881,0.690778,0.793767,0.811691,0.727413,0.749035
5,0.0796,0.696013,0.7956,0.806958,0.732927,0.755132
6,0.0747,0.673151,0.8011,0.81048,0.716407,0.743438
7,0.0718,0.702034,0.790101,0.802818,0.713791,0.740575
8,0.0702,0.705551,0.794684,0.825763,0.72187,0.754321
9,0.0682,0.716383,0.786434,0.804769,0.700922,0.731153
10,0.0667,0.723533,0.788268,0.805468,0.716054,0.74038


[I 2025-03-26 23:48:14,869] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00041559190163094063, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7809,0.676699,0.777269,0.527431,0.505251,0.501083
2,0.1753,0.666276,0.791934,0.701037,0.648877,0.657123
3,0.1101,0.674592,0.79835,0.785028,0.708106,0.729241
4,0.09,0.66865,0.794684,0.819275,0.728825,0.756859
5,0.0807,0.683719,0.790101,0.799733,0.708552,0.734129
6,0.0759,0.685208,0.79835,0.818741,0.729465,0.755572
7,0.0728,0.710258,0.788268,0.793005,0.716674,0.735516
8,0.071,0.682481,0.793767,0.813977,0.719633,0.745787
9,0.0693,0.697208,0.791934,0.80499,0.723469,0.743304
10,0.0676,0.694424,0.788268,0.804243,0.714905,0.738059


[I 2025-03-26 23:52:42,861] Trial 43 finished with value: 0.746460643495475 and parameters: {'learning_rate': 0.00041559190163094063, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 44 with params: {'learning_rate': 5.8679697914208696e-05, 'weight_decay': 0.002, 'warmup_steps': 45, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7361,1.242909,0.629698,0.279414,0.246738,0.235578
2,0.9104,0.873133,0.722273,0.339135,0.337323,0.314828
3,0.6018,0.761232,0.750687,0.447478,0.404894,0.395153
4,0.4532,0.713322,0.769019,0.477479,0.467272,0.459935
5,0.3587,0.678056,0.776352,0.497588,0.484887,0.482543


[I 2025-03-26 23:54:12,298] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0004400351450582787, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7349,0.656418,0.786434,0.565741,0.52504,0.52948
2,0.1677,0.641065,0.796517,0.706116,0.660668,0.670401
3,0.1068,0.664303,0.790101,0.772422,0.697043,0.717623
4,0.0882,0.668833,0.792851,0.809467,0.717977,0.74648
5,0.0798,0.67367,0.791934,0.805543,0.71683,0.741346
6,0.0754,0.658705,0.802933,0.822719,0.713001,0.746591
7,0.072,0.697599,0.781852,0.808593,0.702182,0.73184
8,0.0703,0.695166,0.788268,0.797821,0.709478,0.732951
9,0.0684,0.70926,0.785518,0.79244,0.72469,0.74273
10,0.0671,0.692999,0.787351,0.81615,0.706215,0.735396


[I 2025-03-26 23:57:09,743] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00043974319957485277, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7303,0.674102,0.783685,0.538176,0.52274,0.516854
2,0.1682,0.665521,0.787351,0.726835,0.659144,0.674462
3,0.108,0.690644,0.785518,0.729513,0.683418,0.691823
4,0.088,0.676429,0.792851,0.811203,0.722068,0.746776
5,0.0798,0.697013,0.793767,0.812554,0.730729,0.755179
6,0.0752,0.685663,0.791934,0.827123,0.718355,0.75236
7,0.0722,0.704202,0.784601,0.802527,0.713216,0.737999
8,0.0704,0.712653,0.781852,0.800739,0.712673,0.738315
9,0.0686,0.706876,0.787351,0.800313,0.727775,0.749264
10,0.0673,0.702005,0.783685,0.792899,0.714249,0.73818


[I 2025-03-27 00:00:08,186] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00015215383624977904, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2052,0.796133,0.738772,0.413702,0.379663,0.365719
2,0.4208,0.670716,0.780018,0.530892,0.49885,0.500686
3,0.2372,0.640469,0.790101,0.615051,0.561897,0.570429
4,0.1634,0.652206,0.797434,0.697592,0.608533,0.634509
5,0.1303,0.657647,0.793767,0.723775,0.648563,0.670019
6,0.1116,0.643,0.7956,0.757914,0.662517,0.69303
7,0.0998,0.645558,0.802933,0.783304,0.673311,0.706547
8,0.093,0.654944,0.796517,0.784997,0.696143,0.721864
9,0.088,0.653601,0.797434,0.788223,0.694632,0.721258
10,0.0839,0.668226,0.791934,0.781203,0.692286,0.717882


[I 2025-03-27 00:03:05,336] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0003077516411329231, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.898,0.695806,0.774519,0.501555,0.485274,0.479361
2,0.2217,0.652579,0.790101,0.657724,0.595389,0.607421
3,0.1285,0.651745,0.791017,0.705853,0.64256,0.659888
4,0.0995,0.662386,0.8011,0.818311,0.697155,0.734701
5,0.0873,0.656998,0.794684,0.808749,0.686004,0.724251
6,0.0801,0.674112,0.79835,0.814868,0.714886,0.746496
7,0.0757,0.69415,0.791017,0.8207,0.703626,0.741536
8,0.0732,0.677641,0.790101,0.814937,0.699643,0.736457
9,0.0714,0.683134,0.791017,0.804788,0.717714,0.7437
10,0.0692,0.696895,0.787351,0.812057,0.711925,0.743504


[I 2025-03-27 00:07:32,743] Trial 48 finished with value: 0.7486403418034737 and parameters: {'learning_rate': 0.0003077516411329231, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 49 with params: {'learning_rate': 0.0003937805064293631, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8102,0.690728,0.769936,0.495508,0.497928,0.486501
2,0.1814,0.658804,0.791017,0.701316,0.630373,0.645358
3,0.112,0.676276,0.791017,0.775818,0.692402,0.714856
4,0.0906,0.688143,0.796517,0.796319,0.707183,0.733062
5,0.0813,0.694597,0.7956,0.824723,0.704682,0.739059
6,0.0762,0.687491,0.797434,0.811315,0.730045,0.752956
7,0.0728,0.708428,0.785518,0.813128,0.702227,0.733621
8,0.0705,0.685248,0.792851,0.807151,0.713583,0.738854
9,0.0691,0.717326,0.784601,0.814843,0.724051,0.749501
10,0.0676,0.705008,0.791934,0.819238,0.726423,0.755108


[I 2025-03-27 00:12:00,092] Trial 49 finished with value: 0.754853385443597 and parameters: {'learning_rate': 0.0003937805064293631, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 50 with params: {'learning_rate': 0.00043095971529618233, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7622,0.675808,0.769936,0.495409,0.498824,0.489851
2,0.172,0.669856,0.790101,0.689196,0.635756,0.644707
3,0.1086,0.678554,0.789184,0.746339,0.682019,0.69894
4,0.0893,0.688248,0.782768,0.787845,0.715823,0.733275
5,0.0803,0.680452,0.793767,0.798364,0.718954,0.739887
6,0.0752,0.684377,0.789184,0.807936,0.727278,0.751319
7,0.0723,0.725451,0.781852,0.790355,0.71291,0.733597
8,0.0704,0.699643,0.790101,0.811863,0.700117,0.734728
9,0.0684,0.697055,0.789184,0.791836,0.709673,0.734166
10,0.0669,0.708925,0.792851,0.811981,0.719145,0.748248


[I 2025-03-27 00:16:28,022] Trial 50 finished with value: 0.7406982393506796 and parameters: {'learning_rate': 0.00043095971529618233, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 19 with value: 0.7600185253294757.


Trial 51 with params: {'learning_rate': 0.00042580526682643374, 'weight_decay': 0.003, 'warmup_steps': 40, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7956,0.68604,0.773602,0.515832,0.49735,0.4909
2,0.1727,0.675604,0.778185,0.675438,0.62522,0.632599
3,0.1089,0.668837,0.79835,0.78894,0.72152,0.738959
4,0.0887,0.665248,0.79835,0.82934,0.719853,0.75311
5,0.0801,0.682409,0.793767,0.828498,0.726233,0.756194
6,0.0753,0.683485,0.791934,0.810341,0.728691,0.751272
7,0.0724,0.717004,0.785518,0.815342,0.712246,0.741604
8,0.0705,0.692499,0.787351,0.807719,0.715608,0.742294
9,0.0684,0.710038,0.780935,0.79854,0.72793,0.749887
10,0.067,0.709444,0.780935,0.795021,0.715596,0.740521


[I 2025-03-27 00:20:54,307] Trial 51 finished with value: 0.7621312725763352 and parameters: {'learning_rate': 0.00042580526682643374, 'weight_decay': 0.003, 'warmup_steps': 40, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 52 with params: {'learning_rate': 0.00024320473429530182, 'weight_decay': 0.003, 'warmup_steps': 41, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0175,0.715999,0.769019,0.448881,0.447546,0.436059
2,0.2749,0.643116,0.79835,0.591901,0.574086,0.571826
3,0.151,0.6565,0.793767,0.70442,0.637627,0.656258
4,0.1113,0.658957,0.797434,0.761717,0.636131,0.675993
5,0.0951,0.660604,0.802016,0.770438,0.701534,0.721727
6,0.0865,0.652477,0.8011,0.785426,0.679567,0.709241
7,0.0806,0.660546,0.793767,0.791302,0.700641,0.725514
8,0.0777,0.656414,0.799267,0.799968,0.708721,0.736235
9,0.075,0.666637,0.796517,0.808372,0.721526,0.744539
10,0.0724,0.680266,0.791017,0.804024,0.723357,0.746161


[I 2025-03-27 00:25:23,249] Trial 52 finished with value: 0.7392967418844495 and parameters: {'learning_rate': 0.00024320473429530182, 'weight_decay': 0.003, 'warmup_steps': 41, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 53 with params: {'learning_rate': 0.00046218411755430524, 'weight_decay': 0.004, 'warmup_steps': 52, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7869,0.668436,0.778185,0.556157,0.501349,0.50048
2,0.1663,0.671971,0.787351,0.689949,0.628088,0.638641
3,0.1053,0.70024,0.789184,0.782077,0.701331,0.722094
4,0.0876,0.703605,0.786434,0.801775,0.707636,0.735244
5,0.0795,0.690088,0.786434,0.809424,0.70738,0.739668
6,0.0752,0.737824,0.788268,0.833793,0.719207,0.755172
7,0.0723,0.705841,0.782768,0.813321,0.706885,0.73778
8,0.0697,0.728427,0.784601,0.815932,0.719382,0.749777
9,0.0682,0.738613,0.782768,0.817967,0.712882,0.745681
10,0.0669,0.730296,0.783685,0.808511,0.711801,0.743389


[I 2025-03-27 00:29:49,425] Trial 53 finished with value: 0.7463780969218667 and parameters: {'learning_rate': 0.00046218411755430524, 'weight_decay': 0.004, 'warmup_steps': 52, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 54 with params: {'learning_rate': 2.2869967933363696e-05, 'weight_decay': 0.007, 'warmup_steps': 45, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0976,1.785137,0.472961,0.108522,0.130445,0.109234
2,1.5457,1.392995,0.584785,0.245193,0.210871,0.196953
3,1.2058,1.154546,0.670027,0.270742,0.277688,0.258672
4,0.9842,1.006436,0.696609,0.274844,0.299355,0.275607
5,0.8327,0.914099,0.708524,0.322099,0.316248,0.292869


[I 2025-03-27 00:31:18,045] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0004482970235527668, 'weight_decay': 0.002, 'warmup_steps': 37, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7749,0.662315,0.785518,0.529734,0.508625,0.504247
2,0.1666,0.657271,0.8011,0.717436,0.666788,0.676002
3,0.1066,0.664644,0.800183,0.769965,0.708046,0.722556
4,0.0886,0.681766,0.786434,0.783296,0.684065,0.714722
5,0.0798,0.685247,0.791017,0.803853,0.710827,0.737886
6,0.0746,0.704536,0.786434,0.81331,0.691745,0.728593
7,0.0718,0.701902,0.784601,0.800887,0.695881,0.727154
8,0.07,0.702193,0.782768,0.787462,0.697117,0.723094
9,0.0681,0.7104,0.784601,0.782856,0.709941,0.729741
10,0.0667,0.716712,0.784601,0.794673,0.702788,0.729731


[I 2025-03-27 00:34:16,286] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00014720454397539664, 'weight_decay': 0.009000000000000001, 'warmup_steps': 50, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2695,0.814201,0.735105,0.368061,0.360141,0.338038
2,0.439,0.678672,0.776352,0.527626,0.494758,0.496519
3,0.2486,0.644338,0.788268,0.579576,0.54212,0.546126
4,0.1706,0.656273,0.788268,0.677466,0.590471,0.615279
5,0.1346,0.655234,0.794684,0.700039,0.624662,0.645568


[I 2025-03-27 00:35:45,941] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00025929682944343565, 'weight_decay': 0.005, 'warmup_steps': 51, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0042,0.70618,0.765353,0.447611,0.454661,0.442126
2,0.2607,0.642225,0.797434,0.599018,0.573684,0.573202
3,0.1444,0.662658,0.790101,0.709215,0.631806,0.655855
4,0.1079,0.650568,0.800183,0.766612,0.653768,0.685961
5,0.0927,0.648658,0.802933,0.771403,0.683297,0.710754
6,0.0848,0.65556,0.8011,0.827528,0.703538,0.740532
7,0.0791,0.662504,0.7956,0.787261,0.691453,0.717967
8,0.0764,0.651391,0.80385,0.820616,0.725231,0.752978
9,0.0737,0.657249,0.79835,0.822304,0.716532,0.74756
10,0.0716,0.670517,0.794684,0.812106,0.720096,0.746057


[I 2025-03-27 00:40:13,300] Trial 57 finished with value: 0.7383194081881118 and parameters: {'learning_rate': 0.00025929682944343565, 'weight_decay': 0.005, 'warmup_steps': 51, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 58 with params: {'learning_rate': 0.00033159441463755617, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8376,0.690736,0.781852,0.509691,0.503557,0.499336
2,0.2066,0.649496,0.79835,0.690424,0.631157,0.644744
3,0.1226,0.653988,0.797434,0.736202,0.667764,0.68629
4,0.0964,0.67261,0.792851,0.774202,0.672768,0.701859
5,0.0851,0.6707,0.797434,0.786609,0.698635,0.722901
6,0.0788,0.662532,0.80385,0.822528,0.717208,0.748214
7,0.075,0.679438,0.7956,0.809638,0.710301,0.742713
8,0.0724,0.677692,0.792851,0.810294,0.709336,0.739557
9,0.0709,0.681263,0.790101,0.804201,0.705237,0.734269
10,0.0689,0.681345,0.790101,0.805663,0.714925,0.744735


[I 2025-03-27 00:44:52,835] Trial 58 finished with value: 0.7459187170709238 and parameters: {'learning_rate': 0.00033159441463755617, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 59 with params: {'learning_rate': 9.856270612834072e-05, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4417,0.941909,0.71494,0.328485,0.318337,0.293131
2,0.6045,0.736012,0.75527,0.435499,0.419143,0.408768
3,0.3743,0.677375,0.772686,0.500088,0.488166,0.483342
4,0.2604,0.659205,0.785518,0.626823,0.540772,0.562789
5,0.1968,0.64668,0.7956,0.651679,0.590892,0.602952


[I 2025-03-27 00:46:21,205] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 2.9068676100418608e-05, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9995,1.645242,0.501375,0.161658,0.146233,0.122252
2,1.3797,1.236079,0.631531,0.279453,0.250064,0.236581
3,1.0303,1.012637,0.703941,0.327938,0.307753,0.287884
4,0.8189,0.892699,0.710357,0.311574,0.31976,0.294161
5,0.6829,0.824123,0.729606,0.327506,0.348586,0.325386
6,0.5944,0.781707,0.737855,0.413674,0.381027,0.370421
7,0.5273,0.752109,0.751604,0.422634,0.404278,0.393609
8,0.4821,0.730687,0.756187,0.46464,0.428457,0.422158
9,0.4465,0.720323,0.762603,0.486492,0.451564,0.446136
10,0.4172,0.710333,0.764436,0.477576,0.458101,0.451425


[I 2025-03-27 00:49:17,261] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0002744501800271801, 'weight_decay': 0.003, 'warmup_steps': 38, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9617,0.700102,0.765353,0.444239,0.446145,0.431536
2,0.2458,0.643356,0.792851,0.62487,0.594946,0.595625
3,0.1379,0.65969,0.791017,0.691436,0.63758,0.650042
4,0.1049,0.665798,0.79835,0.789079,0.678172,0.712396
5,0.0914,0.662331,0.788268,0.762259,0.673887,0.700127
6,0.0837,0.655808,0.799267,0.779562,0.678955,0.710047
7,0.0781,0.657894,0.796517,0.787024,0.681545,0.713213
8,0.0755,0.660953,0.79835,0.801219,0.708265,0.736882
9,0.0729,0.675135,0.791934,0.785998,0.702683,0.729737
10,0.0709,0.686919,0.790101,0.789536,0.709618,0.733468


[I 2025-03-27 00:52:14,127] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 6.19670485759995e-05, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6731,1.191568,0.659028,0.282645,0.266096,0.254074
2,0.863,0.847308,0.724106,0.330821,0.342975,0.321597
3,0.5691,0.748878,0.75527,0.437279,0.415597,0.403798
4,0.4277,0.703969,0.769936,0.500894,0.473986,0.471061
5,0.3362,0.670801,0.779102,0.51932,0.493988,0.493132
6,0.276,0.656442,0.785518,0.563917,0.514729,0.519142
7,0.233,0.644614,0.791934,0.602093,0.538662,0.545568
8,0.2052,0.639857,0.791017,0.642949,0.566243,0.581992
9,0.1853,0.639331,0.79835,0.655456,0.589741,0.605358
10,0.1699,0.640106,0.799267,0.663407,0.607629,0.619636


[I 2025-03-27 00:55:11,797] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0003269851540147738, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8676,0.696021,0.773602,0.497307,0.492134,0.485432
2,0.2098,0.655411,0.788268,0.702589,0.621441,0.638917
3,0.1237,0.665752,0.787351,0.761469,0.671028,0.697546
4,0.097,0.651311,0.8011,0.778872,0.694826,0.720336
5,0.0855,0.672783,0.791017,0.80948,0.69969,0.732375
6,0.0793,0.678606,0.793767,0.79533,0.711397,0.738334
7,0.075,0.676565,0.796517,0.785395,0.69801,0.724095
8,0.0727,0.692875,0.789184,0.798342,0.709902,0.735396
9,0.0708,0.686182,0.791017,0.812139,0.723526,0.749231
10,0.0691,0.682928,0.7956,0.820592,0.726007,0.752914


[I 2025-03-27 00:59:39,680] Trial 63 finished with value: 0.7617530399634793 and parameters: {'learning_rate': 0.0003269851540147738, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 64 with params: {'learning_rate': 5.986275918990953e-05, 'weight_decay': 0.008, 'warmup_steps': 47, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7285,1.23155,0.630614,0.278078,0.247167,0.235616
2,0.898,0.86616,0.721357,0.338254,0.33696,0.31431
3,0.5919,0.756884,0.752521,0.447885,0.407765,0.397739
4,0.445,0.710508,0.771769,0.4942,0.472312,0.46709
5,0.3509,0.675188,0.777269,0.514958,0.488887,0.489156
6,0.2885,0.66049,0.782768,0.545438,0.504379,0.50678
7,0.2439,0.648022,0.786434,0.575879,0.521185,0.524646
8,0.2148,0.641883,0.791017,0.643451,0.56401,0.581621
9,0.1937,0.640963,0.797434,0.659441,0.581935,0.601218
10,0.1774,0.641473,0.799267,0.66635,0.599465,0.614709


[I 2025-03-27 01:02:37,005] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0003438151739495363, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8404,0.69158,0.771769,0.495745,0.493002,0.484089
2,0.2024,0.647193,0.797434,0.712331,0.633285,0.655642
3,0.12,0.666176,0.791017,0.734555,0.663001,0.684555
4,0.0948,0.6791,0.797434,0.789966,0.707784,0.731728
5,0.0842,0.674661,0.790101,0.790397,0.693744,0.720806
6,0.0786,0.660323,0.800183,0.809083,0.719577,0.743781
7,0.0748,0.669862,0.797434,0.828767,0.703522,0.743855
8,0.0722,0.669829,0.792851,0.805549,0.699646,0.732162
9,0.0708,0.694448,0.791017,0.807601,0.708641,0.736697
10,0.0685,0.696348,0.791934,0.808474,0.708529,0.738198


[I 2025-03-27 01:05:35,656] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00023577468506900238, 'weight_decay': 0.002, 'warmup_steps': 38, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0272,0.718113,0.768103,0.43089,0.440722,0.425999
2,0.2828,0.640934,0.8011,0.588613,0.576663,0.573444
3,0.155,0.652044,0.792851,0.699222,0.638928,0.653732
4,0.1136,0.66182,0.79835,0.758893,0.645535,0.679582
5,0.0967,0.663945,0.793767,0.765485,0.674081,0.701921
6,0.0873,0.645708,0.8011,0.782387,0.668388,0.702141
7,0.0813,0.658874,0.792851,0.788866,0.684551,0.714497
8,0.0783,0.657993,0.79835,0.818288,0.713714,0.744428
9,0.0755,0.664133,0.799267,0.813143,0.712203,0.742539
10,0.073,0.674755,0.791017,0.809842,0.720639,0.746272


[I 2025-03-27 01:10:02,557] Trial 66 finished with value: 0.7400104937027346 and parameters: {'learning_rate': 0.00023577468506900238, 'weight_decay': 0.002, 'warmup_steps': 38, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 67 with params: {'learning_rate': 0.000380476328390809, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8058,0.686745,0.769936,0.500522,0.500627,0.491683
2,0.1871,0.655962,0.792851,0.70022,0.645078,0.656547
3,0.1142,0.660142,0.79835,0.769045,0.683008,0.708175
4,0.0924,0.677648,0.792851,0.810696,0.70051,0.73186
5,0.0827,0.660755,0.7956,0.800096,0.69982,0.72969
6,0.0773,0.684727,0.7956,0.797536,0.703199,0.731759
7,0.0738,0.690934,0.792851,0.809759,0.718071,0.745959
8,0.0715,0.676464,0.797434,0.8042,0.709161,0.736857
9,0.0699,0.671888,0.80385,0.817274,0.722114,0.750713
10,0.068,0.684374,0.7956,0.814394,0.727042,0.753902


[I 2025-03-27 01:14:28,509] Trial 67 finished with value: 0.7453675686363961 and parameters: {'learning_rate': 0.000380476328390809, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 68 with params: {'learning_rate': 0.00010517624896066003, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4179,0.917855,0.715857,0.33564,0.32483,0.301841
2,0.5762,0.72519,0.759853,0.462813,0.440435,0.432572
3,0.351,0.668662,0.780018,0.536085,0.507654,0.504594
4,0.2423,0.654256,0.789184,0.637335,0.549298,0.568573
5,0.1832,0.646858,0.794684,0.64877,0.601119,0.608025


[I 2025-03-27 01:15:57,867] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00022627704070743494, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0043,0.719738,0.76352,0.435726,0.440367,0.42558
2,0.2885,0.648481,0.796517,0.620137,0.572783,0.582103
3,0.1602,0.651158,0.796517,0.706835,0.632862,0.650816
4,0.1176,0.658402,0.797434,0.747477,0.638374,0.672075
5,0.0994,0.657907,0.7956,0.733545,0.667642,0.685503
6,0.0893,0.657561,0.8011,0.813159,0.71286,0.74668
7,0.0829,0.661984,0.797434,0.794154,0.714184,0.738575
8,0.0797,0.664089,0.799267,0.814586,0.710869,0.740901
9,0.0766,0.663471,0.79835,0.816287,0.718887,0.747597
10,0.0738,0.680816,0.796517,0.81547,0.722454,0.749808


[I 2025-03-27 01:20:26,090] Trial 69 finished with value: 0.7459437290185895 and parameters: {'learning_rate': 0.00022627704070743494, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 70 with params: {'learning_rate': 0.00020217971206811072, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0608,0.734107,0.75802,0.419002,0.416056,0.401513
2,0.3222,0.645009,0.792851,0.611649,0.557525,0.566122
3,0.1772,0.648086,0.791934,0.687606,0.618272,0.632472
4,0.127,0.652751,0.793767,0.753242,0.635492,0.671016
5,0.1058,0.664142,0.790101,0.755158,0.659737,0.687301
6,0.0943,0.657386,0.797434,0.795496,0.668796,0.708931
7,0.0865,0.658272,0.791017,0.812138,0.70781,0.73885
8,0.0824,0.663254,0.799267,0.807208,0.712892,0.739858
9,0.079,0.654954,0.802933,0.807713,0.721326,0.745746
10,0.0759,0.675895,0.796517,0.814127,0.708282,0.740716


[I 2025-03-27 01:24:53,995] Trial 70 finished with value: 0.7560155310479446 and parameters: {'learning_rate': 0.00020217971206811072, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 71 with params: {'learning_rate': 0.00019228065670426477, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0949,0.743743,0.757104,0.447858,0.42981,0.421825
2,0.3401,0.649476,0.791017,0.589543,0.54186,0.546339
3,0.1868,0.649104,0.797434,0.695376,0.622458,0.637824
4,0.1325,0.654482,0.7956,0.731291,0.621651,0.654213
5,0.1093,0.665222,0.791934,0.753293,0.669653,0.69198


[I 2025-03-27 01:26:24,669] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0003257924720701566, 'weight_decay': 0.006, 'warmup_steps': 44, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.901,0.684889,0.771769,0.491416,0.47978,0.474971
2,0.2128,0.651668,0.792851,0.690841,0.622798,0.637314
3,0.1239,0.657049,0.802016,0.763305,0.673014,0.699211
4,0.0973,0.673325,0.794684,0.78519,0.682819,0.709713
5,0.0857,0.672598,0.792851,0.787728,0.690401,0.717362
6,0.0788,0.657726,0.805683,0.813876,0.709288,0.740667
7,0.0753,0.675247,0.8011,0.824159,0.710864,0.745068
8,0.0726,0.66914,0.797434,0.827511,0.705943,0.742266
9,0.0704,0.689093,0.797434,0.819741,0.730121,0.756304
10,0.0687,0.696196,0.793767,0.80284,0.713847,0.742599


[I 2025-03-27 01:30:54,481] Trial 72 finished with value: 0.7437193687737823 and parameters: {'learning_rate': 0.0003257924720701566, 'weight_decay': 0.006, 'warmup_steps': 44, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 73 with params: {'learning_rate': 0.00029835040132382163, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8736,0.691822,0.772686,0.50104,0.475855,0.471194
2,0.2263,0.64482,0.792851,0.645437,0.600673,0.609131
3,0.1307,0.654592,0.792851,0.731772,0.648381,0.673268
4,0.1006,0.664407,0.791017,0.787915,0.665829,0.70194
5,0.0887,0.651003,0.800183,0.82459,0.701015,0.740927
6,0.0812,0.664014,0.799267,0.813838,0.712031,0.7457
7,0.0767,0.670995,0.791017,0.807955,0.717376,0.746068
8,0.0744,0.673551,0.79835,0.814667,0.716805,0.749544
9,0.0719,0.672307,0.797434,0.822201,0.724623,0.75387
10,0.0701,0.672689,0.797434,0.818501,0.722116,0.753087


[I 2025-03-27 01:35:26,526] Trial 73 finished with value: 0.7389774858643093 and parameters: {'learning_rate': 0.00029835040132382163, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 74 with params: {'learning_rate': 0.00034269616032774053, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8547,0.694119,0.772686,0.48808,0.488214,0.4795
2,0.2028,0.668045,0.787351,0.696733,0.626772,0.639431
3,0.1203,0.666724,0.787351,0.72774,0.657505,0.675267
4,0.0954,0.663261,0.79835,0.767453,0.691515,0.711069
5,0.0844,0.657855,0.802933,0.807414,0.710052,0.737578
6,0.078,0.665557,0.799267,0.79341,0.708172,0.734774
7,0.0746,0.678235,0.790101,0.805154,0.703618,0.732914
8,0.0721,0.675363,0.792851,0.815063,0.700077,0.735391
9,0.0704,0.684677,0.787351,0.803735,0.703723,0.732545
10,0.0686,0.695758,0.785518,0.804414,0.706335,0.735297


[I 2025-03-27 01:38:25,125] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.000422780629913313, 'weight_decay': 0.003, 'warmup_steps': 35, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7909,0.6764,0.777269,0.522243,0.503039,0.496618
2,0.1734,0.66805,0.788268,0.725727,0.65605,0.671435
3,0.1091,0.656462,0.8011,0.779077,0.721034,0.734503
4,0.0896,0.669552,0.791017,0.808224,0.705545,0.733845
5,0.0805,0.689774,0.792851,0.779704,0.7207,0.732403
6,0.0758,0.688187,0.796517,0.805137,0.722968,0.746541
7,0.0719,0.700129,0.788268,0.805525,0.720344,0.745523
8,0.0705,0.689968,0.7956,0.804306,0.725321,0.746283
9,0.069,0.706064,0.785518,0.809437,0.730387,0.75329
10,0.0674,0.706657,0.781852,0.807798,0.70544,0.738373


[I 2025-03-27 01:41:24,807] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00015790650376126586, 'weight_decay': 0.002, 'warmup_steps': 33, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2106,0.791616,0.738772,0.411606,0.381076,0.367346
2,0.41,0.667548,0.780935,0.530006,0.499888,0.498404
3,0.229,0.641516,0.790101,0.616928,0.557077,0.568278
4,0.1579,0.658653,0.796517,0.696022,0.607091,0.631895
5,0.1265,0.657932,0.791017,0.689529,0.623248,0.640748
6,0.1089,0.642311,0.797434,0.761603,0.662802,0.694959
7,0.0974,0.651625,0.796517,0.77474,0.668699,0.700426
8,0.0913,0.655276,0.79835,0.782296,0.690752,0.71788
9,0.0864,0.65395,0.796517,0.78568,0.691894,0.719174
10,0.0827,0.66741,0.794684,0.790575,0.698842,0.727975


[I 2025-03-27 01:44:23,402] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0001778617002751221, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1161,0.760226,0.75527,0.434546,0.411782,0.401372
2,0.3632,0.657349,0.781852,0.554498,0.517261,0.521166
3,0.201,0.646259,0.791017,0.624238,0.586181,0.592537
4,0.1408,0.652867,0.79835,0.695588,0.616415,0.63929
5,0.1151,0.661416,0.789184,0.718233,0.63706,0.660185


[I 2025-03-27 01:45:53,262] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00021926733388471974, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0285,0.721152,0.76352,0.432366,0.43529,0.421423
2,0.2991,0.647133,0.792851,0.613567,0.56461,0.571967
3,0.1643,0.654196,0.793767,0.705868,0.628974,0.647482
4,0.1195,0.655786,0.793767,0.746609,0.634811,0.668988
5,0.1007,0.663684,0.797434,0.753995,0.679814,0.70091
6,0.0907,0.659698,0.797434,0.7845,0.677857,0.709627
7,0.0838,0.657052,0.7956,0.808606,0.708264,0.738735
8,0.0803,0.662452,0.796517,0.813879,0.710238,0.739908
9,0.0773,0.661801,0.79835,0.812887,0.706658,0.738591
10,0.0745,0.680349,0.797434,0.820693,0.719899,0.751934


[I 2025-03-27 01:50:22,464] Trial 78 finished with value: 0.7533454381226101 and parameters: {'learning_rate': 0.00021926733388471974, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 79 with params: {'learning_rate': 0.0001636841920990703, 'weight_decay': 0.01, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1726,0.778256,0.740605,0.394483,0.380329,0.363992
2,0.3948,0.665249,0.785518,0.552646,0.519473,0.523547
3,0.2199,0.642167,0.786434,0.61256,0.565023,0.573068
4,0.1523,0.654993,0.799267,0.71518,0.618771,0.648458
5,0.1229,0.660162,0.792851,0.746774,0.659327,0.684956
6,0.1063,0.64451,0.796517,0.771131,0.662711,0.698329
7,0.0955,0.6513,0.799267,0.781803,0.672106,0.704494
8,0.0899,0.657828,0.7956,0.78407,0.703759,0.726587
9,0.0852,0.66036,0.792851,0.773301,0.694621,0.712986
10,0.0815,0.670365,0.791017,0.798081,0.711435,0.73609


[I 2025-03-27 01:53:20,949] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0002787420748443775, 'weight_decay': 0.007, 'warmup_steps': 38, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9552,0.696971,0.768103,0.447873,0.449768,0.435195
2,0.2422,0.64772,0.7956,0.628644,0.600024,0.599291
3,0.1363,0.659104,0.790101,0.690584,0.635961,0.648101
4,0.1038,0.667655,0.794684,0.787028,0.666107,0.702008
5,0.0909,0.664326,0.793767,0.768772,0.678058,0.702922
6,0.0829,0.654745,0.794684,0.784465,0.670967,0.706312
7,0.0779,0.66311,0.796517,0.782504,0.684919,0.715422
8,0.0751,0.672506,0.796517,0.80619,0.700834,0.730968
9,0.0727,0.681132,0.791934,0.782023,0.707585,0.729625
10,0.0705,0.68741,0.786434,0.784714,0.706272,0.728117


[I 2025-03-27 01:56:21,212] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00041886493671855284, 'weight_decay': 0.0, 'warmup_steps': 33, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7919,0.674488,0.771769,0.518334,0.499932,0.492506
2,0.1739,0.674323,0.788268,0.67644,0.637927,0.638798
3,0.1086,0.678635,0.793767,0.771622,0.70327,0.720245
4,0.0893,0.666289,0.793767,0.821906,0.712628,0.745921
5,0.0804,0.690158,0.791017,0.812283,0.71036,0.741075
6,0.0754,0.680707,0.797434,0.821236,0.722714,0.753289
7,0.0723,0.696706,0.792851,0.815548,0.716976,0.745642
8,0.07,0.705038,0.790101,0.810141,0.705421,0.7351
9,0.0689,0.686215,0.792851,0.811365,0.724414,0.749556
10,0.0672,0.706534,0.787351,0.798737,0.710859,0.73562


[I 2025-03-27 01:59:21,881] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.00019287931040767018, 'weight_decay': 0.0, 'warmup_steps': 38, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1202,0.748811,0.758937,0.445675,0.432537,0.423952
2,0.3426,0.649975,0.791017,0.56926,0.53325,0.535129
3,0.188,0.647017,0.791017,0.646206,0.591871,0.601255
4,0.1335,0.657608,0.796517,0.753494,0.643386,0.676556
5,0.1094,0.668825,0.790101,0.755086,0.67158,0.694544


[I 2025-03-27 02:00:51,246] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00024889333178909064, 'weight_decay': 0.001, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9802,0.711215,0.767186,0.446176,0.449485,0.435783
2,0.2673,0.646183,0.8011,0.620443,0.583358,0.588309
3,0.149,0.655976,0.7956,0.711417,0.647133,0.663941
4,0.1105,0.663139,0.794684,0.785371,0.653522,0.695915
5,0.0946,0.670393,0.789184,0.781125,0.705655,0.728375
6,0.0858,0.657547,0.79835,0.806929,0.693255,0.727126
7,0.0801,0.673316,0.788268,0.80754,0.691682,0.727702
8,0.0771,0.673124,0.797434,0.812105,0.704304,0.737585
9,0.0746,0.668472,0.796517,0.814756,0.721361,0.749683
10,0.0718,0.687519,0.784601,0.814844,0.708665,0.741045


[I 2025-03-27 02:05:19,175] Trial 83 finished with value: 0.7394552401280714 and parameters: {'learning_rate': 0.00024889333178909064, 'weight_decay': 0.001, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 84 with params: {'learning_rate': 0.0003112612834598753, 'weight_decay': 0.0, 'warmup_steps': 34, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9049,0.697209,0.771769,0.494296,0.476805,0.470046
2,0.22,0.655687,0.791017,0.654523,0.601756,0.608251
3,0.1272,0.647965,0.797434,0.725533,0.671107,0.684647
4,0.0985,0.663246,0.79835,0.809474,0.696258,0.730353
5,0.0869,0.673594,0.793767,0.799864,0.723139,0.742059
6,0.0799,0.676786,0.800183,0.828575,0.731353,0.761943
7,0.0756,0.687532,0.792851,0.801703,0.709724,0.735769
8,0.0732,0.681125,0.79835,0.820692,0.713534,0.74677
9,0.0711,0.686808,0.792851,0.810892,0.709751,0.739568
10,0.0692,0.689917,0.791017,0.811437,0.709063,0.739606


[I 2025-03-27 02:08:18,891] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 6.256012751877123e-05, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6784,1.190022,0.659028,0.279415,0.266005,0.253452
2,0.8599,0.845661,0.725023,0.340615,0.345767,0.326015
3,0.5659,0.747696,0.753437,0.424525,0.413805,0.400526
4,0.425,0.703224,0.770852,0.501424,0.47435,0.471512
5,0.3336,0.669595,0.780018,0.514823,0.494659,0.492377
6,0.2734,0.655818,0.789184,0.573989,0.517971,0.523305
7,0.2307,0.64498,0.791934,0.602255,0.537814,0.545332
8,0.203,0.640692,0.791934,0.644775,0.566347,0.582671
9,0.1833,0.640349,0.79835,0.663226,0.598846,0.61255
10,0.1683,0.64116,0.79835,0.663979,0.607193,0.619718


[I 2025-03-27 02:11:18,607] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0004333510509755652, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7594,0.692221,0.772686,0.508493,0.502015,0.491187
2,0.1708,0.685575,0.783685,0.694518,0.640003,0.652792
3,0.1078,0.689572,0.787351,0.743123,0.674592,0.692616
4,0.09,0.680827,0.789184,0.79773,0.69743,0.726717
5,0.0803,0.678227,0.791017,0.81997,0.703909,0.737846
6,0.0758,0.666032,0.791934,0.805543,0.712444,0.740816
7,0.0724,0.692522,0.787351,0.788572,0.703787,0.730314
8,0.0703,0.697609,0.784601,0.803006,0.703337,0.735202
9,0.0686,0.694945,0.787351,0.805873,0.705323,0.736499
10,0.0672,0.700298,0.792851,0.806682,0.716267,0.743643


[I 2025-03-27 02:14:14,465] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.000454956825370539, 'weight_decay': 0.003, 'warmup_steps': 53, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7932,0.671402,0.779102,0.545103,0.504756,0.502473
2,0.1683,0.654031,0.799267,0.744271,0.651835,0.68091
3,0.1063,0.69687,0.782768,0.779617,0.693135,0.720274
4,0.088,0.681531,0.792851,0.81221,0.705673,0.734869
5,0.0796,0.682886,0.794684,0.817202,0.730495,0.754798
6,0.0751,0.689731,0.780935,0.80285,0.690321,0.723032
7,0.0714,0.70657,0.780018,0.789922,0.689544,0.719639
8,0.0698,0.713161,0.776352,0.810413,0.685531,0.722091
9,0.0685,0.718895,0.782768,0.802289,0.702479,0.729105
10,0.0668,0.720306,0.783685,0.80586,0.699215,0.730463


[I 2025-03-27 02:17:15,528] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.00046987899906866565, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7124,0.663725,0.789184,0.599671,0.534764,0.547629
2,0.1625,0.687865,0.783685,0.691263,0.649439,0.657628
3,0.1049,0.727399,0.781852,0.76662,0.708145,0.720379
4,0.0877,0.671171,0.791017,0.803823,0.718494,0.741978
5,0.0793,0.678191,0.793767,0.798991,0.725038,0.746043
6,0.0749,0.68691,0.791934,0.812618,0.726562,0.75247
7,0.072,0.712378,0.787351,0.809842,0.711443,0.739138
8,0.0694,0.697191,0.793767,0.797311,0.72402,0.743245
9,0.0678,0.702724,0.791017,0.808865,0.73746,0.759443
10,0.0664,0.709416,0.785518,0.797981,0.728046,0.745973


[I 2025-03-27 02:21:42,775] Trial 88 finished with value: 0.7614159257587808 and parameters: {'learning_rate': 0.00046987899906866565, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 3.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 89 with params: {'learning_rate': 0.00046088735144171096, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7222,0.655089,0.781852,0.57713,0.536899,0.535648
2,0.1638,0.660456,0.791934,0.737965,0.657017,0.677887
3,0.1047,0.678954,0.785518,0.734382,0.687181,0.697028
4,0.0875,0.681826,0.800183,0.806207,0.72113,0.746754
5,0.0788,0.710602,0.790101,0.810555,0.728072,0.751161
6,0.0748,0.672098,0.797434,0.813513,0.728739,0.75208
7,0.0719,0.714714,0.781852,0.788023,0.71794,0.737446
8,0.0695,0.675884,0.791934,0.779898,0.707852,0.725935
9,0.0684,0.678681,0.7956,0.797321,0.731651,0.748581
10,0.0671,0.700027,0.789184,0.799148,0.717971,0.741373


[I 2025-03-27 02:24:38,838] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.00041772560018200775, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7555,0.664481,0.777269,0.51732,0.504899,0.497248
2,0.1736,0.668419,0.790101,0.676919,0.631699,0.641738
3,0.1091,0.669811,0.791934,0.732191,0.682736,0.694869
4,0.0895,0.6646,0.800183,0.806011,0.715755,0.743624
5,0.0805,0.683066,0.793767,0.803749,0.726589,0.749837
6,0.076,0.678554,0.800183,0.815655,0.726911,0.753628
7,0.0722,0.703945,0.786434,0.788573,0.71538,0.737092
8,0.0704,0.677641,0.7956,0.805962,0.73167,0.750151
9,0.069,0.695895,0.790101,0.80858,0.727331,0.748317
10,0.0674,0.698286,0.791017,0.807631,0.727785,0.751889


[I 2025-03-27 02:29:06,164] Trial 90 finished with value: 0.7515088403108894 and parameters: {'learning_rate': 0.00041772560018200775, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 51 with value: 0.7621312725763352.


Trial 91 with params: {'learning_rate': 0.00019999234429283712, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0497,0.736608,0.759853,0.43296,0.432277,0.420376
2,0.3229,0.648758,0.790101,0.611686,0.557331,0.565436
3,0.1784,0.647008,0.797434,0.668094,0.611929,0.623896
4,0.1273,0.650836,0.7956,0.755216,0.632401,0.670299
5,0.1058,0.651806,0.796517,0.735396,0.663194,0.683107
6,0.0941,0.644862,0.799267,0.748077,0.664531,0.691824
7,0.0865,0.652296,0.796517,0.810627,0.707454,0.739688
8,0.0827,0.65072,0.8011,0.812079,0.725579,0.750333
9,0.0792,0.65865,0.799267,0.802001,0.720137,0.745598
10,0.0761,0.668978,0.797434,0.814499,0.717861,0.746896


[I 2025-03-27 02:33:54,468] Trial 91 finished with value: 0.7531369567121389 and parameters: {'learning_rate': 0.00019999234429283712, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 51 with value: 0.7621312725763352.


Trial 92 with params: {'learning_rate': 0.0004922578519032032, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6995,0.658343,0.790101,0.583285,0.53598,0.541213
2,0.1559,0.66762,0.793767,0.735033,0.661338,0.677852
3,0.1021,0.665423,0.794684,0.77259,0.715973,0.728441
4,0.0859,0.687791,0.789184,0.795279,0.731258,0.747571
5,0.0781,0.72014,0.781852,0.805389,0.730091,0.746613
6,0.0739,0.690937,0.791017,0.807294,0.737379,0.756175
7,0.071,0.693691,0.788268,0.810638,0.731036,0.753822
8,0.069,0.689324,0.792851,0.799652,0.736334,0.752324
9,0.0677,0.720453,0.787351,0.799598,0.732047,0.74634
10,0.0666,0.717346,0.787351,0.814258,0.737241,0.75972


[I 2025-03-27 02:38:23,729] Trial 92 finished with value: 0.7644517643387146 and parameters: {'learning_rate': 0.0004922578519032032, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 93 with params: {'learning_rate': 0.0004920237513932797, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7005,0.662033,0.785518,0.599624,0.548762,0.552814
2,0.1562,0.667579,0.790101,0.720201,0.658583,0.672911
3,0.1019,0.66753,0.786434,0.78235,0.726772,0.738842
4,0.0861,0.705966,0.788268,0.815567,0.722792,0.750012
5,0.0785,0.689834,0.785518,0.815164,0.708252,0.741718
6,0.0746,0.694127,0.7956,0.842628,0.733455,0.767516
7,0.071,0.712043,0.780018,0.803384,0.714663,0.740876
8,0.0698,0.712244,0.784601,0.818252,0.716712,0.746762
9,0.0678,0.709023,0.780018,0.774604,0.730246,0.737404
10,0.0665,0.710949,0.780935,0.812277,0.728898,0.75323


[I 2025-03-27 02:42:53,107] Trial 93 finished with value: 0.7607969747995779 and parameters: {'learning_rate': 0.0004920237513932797, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 94 with params: {'learning_rate': 0.00037109539198303706, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.792,0.667562,0.787351,0.558389,0.518438,0.520431
2,0.1902,0.646888,0.794684,0.722155,0.639136,0.657682
3,0.1159,0.651468,0.7956,0.707214,0.654353,0.6656
4,0.0926,0.656821,0.799267,0.785617,0.690785,0.717733
5,0.0824,0.665996,0.791934,0.809557,0.698635,0.730422
6,0.0773,0.700938,0.786434,0.803062,0.702839,0.734281
7,0.0739,0.695373,0.789184,0.787109,0.71221,0.731342
8,0.0716,0.67908,0.787351,0.811927,0.721549,0.746949
9,0.0697,0.688091,0.791017,0.803416,0.720373,0.742684
10,0.0685,0.685219,0.791017,0.819265,0.711946,0.744709


[I 2025-03-27 02:45:51,533] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.000279973895831848, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9146,0.70524,0.770852,0.474713,0.460313,0.447762
2,0.2392,0.643209,0.794684,0.647926,0.588141,0.60305
3,0.1367,0.652464,0.791017,0.726015,0.645318,0.667316
4,0.1044,0.671544,0.797434,0.77507,0.650957,0.687722
5,0.0908,0.65444,0.800183,0.814278,0.709206,0.74161
6,0.0829,0.654603,0.802016,0.823127,0.72139,0.755152
7,0.0779,0.670092,0.797434,0.813542,0.718357,0.746436
8,0.0753,0.661426,0.800183,0.818445,0.714559,0.745061
9,0.0729,0.672944,0.79835,0.817703,0.724277,0.750339
10,0.071,0.674359,0.802016,0.825338,0.720849,0.751335


[I 2025-03-27 02:50:20,880] Trial 95 finished with value: 0.750708681805861 and parameters: {'learning_rate': 0.000279973895831848, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 96 with params: {'learning_rate': 1.0675005523304308e-05, 'weight_decay': 0.0, 'warmup_steps': 43, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2636,2.083632,0.346471,0.070881,0.072089,0.050887
2,1.9443,1.822462,0.457379,0.107089,0.121229,0.103648
3,1.7062,1.626155,0.501375,0.13524,0.147241,0.123733
4,1.5232,1.478107,0.543538,0.224975,0.172036,0.154484
5,1.3774,1.363869,0.587534,0.255111,0.210689,0.195545


[I 2025-03-27 02:51:55,559] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0004532253306997267, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7392,0.663945,0.774519,0.51516,0.511017,0.504953
2,0.1652,0.651811,0.79835,0.687312,0.644018,0.653908
3,0.1057,0.665063,0.791934,0.763309,0.697278,0.713898
4,0.0879,0.661039,0.793767,0.815281,0.720041,0.749705
5,0.0792,0.675285,0.800183,0.816164,0.733915,0.758856
6,0.0752,0.676748,0.793767,0.792607,0.715563,0.737421
7,0.072,0.697914,0.790101,0.811263,0.723288,0.750664
8,0.0696,0.683697,0.792851,0.792964,0.726549,0.743768
9,0.0678,0.670961,0.8011,0.798213,0.730764,0.749686
10,0.0669,0.685611,0.788268,0.794821,0.712302,0.737038


[I 2025-03-27 02:54:53,516] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00035248979435431767, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8093,0.680405,0.787351,0.520226,0.510901,0.501498
2,0.1976,0.64628,0.7956,0.701316,0.629518,0.647402
3,0.1199,0.658645,0.7956,0.754513,0.676118,0.700567
4,0.0946,0.651384,0.802016,0.80174,0.7031,0.73074
5,0.0839,0.665245,0.790101,0.819994,0.714983,0.745652
6,0.0783,0.674528,0.797434,0.816913,0.716203,0.748292
7,0.0745,0.681591,0.794684,0.821829,0.726697,0.7551
8,0.072,0.675378,0.794684,0.815177,0.725815,0.753486
9,0.0701,0.672871,0.797434,0.815052,0.723309,0.749779
10,0.0685,0.681863,0.792851,0.806532,0.718075,0.74288


[I 2025-03-27 02:59:24,582] Trial 98 finished with value: 0.7383546195551596 and parameters: {'learning_rate': 0.00035248979435431767, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 99 with params: {'learning_rate': 1.6023858648203628e-05, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1805,1.940521,0.407883,0.096859,0.092319,0.075278
2,1.7493,1.603128,0.508708,0.181015,0.152079,0.130582
3,1.4513,1.379294,0.582951,0.245538,0.209732,0.195184
4,1.2406,1.219545,0.647113,0.279377,0.256934,0.244727
5,1.082,1.102283,0.679193,0.258792,0.281212,0.258976


[I 2025-03-27 03:00:54,947] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00013391379320794176, 'weight_decay': 0.005, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2762,0.83331,0.730522,0.35717,0.357751,0.338465
2,0.4704,0.684407,0.772686,0.499149,0.480887,0.477827
3,0.2713,0.644058,0.788268,0.575245,0.538586,0.540886
4,0.185,0.647863,0.791934,0.67129,0.587206,0.608287
5,0.1444,0.653186,0.79835,0.715542,0.633561,0.655476
6,0.1226,0.6422,0.799267,0.775596,0.661153,0.695629
7,0.1087,0.640945,0.7956,0.770956,0.668869,0.699255
8,0.1002,0.645878,0.800183,0.769258,0.675705,0.704759
9,0.0942,0.645437,0.799267,0.777974,0.681189,0.709279
10,0.0893,0.659903,0.791934,0.773353,0.681944,0.709129


[I 2025-03-27 03:03:54,406] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00028131770461181036, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9077,0.701872,0.777269,0.476645,0.468678,0.458016
2,0.2387,0.647965,0.791017,0.636002,0.587675,0.597456
3,0.1366,0.655061,0.793767,0.714658,0.645863,0.664491
4,0.1039,0.669887,0.799267,0.778535,0.675789,0.707233
5,0.0905,0.654941,0.794684,0.794282,0.693813,0.722851
6,0.0826,0.651467,0.800183,0.821579,0.721401,0.753611
7,0.078,0.6701,0.79835,0.812768,0.730779,0.756538
8,0.075,0.663644,0.79835,0.81007,0.726001,0.749858
9,0.0728,0.67316,0.7956,0.818078,0.721577,0.749117
10,0.0708,0.679983,0.7956,0.823116,0.723893,0.752881


[I 2025-03-27 03:08:21,323] Trial 101 finished with value: 0.7489120683422849 and parameters: {'learning_rate': 0.00028131770461181036, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 102 with params: {'learning_rate': 0.00047300971399577313, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.697,0.675313,0.780935,0.586446,0.533091,0.541796
2,0.1589,0.6706,0.787351,0.695413,0.646768,0.655051
3,0.1029,0.690371,0.794684,0.763506,0.728174,0.733515
4,0.0861,0.700948,0.791017,0.810593,0.715549,0.745403
5,0.078,0.67447,0.800183,0.805311,0.730469,0.752833
6,0.0746,0.706832,0.791934,0.819981,0.723897,0.754216
7,0.072,0.711244,0.787351,0.81788,0.714338,0.743893
8,0.0695,0.699236,0.7956,0.803415,0.73425,0.752108
9,0.068,0.734618,0.785518,0.816174,0.710749,0.743087
10,0.0663,0.706773,0.791017,0.802239,0.729899,0.752626


[I 2025-03-27 03:12:49,952] Trial 102 finished with value: 0.7626002475427971 and parameters: {'learning_rate': 0.00047300971399577313, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 3.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 103 with params: {'learning_rate': 0.0004072826259549412, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7592,0.675556,0.786434,0.541831,0.514618,0.5135
2,0.1777,0.660413,0.796517,0.720068,0.659074,0.671099
3,0.1103,0.685885,0.791017,0.788918,0.697071,0.724164
4,0.0898,0.666319,0.796517,0.811217,0.707836,0.739227
5,0.0802,0.668106,0.796517,0.804437,0.712438,0.741053
6,0.0764,0.698634,0.785518,0.794114,0.711209,0.734905
7,0.0727,0.688051,0.793767,0.805124,0.717026,0.740863
8,0.0705,0.680274,0.800183,0.827457,0.724732,0.755426
9,0.069,0.713561,0.782768,0.810163,0.706916,0.737344
10,0.0673,0.682555,0.793767,0.806858,0.725604,0.748414


[I 2025-03-27 03:17:25,319] Trial 103 finished with value: 0.7500379053298729 and parameters: {'learning_rate': 0.0004072826259549412, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 104 with params: {'learning_rate': 0.00047794578608353563, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7127,0.659104,0.785518,0.544473,0.517515,0.517338
2,0.1603,0.661395,0.799267,0.738763,0.659988,0.681914
3,0.1042,0.664964,0.791934,0.767772,0.702551,0.716345
4,0.0868,0.672805,0.796517,0.811103,0.717716,0.745976
5,0.0795,0.685794,0.796517,0.824621,0.722759,0.756674
6,0.0742,0.70525,0.786434,0.813224,0.712014,0.74602
7,0.0719,0.720934,0.783685,0.813498,0.707721,0.738443
8,0.0693,0.686725,0.786434,0.800015,0.712388,0.735142
9,0.068,0.709202,0.785518,0.804077,0.717341,0.745278
10,0.0666,0.709706,0.786434,0.814094,0.722854,0.750388


[I 2025-03-27 03:22:06,836] Trial 104 finished with value: 0.7491473951940438 and parameters: {'learning_rate': 0.00047794578608353563, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.9, 'temperature': 3.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 105 with params: {'learning_rate': 2.1347915500916424e-05, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1019,1.809544,0.465628,0.108182,0.125732,0.106419
2,1.5812,1.43068,0.567369,0.220411,0.193408,0.179069
3,1.2511,1.195626,0.661778,0.270725,0.27012,0.252915
4,1.0316,1.042671,0.686526,0.26593,0.28729,0.264418
5,0.8777,0.94425,0.704858,0.294217,0.308625,0.28402


[I 2025-03-27 03:23:36,064] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 2.777716320805797e-05, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0134,1.669433,0.496792,0.103141,0.140472,0.113931
2,1.4094,1.264322,0.622365,0.277905,0.241054,0.229056
3,1.0623,1.037101,0.695692,0.309059,0.295618,0.273493
4,0.8486,0.911504,0.710357,0.307167,0.318445,0.291437
5,0.7093,0.838673,0.726856,0.324861,0.34033,0.316068
6,0.6182,0.793373,0.732356,0.397489,0.371021,0.357881
7,0.5491,0.762523,0.746104,0.410745,0.397336,0.385501
8,0.5025,0.739663,0.753437,0.443439,0.414118,0.406799
9,0.466,0.728072,0.75527,0.462219,0.429378,0.422281
10,0.436,0.717743,0.759853,0.473863,0.445457,0.440432


[I 2025-03-27 03:26:39,590] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0002529175340539921, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9404,0.705218,0.765353,0.458219,0.448645,0.434894
2,0.2623,0.646172,0.7956,0.641919,0.592038,0.602819
3,0.1477,0.650747,0.797434,0.718903,0.646869,0.667192
4,0.1098,0.651221,0.8011,0.762867,0.636538,0.675641
5,0.0948,0.655592,0.79835,0.815302,0.712452,0.746545
6,0.0858,0.649882,0.805683,0.813681,0.725821,0.754529
7,0.08,0.66324,0.7956,0.804472,0.711116,0.738078
8,0.0767,0.653073,0.796517,0.812292,0.712362,0.742754
9,0.0742,0.670246,0.792851,0.808214,0.725683,0.747779
10,0.0717,0.670397,0.79835,0.829216,0.722653,0.75671


[I 2025-03-27 03:31:11,247] Trial 107 finished with value: 0.7515722024777713 and parameters: {'learning_rate': 0.0002529175340539921, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 3.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 108 with params: {'learning_rate': 0.00045425283942045593, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7203,0.673605,0.782768,0.546894,0.514024,0.516486
2,0.1651,0.658323,0.799267,0.721064,0.659058,0.67301
3,0.1061,0.697693,0.786434,0.757993,0.682172,0.70186
4,0.0874,0.707223,0.780935,0.810564,0.709946,0.737792
5,0.0798,0.698317,0.7956,0.814059,0.732929,0.75752
6,0.0749,0.687327,0.797434,0.795844,0.732958,0.749767
7,0.0717,0.704606,0.791017,0.803937,0.727373,0.747887
8,0.0699,0.704511,0.792851,0.804075,0.715755,0.740548
9,0.0687,0.708355,0.787351,0.794752,0.723165,0.742784
10,0.0668,0.703828,0.791017,0.780823,0.723977,0.738659


[I 2025-03-27 03:34:12,384] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.00019347904213013904, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0601,0.739061,0.758937,0.431255,0.415326,0.405136
2,0.3362,0.635913,0.792851,0.580635,0.536655,0.542391
3,0.1851,0.63936,0.797434,0.684583,0.612151,0.630187
4,0.1315,0.654138,0.794684,0.739771,0.628138,0.662272
5,0.109,0.656346,0.787351,0.727975,0.64884,0.671194
6,0.0964,0.64339,0.799267,0.795299,0.685595,0.721891
7,0.088,0.642407,0.802016,0.823913,0.715191,0.74873
8,0.0838,0.648555,0.802016,0.807297,0.723678,0.744351
9,0.0798,0.648788,0.802016,0.808215,0.724651,0.750496
10,0.0769,0.669258,0.792851,0.814066,0.703624,0.737772


[I 2025-03-27 03:37:10,848] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0004591624044512961, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7455,0.681768,0.772686,0.551307,0.518322,0.518773
2,0.1642,0.679103,0.785518,0.711194,0.645417,0.660991
3,0.1056,0.675434,0.786434,0.762388,0.693663,0.7103
4,0.0881,0.697918,0.788268,0.809267,0.709374,0.737568
5,0.0795,0.697899,0.792851,0.819864,0.710205,0.743386
6,0.075,0.697197,0.787351,0.822799,0.712722,0.747094
7,0.0717,0.702821,0.785518,0.801608,0.711261,0.735397
8,0.0702,0.720687,0.782768,0.798488,0.709392,0.732759
9,0.0683,0.718364,0.784601,0.80083,0.719598,0.740793
10,0.067,0.715802,0.786434,0.808785,0.71347,0.741726


[I 2025-03-27 03:41:40,692] Trial 110 finished with value: 0.743195261913703 and parameters: {'learning_rate': 0.0004591624044512961, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 111 with params: {'learning_rate': 0.0004114403307727495, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7893,0.679818,0.783685,0.552493,0.510032,0.50698
2,0.1754,0.671367,0.789184,0.690217,0.650106,0.653847
3,0.1099,0.67275,0.796517,0.781341,0.707284,0.7232
4,0.09,0.669341,0.794684,0.804766,0.716328,0.738928
5,0.0814,0.691249,0.791934,0.771188,0.706317,0.722302
6,0.0763,0.692472,0.7956,0.810056,0.716052,0.744429
7,0.0729,0.705797,0.790101,0.812252,0.714444,0.743669
8,0.0709,0.67899,0.794684,0.802218,0.721164,0.744428
9,0.0691,0.693706,0.793767,0.805326,0.712075,0.741226
10,0.0674,0.683713,0.791017,0.800987,0.710811,0.73655


[I 2025-03-27 03:44:38,851] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00020395568647362994, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.04,0.732235,0.759853,0.432332,0.432241,0.419967
2,0.3171,0.647764,0.788268,0.609537,0.558371,0.566077
3,0.1753,0.646743,0.796517,0.669664,0.611225,0.624066
4,0.1257,0.650222,0.79835,0.755162,0.634608,0.672013
5,0.1046,0.651739,0.799267,0.744586,0.673796,0.692071
6,0.0932,0.643377,0.796517,0.744919,0.664405,0.691002
7,0.0859,0.650249,0.79835,0.82416,0.72427,0.757456
8,0.0821,0.652998,0.79835,0.810569,0.714635,0.74243
9,0.0786,0.661706,0.800183,0.801408,0.72659,0.751106
10,0.0757,0.672429,0.797434,0.808925,0.724424,0.750941


[I 2025-03-27 03:49:05,946] Trial 112 finished with value: 0.7522814226576898 and parameters: {'learning_rate': 0.00020395568647362994, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 113 with params: {'learning_rate': 0.0004258741317669566, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7434,0.671482,0.789184,0.566524,0.531261,0.5332
2,0.1724,0.652242,0.791017,0.704791,0.63113,0.65107
3,0.1079,0.673231,0.789184,0.737746,0.693584,0.699146
4,0.0889,0.669778,0.791934,0.787111,0.693326,0.719975
5,0.0801,0.662871,0.79835,0.792242,0.696505,0.723549
6,0.0758,0.676868,0.793767,0.79967,0.715924,0.741858
7,0.0723,0.6764,0.794684,0.813906,0.718882,0.748453
8,0.0704,0.704492,0.793767,0.808017,0.732368,0.75452
9,0.0688,0.705045,0.787351,0.791011,0.722985,0.742326
10,0.067,0.704106,0.790101,0.795334,0.717326,0.741399


[I 2025-03-27 03:52:06,891] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00043145474647907273, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7366,0.68447,0.777269,0.558541,0.521175,0.522087
2,0.1704,0.654689,0.789184,0.690564,0.641995,0.651389
3,0.1079,0.674628,0.784601,0.742671,0.68581,0.697748
4,0.0886,0.672651,0.791017,0.783465,0.6945,0.719701
5,0.0807,0.6636,0.802016,0.806117,0.73897,0.758848
6,0.0757,0.670853,0.802933,0.824293,0.733697,0.763393
7,0.0723,0.675679,0.7956,0.823495,0.729407,0.758385
8,0.0705,0.685382,0.793767,0.808962,0.730629,0.753051
9,0.0688,0.689623,0.787351,0.800046,0.72142,0.745039
10,0.0677,0.697918,0.792851,0.813345,0.73008,0.757422


[I 2025-03-27 03:56:36,049] Trial 114 finished with value: 0.7587922225683519 and parameters: {'learning_rate': 0.00043145474647907273, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 115 with params: {'learning_rate': 0.00041717431487517826, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.748,0.680802,0.780935,0.562245,0.525002,0.526096
2,0.1743,0.648615,0.796517,0.748541,0.651939,0.678248
3,0.1094,0.662473,0.796517,0.751522,0.68104,0.699764
4,0.0897,0.656836,0.80385,0.806038,0.717202,0.74062
5,0.0807,0.680267,0.790101,0.815367,0.710831,0.742188
6,0.0763,0.662255,0.800183,0.812863,0.733369,0.755104
7,0.0729,0.68179,0.79835,0.807061,0.719866,0.744921
8,0.0708,0.669947,0.802016,0.809407,0.734639,0.755127
9,0.0695,0.688775,0.792851,0.790155,0.720105,0.73885
10,0.0677,0.693649,0.793767,0.803197,0.728843,0.749596


[I 2025-03-27 04:01:07,309] Trial 115 finished with value: 0.7553667764672197 and parameters: {'learning_rate': 0.00041717431487517826, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 116 with params: {'learning_rate': 0.0004111487149533104, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.759,0.6672,0.778185,0.518899,0.504773,0.497767
2,0.1765,0.661695,0.791934,0.682571,0.643498,0.647351
3,0.1097,0.673141,0.791017,0.740021,0.682974,0.697775
4,0.0892,0.682676,0.792851,0.813492,0.715567,0.743613
5,0.0813,0.681412,0.794684,0.816105,0.721898,0.748842
6,0.0761,0.698729,0.789184,0.812773,0.70275,0.735282
7,0.0731,0.70084,0.787351,0.791803,0.713206,0.735138
8,0.0701,0.694712,0.791934,0.801384,0.722495,0.745219
9,0.0691,0.702429,0.786434,0.79403,0.715079,0.73637
10,0.0677,0.707621,0.790101,0.818433,0.71384,0.744589


[I 2025-03-27 04:05:32,947] Trial 116 finished with value: 0.7461073696526364 and parameters: {'learning_rate': 0.0004111487149533104, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 117 with params: {'learning_rate': 0.0003207579028760763, 'weight_decay': 0.003, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8817,0.691903,0.771769,0.491335,0.485925,0.480069
2,0.2142,0.639643,0.796517,0.704463,0.615511,0.638649
3,0.1253,0.662832,0.786434,0.706077,0.642662,0.65874
4,0.0979,0.673018,0.7956,0.79371,0.681142,0.715735
5,0.0863,0.660574,0.802933,0.804418,0.706977,0.732762
6,0.0797,0.673011,0.7956,0.807328,0.708283,0.737485
7,0.0751,0.677221,0.7956,0.801466,0.70228,0.731081
8,0.073,0.668368,0.804766,0.834718,0.720993,0.755796
9,0.071,0.681766,0.7956,0.816692,0.718527,0.74736
10,0.069,0.691467,0.789184,0.809846,0.718698,0.744185


[I 2025-03-27 04:08:29,272] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00037705707877799745, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7835,0.677224,0.785518,0.548018,0.510495,0.508539
2,0.1871,0.661151,0.791934,0.684582,0.63195,0.640196
3,0.1147,0.663983,0.797434,0.747773,0.676815,0.69423
4,0.0921,0.653974,0.800183,0.836186,0.724105,0.756118
5,0.0827,0.676976,0.792851,0.828318,0.715378,0.75093
6,0.0775,0.676428,0.793767,0.801341,0.720745,0.745578
7,0.0737,0.668514,0.7956,0.832725,0.721797,0.758978
8,0.0714,0.681145,0.794684,0.819886,0.724645,0.754826
9,0.0697,0.691289,0.794684,0.808354,0.715111,0.74315
10,0.0679,0.695829,0.791934,0.821753,0.716129,0.749995


[I 2025-03-27 04:13:01,078] Trial 118 finished with value: 0.7416565460370578 and parameters: {'learning_rate': 0.00037705707877799745, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 119 with params: {'learning_rate': 0.0004296747924955775, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7384,0.66839,0.785518,0.542391,0.516567,0.512668
2,0.1709,0.655871,0.791934,0.701491,0.648258,0.66105
3,0.1084,0.671512,0.786434,0.76962,0.700111,0.71862
4,0.089,0.661373,0.796517,0.821849,0.730123,0.755167
5,0.08,0.673154,0.792851,0.81375,0.723245,0.75033
6,0.0761,0.695991,0.786434,0.793539,0.725908,0.743549
7,0.0717,0.711415,0.783685,0.797408,0.711592,0.736713
8,0.0703,0.711967,0.783685,0.793272,0.706423,0.728378
9,0.0689,0.721292,0.780935,0.78816,0.722658,0.739051
10,0.0671,0.716624,0.782768,0.801085,0.724542,0.747595


[I 2025-03-27 04:17:32,959] Trial 119 finished with value: 0.7605065094388407 and parameters: {'learning_rate': 0.0004296747924955775, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 120 with params: {'learning_rate': 0.0003791496115871342, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7902,0.681579,0.776352,0.498762,0.497699,0.491648
2,0.1871,0.660016,0.791017,0.685598,0.623844,0.638952
3,0.1147,0.66064,0.794684,0.764969,0.676148,0.703912
4,0.0926,0.669551,0.794684,0.801695,0.699449,0.729908
5,0.0823,0.672827,0.794684,0.820848,0.715665,0.746883
6,0.0773,0.665827,0.794684,0.776525,0.709398,0.728636
7,0.0735,0.680638,0.793767,0.808899,0.725819,0.751705
8,0.0712,0.669996,0.797434,0.808751,0.722641,0.748659
9,0.0699,0.679226,0.792851,0.799509,0.717158,0.74413
10,0.068,0.675415,0.7956,0.825604,0.717615,0.753392


[I 2025-03-27 04:22:05,331] Trial 120 finished with value: 0.7532031380207524 and parameters: {'learning_rate': 0.0003791496115871342, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 121 with params: {'learning_rate': 1.5745418122329243e-05, 'weight_decay': 0.003, 'warmup_steps': 34, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1868,1.949074,0.404216,0.097918,0.090901,0.073736
2,1.7598,1.614061,0.505958,0.140948,0.149763,0.126652
3,1.4637,1.390694,0.582035,0.231379,0.207771,0.192557
4,1.2535,1.23101,0.641613,0.279433,0.253513,0.242128
5,1.0949,1.113172,0.678277,0.259135,0.279413,0.257813
6,0.9787,1.028422,0.688359,0.266274,0.291166,0.265212
7,0.8877,0.966632,0.705775,0.293907,0.307582,0.283584
8,0.8221,0.923631,0.711274,0.323652,0.317633,0.296265
9,0.7703,0.890247,0.716774,0.318573,0.324863,0.301427
10,0.7283,0.867275,0.721357,0.320998,0.331751,0.307245


[I 2025-03-27 04:25:03,676] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0004393550916036238, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.727,0.67796,0.781852,0.560425,0.520101,0.520686
2,0.169,0.687645,0.783685,0.706087,0.631077,0.649685
3,0.1081,0.677044,0.791017,0.768365,0.69923,0.718437
4,0.0888,0.676439,0.79835,0.796255,0.711658,0.737741
5,0.0803,0.680225,0.793767,0.813172,0.730543,0.75456
6,0.076,0.697251,0.790101,0.824657,0.71961,0.751881
7,0.0722,0.701776,0.7956,0.819303,0.72684,0.755051
8,0.0702,0.684328,0.7956,0.802175,0.72932,0.749422
9,0.069,0.718751,0.782768,0.804465,0.716987,0.744278
10,0.0675,0.703505,0.788268,0.814109,0.725186,0.752234


[I 2025-03-27 04:29:32,207] Trial 122 finished with value: 0.755600047104299 and parameters: {'learning_rate': 0.0004393550916036238, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 123 with params: {'learning_rate': 0.0004503110805633973, 'weight_decay': 0.006, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7273,0.663102,0.783685,0.562538,0.520874,0.526769
2,0.1656,0.674814,0.791017,0.719635,0.654579,0.670846
3,0.1054,0.679028,0.791934,0.753325,0.699723,0.707248
4,0.0876,0.683767,0.789184,0.803525,0.714763,0.737549
5,0.0798,0.678465,0.794684,0.808615,0.715293,0.741866
6,0.0753,0.694025,0.784601,0.799116,0.701973,0.729333
7,0.0715,0.692257,0.785518,0.801563,0.708042,0.730355
8,0.0698,0.693761,0.786434,0.803686,0.716073,0.738716
9,0.0682,0.693843,0.791017,0.79063,0.717278,0.730862
10,0.067,0.683059,0.793767,0.804521,0.735665,0.751482


[I 2025-03-27 04:34:00,861] Trial 123 finished with value: 0.7491344130184907 and parameters: {'learning_rate': 0.0004503110805633973, 'weight_decay': 0.006, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 124 with params: {'learning_rate': 0.0002248717943367038, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0071,0.72032,0.761687,0.434543,0.438213,0.4241
2,0.2902,0.648187,0.797434,0.614156,0.573735,0.580161
3,0.1609,0.651052,0.794684,0.691486,0.626092,0.641324
4,0.1181,0.657454,0.797434,0.741788,0.638465,0.670724
5,0.0997,0.657092,0.794684,0.731447,0.668,0.685298


[I 2025-03-27 04:35:29,604] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.00040325095799879554, 'weight_decay': 0.007, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7649,0.667342,0.788268,0.543345,0.517809,0.51385
2,0.1788,0.662636,0.791017,0.699753,0.639028,0.653152
3,0.1112,0.656867,0.794684,0.748706,0.684661,0.699431
4,0.0901,0.663611,0.804766,0.836,0.716074,0.751909
5,0.0817,0.660645,0.792851,0.804511,0.71804,0.740928
6,0.0762,0.696404,0.789184,0.786904,0.712869,0.734168
7,0.0725,0.689136,0.797434,0.820938,0.725576,0.757191
8,0.0706,0.678495,0.792851,0.816264,0.723824,0.752001
9,0.0692,0.680268,0.797434,0.821121,0.726014,0.756363
10,0.0677,0.68807,0.791017,0.806056,0.720186,0.744688


[I 2025-03-27 04:39:55,764] Trial 125 finished with value: 0.7543191786921076 and parameters: {'learning_rate': 0.00040325095799879554, 'weight_decay': 0.007, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 126 with params: {'learning_rate': 0.00011704977597501867, 'weight_decay': 0.0, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3777,0.880741,0.72319,0.333972,0.334897,0.311371
2,0.5308,0.70958,0.76077,0.49375,0.447368,0.443655
3,0.3149,0.659586,0.784601,0.561641,0.521652,0.522781
4,0.2157,0.651912,0.788268,0.649728,0.561701,0.583045
5,0.1647,0.648827,0.7956,0.66818,0.616652,0.628096


[I 2025-03-27 04:41:24,566] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.000426459691039338, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.741,0.671336,0.784601,0.568103,0.52605,0.527828
2,0.1716,0.662013,0.789184,0.703759,0.634753,0.646019
3,0.1088,0.675304,0.788268,0.766074,0.704425,0.719918
4,0.089,0.674606,0.787351,0.791123,0.694793,0.719899
5,0.0806,0.685527,0.791017,0.801405,0.716503,0.740175
6,0.0754,0.697275,0.790101,0.812546,0.722706,0.748034
7,0.0724,0.687244,0.791934,0.812308,0.7184,0.746437
8,0.0706,0.698476,0.790101,0.799873,0.726542,0.747166
9,0.0688,0.700189,0.790101,0.812574,0.71481,0.743534
10,0.0672,0.689363,0.7956,0.813451,0.721686,0.749137


[I 2025-03-27 04:45:58,684] Trial 127 finished with value: 0.7620678718863323 and parameters: {'learning_rate': 0.000426459691039338, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 128 with params: {'learning_rate': 0.0003385556098094082, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8248,0.679432,0.785518,0.509783,0.505475,0.496461
2,0.2034,0.637909,0.802016,0.718314,0.62955,0.652312
3,0.1221,0.666616,0.793767,0.725449,0.65256,0.671844
4,0.0959,0.666399,0.8011,0.791701,0.689015,0.718747
5,0.0852,0.66947,0.7956,0.818535,0.690648,0.730787
6,0.0784,0.67676,0.794684,0.814253,0.723991,0.753061
7,0.0748,0.684404,0.792851,0.804476,0.715194,0.740987
8,0.0728,0.683049,0.799267,0.813109,0.719597,0.749776
9,0.0708,0.697787,0.789184,0.81187,0.713079,0.740002
10,0.069,0.684767,0.7956,0.814838,0.724695,0.753343


[I 2025-03-27 04:50:30,217] Trial 128 finished with value: 0.7291603920049852 and parameters: {'learning_rate': 0.0003385556098094082, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 129 with params: {'learning_rate': 6.728117982290995e-05, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6251,1.142155,0.667278,0.285865,0.273121,0.259587
2,0.8128,0.822709,0.729606,0.360197,0.356511,0.339318
3,0.5317,0.733211,0.76077,0.461016,0.43308,0.424296
4,0.3952,0.690064,0.768103,0.506347,0.480202,0.47722
5,0.3063,0.658163,0.784601,0.551302,0.509224,0.510291


[I 2025-03-27 04:51:58,298] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0003518988602880114, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8099,0.680767,0.787351,0.522899,0.510901,0.502926
2,0.198,0.646583,0.791934,0.698331,0.627064,0.644392
3,0.12,0.663717,0.797434,0.746779,0.681913,0.699394
4,0.0943,0.658774,0.7956,0.795248,0.707589,0.730727
5,0.0836,0.667331,0.79835,0.816999,0.697244,0.73437
6,0.0784,0.673673,0.793767,0.806691,0.711505,0.739551
7,0.0743,0.681169,0.790101,0.814015,0.708179,0.740952
8,0.0719,0.670998,0.796517,0.801686,0.721016,0.744057
9,0.0701,0.679965,0.791017,0.811251,0.713059,0.741952
10,0.0685,0.688622,0.790101,0.815359,0.713573,0.74467


[I 2025-03-27 04:54:57,989] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00046898598700731013, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7335,0.674208,0.776352,0.582369,0.523778,0.530873
2,0.1626,0.690974,0.782768,0.672874,0.653694,0.65214
3,0.1043,0.671444,0.7956,0.771376,0.703098,0.721618
4,0.088,0.659777,0.8011,0.809313,0.71345,0.739552
5,0.0786,0.672951,0.7956,0.806743,0.732496,0.756438
6,0.0747,0.668317,0.79835,0.797003,0.739326,0.755832
7,0.0717,0.683294,0.796517,0.811728,0.713662,0.745585
8,0.069,0.687365,0.791934,0.808893,0.71405,0.743145
9,0.0677,0.676325,0.796517,0.807563,0.714765,0.739856
10,0.0664,0.664262,0.799267,0.792929,0.722719,0.742225


[I 2025-03-27 04:59:24,971] Trial 131 finished with value: 0.7561777696498254 and parameters: {'learning_rate': 0.00046898598700731013, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 132 with params: {'learning_rate': 0.00048071045176241575, 'weight_decay': 0.005, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.725,0.658171,0.773602,0.57881,0.517548,0.52431
2,0.161,0.675098,0.791017,0.695907,0.648214,0.660699
3,0.1039,0.657342,0.796517,0.774585,0.711926,0.729176
4,0.0869,0.67055,0.799267,0.831291,0.731574,0.762547
5,0.079,0.678695,0.792851,0.812058,0.713659,0.743517
6,0.0744,0.679737,0.8011,0.827336,0.729546,0.760813
7,0.0716,0.709448,0.785518,0.791097,0.718148,0.74002
8,0.0695,0.692248,0.788268,0.793808,0.712133,0.734342
9,0.0681,0.707029,0.790101,0.800933,0.717721,0.742891
10,0.0671,0.692441,0.783685,0.799525,0.708482,0.736386


[I 2025-03-27 05:02:22,877] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.00022248107699699603, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0175,0.720631,0.762603,0.413796,0.432364,0.41388
2,0.2948,0.647829,0.793767,0.615273,0.574647,0.581192
3,0.1626,0.655747,0.793767,0.705405,0.637438,0.654983
4,0.1186,0.659527,0.792851,0.74818,0.633508,0.668823
5,0.1,0.666655,0.794684,0.780101,0.683709,0.713111
6,0.0902,0.660746,0.794684,0.793586,0.683453,0.715987
7,0.0832,0.669181,0.791017,0.808514,0.70248,0.733507
8,0.0798,0.668766,0.796517,0.816861,0.709174,0.739667
9,0.0768,0.667022,0.796517,0.807386,0.706108,0.735173
10,0.0741,0.68214,0.790101,0.811601,0.704679,0.736616


[I 2025-03-27 05:05:22,203] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0004669695806329286, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7008,0.683615,0.779102,0.576471,0.523896,0.530238
2,0.1614,0.691842,0.782768,0.693494,0.635771,0.646608
3,0.1035,0.68963,0.790101,0.783104,0.706771,0.728036
4,0.0864,0.697065,0.7956,0.816214,0.707273,0.7392
5,0.0787,0.70721,0.789184,0.804847,0.711126,0.739359
6,0.0742,0.695231,0.791017,0.811903,0.715632,0.743822
7,0.0722,0.724609,0.784601,0.799093,0.708109,0.732649
8,0.0694,0.709801,0.791934,0.811243,0.730325,0.752167
9,0.0681,0.706258,0.789184,0.804299,0.705696,0.733382
10,0.0664,0.719765,0.786434,0.812933,0.708745,0.739656


[I 2025-03-27 05:08:22,044] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0004991644632130081, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6964,0.648397,0.790101,0.582876,0.539857,0.540773
2,0.1552,0.674511,0.788268,0.749908,0.659002,0.680173
3,0.1012,0.660817,0.797434,0.79586,0.734181,0.750313
4,0.0863,0.699551,0.784601,0.786591,0.733907,0.744225
5,0.0784,0.691592,0.790101,0.795787,0.717779,0.737683
6,0.0742,0.706065,0.788268,0.823112,0.722519,0.752762
7,0.0712,0.710445,0.787351,0.800998,0.719146,0.745089
8,0.0692,0.718559,0.793767,0.815072,0.724271,0.749965
9,0.0679,0.69426,0.785518,0.815425,0.708567,0.742108
10,0.0665,0.713184,0.785518,0.787807,0.727137,0.743574


[I 2025-03-27 05:12:52,047] Trial 135 finished with value: 0.7446037996563569 and parameters: {'learning_rate': 0.0004991644632130081, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 136 with params: {'learning_rate': 0.00031300780411857334, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8806,0.699391,0.774519,0.492259,0.487974,0.479773
2,0.217,0.655214,0.789184,0.691679,0.617213,0.634027
3,0.1266,0.663251,0.788268,0.736029,0.649815,0.673798
4,0.099,0.659755,0.802016,0.770718,0.681214,0.707508
5,0.0864,0.670775,0.791934,0.81961,0.719775,0.75071
6,0.0799,0.673739,0.796517,0.81741,0.719658,0.745145
7,0.076,0.680874,0.796517,0.802356,0.701786,0.73092
8,0.0732,0.684463,0.794684,0.808305,0.710508,0.739374
9,0.0711,0.694073,0.790101,0.815165,0.71302,0.742457
10,0.0695,0.695688,0.786434,0.811196,0.708819,0.740226


[I 2025-03-27 05:15:51,688] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0002762423005400957, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9154,0.704378,0.772686,0.473267,0.458734,0.445362
2,0.2419,0.64666,0.793767,0.642803,0.58895,0.601796
3,0.1383,0.652854,0.791934,0.714447,0.646089,0.664067
4,0.1048,0.662442,0.802016,0.780575,0.675436,0.707949
5,0.0912,0.651604,0.796517,0.797777,0.697623,0.728779
6,0.0831,0.65733,0.805683,0.831649,0.724056,0.759078
7,0.0782,0.671288,0.794684,0.807785,0.727617,0.752317
8,0.0753,0.662457,0.802016,0.813801,0.734523,0.759165
9,0.0732,0.67565,0.79835,0.821147,0.723026,0.75202
10,0.0709,0.680197,0.797434,0.826883,0.717506,0.748723


[I 2025-03-27 05:20:21,548] Trial 137 finished with value: 0.7476729521364613 and parameters: {'learning_rate': 0.0002762423005400957, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 138 with params: {'learning_rate': 0.0002882384993712165, 'weight_decay': 0.006, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9022,0.703108,0.775435,0.476556,0.467644,0.456326
2,0.2336,0.641633,0.793767,0.640883,0.589988,0.603645
3,0.1343,0.65562,0.794684,0.733375,0.646926,0.671889
4,0.1027,0.670551,0.799267,0.758022,0.647676,0.679944
5,0.0902,0.656036,0.800183,0.818965,0.712631,0.744925
6,0.0821,0.649719,0.808433,0.830966,0.733511,0.766975
7,0.0774,0.670008,0.796517,0.805702,0.721392,0.745351
8,0.0745,0.665995,0.800183,0.807862,0.711622,0.740508
9,0.0725,0.674163,0.792851,0.812485,0.719602,0.745889
10,0.0704,0.672495,0.79835,0.819091,0.726306,0.753844


[I 2025-03-27 05:24:49,085] Trial 138 finished with value: 0.7470175429323399 and parameters: {'learning_rate': 0.0002882384993712165, 'weight_decay': 0.006, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 139 with params: {'learning_rate': 0.0002913900631482027, 'weight_decay': 0.007, 'warmup_steps': 22, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9132,0.705058,0.774519,0.501685,0.4775,0.472504
2,0.231,0.655439,0.793767,0.680165,0.607913,0.624706
3,0.1324,0.653322,0.793767,0.714733,0.644698,0.665059
4,0.102,0.66947,0.799267,0.779779,0.668163,0.702192
5,0.0892,0.668575,0.796517,0.797671,0.698017,0.730282
6,0.0817,0.665626,0.79835,0.823047,0.710341,0.747951
7,0.0769,0.677979,0.796517,0.818198,0.711352,0.745712
8,0.0742,0.67426,0.791934,0.798009,0.711801,0.739188
9,0.0724,0.673151,0.7956,0.831122,0.712316,0.74777
10,0.0701,0.689195,0.791017,0.808086,0.710147,0.741151


[I 2025-03-27 05:27:47,120] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0004959453867214615, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6824,0.667205,0.781852,0.55577,0.525047,0.529079
2,0.1532,0.677779,0.782768,0.714461,0.664088,0.671594
3,0.1017,0.678146,0.7956,0.776178,0.722293,0.733236
4,0.0859,0.72336,0.784601,0.799098,0.715436,0.735343
5,0.0782,0.724903,0.783685,0.787024,0.71266,0.731394
6,0.0736,0.711444,0.785518,0.800524,0.720964,0.742143
7,0.0711,0.739619,0.776352,0.800036,0.710489,0.733716
8,0.0692,0.737539,0.773602,0.781487,0.714453,0.733203
9,0.0674,0.73562,0.782768,0.789907,0.707691,0.733088
10,0.0659,0.737741,0.778185,0.782767,0.707393,0.729854


[I 2025-03-27 05:30:44,944] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0003962280767843101, 'weight_decay': 0.003, 'warmup_steps': 37, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8162,0.667307,0.776352,0.497187,0.496975,0.489224
2,0.1831,0.672703,0.782768,0.66987,0.607954,0.618113
3,0.1119,0.678892,0.791017,0.779835,0.701083,0.724147
4,0.0904,0.684224,0.7956,0.809086,0.717794,0.742822
5,0.0816,0.68829,0.791934,0.812775,0.704264,0.737623
6,0.0763,0.684266,0.79835,0.809389,0.720146,0.74729
7,0.0729,0.68888,0.791934,0.820727,0.713254,0.744013
8,0.0707,0.680816,0.802016,0.837108,0.721361,0.755512
9,0.0691,0.699783,0.789184,0.825966,0.72571,0.756962
10,0.0674,0.698533,0.790101,0.816434,0.715614,0.745607


[I 2025-03-27 05:35:12,651] Trial 141 finished with value: 0.7468542399219683 and parameters: {'learning_rate': 0.0003962280767843101, 'weight_decay': 0.003, 'warmup_steps': 37, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 142 with params: {'learning_rate': 0.00011048239252562487, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3669,0.892749,0.724106,0.33453,0.336109,0.311875
2,0.55,0.719244,0.76077,0.452993,0.438935,0.429191
3,0.3316,0.662811,0.781852,0.560752,0.514923,0.513659
4,0.2274,0.649654,0.787351,0.634467,0.55376,0.571118
5,0.1732,0.645672,0.796517,0.663803,0.615019,0.622625
6,0.1438,0.640806,0.799267,0.707607,0.627286,0.649707
7,0.1256,0.637575,0.8011,0.726938,0.642811,0.665121
8,0.1144,0.637373,0.79835,0.734934,0.652305,0.674192
9,0.1062,0.645186,0.7956,0.765465,0.670573,0.698236
10,0.0998,0.654116,0.7956,0.747779,0.675269,0.695417


[I 2025-03-27 05:38:08,941] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.00028608073093940457, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9074,0.705306,0.768103,0.470487,0.461237,0.449784
2,0.235,0.638689,0.797434,0.644549,0.591595,0.604152
3,0.1346,0.656597,0.791017,0.723944,0.645436,0.666543
4,0.1029,0.673788,0.793767,0.741409,0.643633,0.671856
5,0.0902,0.666345,0.797434,0.789101,0.702006,0.729235
6,0.0821,0.657252,0.802016,0.820265,0.71567,0.750886
7,0.0774,0.677089,0.796517,0.815399,0.723887,0.752633
8,0.0748,0.668327,0.800183,0.818075,0.715465,0.7451
9,0.0727,0.67396,0.79835,0.814962,0.723264,0.748321
10,0.0706,0.673747,0.79835,0.823678,0.72963,0.759044


[I 2025-03-27 05:42:36,377] Trial 143 finished with value: 0.7517614008331276 and parameters: {'learning_rate': 0.00028608073093940457, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 92 with value: 0.7644517643387146.


Trial 144 with params: {'learning_rate': 0.0004987574800305614, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6991,0.664259,0.779102,0.556716,0.526868,0.525338
2,0.1557,0.674133,0.790101,0.741276,0.646871,0.67556
3,0.1017,0.671532,0.797434,0.788122,0.713488,0.734971
4,0.0855,0.693324,0.791934,0.803398,0.707082,0.736602
5,0.0779,0.692474,0.7956,0.809943,0.724833,0.750528
6,0.0743,0.673189,0.804766,0.823282,0.726921,0.755698
7,0.0707,0.724129,0.788268,0.818236,0.711537,0.74654
8,0.0695,0.712846,0.790101,0.805077,0.717917,0.743753
9,0.0676,0.711119,0.794684,0.814756,0.724119,0.751746
10,0.0666,0.717041,0.792851,0.796309,0.713701,0.738529


[I 2025-03-27 05:45:36,204] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0004424758983506504, 'weight_decay': 0.007, 'warmup_steps': 34, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7751,0.674487,0.775435,0.539348,0.510688,0.506158
2,0.1671,0.680328,0.791934,0.722546,0.65261,0.669387
3,0.1064,0.685562,0.794684,0.780791,0.717435,0.732471
4,0.0888,0.674118,0.7956,0.814195,0.718953,0.74401
5,0.0798,0.688017,0.788268,0.807325,0.699463,0.72501
6,0.0756,0.688909,0.793767,0.810104,0.725177,0.749213
7,0.0722,0.689412,0.794684,0.803397,0.727763,0.748452
8,0.0702,0.695088,0.788268,0.801995,0.721927,0.740727
9,0.0687,0.692046,0.791017,0.799017,0.716339,0.737807
10,0.0668,0.714496,0.784601,0.799485,0.710738,0.73463


[I 2025-03-27 05:48:35,197] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.00019515656490385515, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0907,0.741017,0.759853,0.451866,0.422822,0.415471
2,0.3355,0.648257,0.793767,0.596087,0.546887,0.55277
3,0.1844,0.651432,0.7956,0.697528,0.62301,0.639114
4,0.131,0.657454,0.7956,0.728878,0.621228,0.652568
5,0.1083,0.666512,0.791017,0.754774,0.659526,0.686451
6,0.0957,0.648594,0.796517,0.77018,0.65816,0.692209
7,0.0873,0.661418,0.793767,0.805294,0.708067,0.736636
8,0.0834,0.655718,0.797434,0.802761,0.716344,0.741559
9,0.0797,0.663999,0.796517,0.806275,0.715717,0.740688
10,0.0766,0.674625,0.791934,0.801277,0.721926,0.7437


[I 2025-03-27 05:53:03,809] Trial 146 finished with value: 0.7532487916985212 and parameters: {'learning_rate': 0.00019515656490385515, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 92 with value: 0.7644517643387146.


Trial 147 with params: {'learning_rate': 8.5277304406668e-05, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.503,1.010337,0.693859,0.304206,0.297285,0.276174
2,0.6782,0.76064,0.746104,0.418433,0.405416,0.394008
3,0.4305,0.693484,0.773602,0.506555,0.481427,0.478444
4,0.3057,0.663284,0.787351,0.586799,0.520515,0.529181
5,0.2312,0.643818,0.796517,0.633259,0.560318,0.572337


[I 2025-03-27 05:54:32,107] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00042273047075564943, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7637,0.673012,0.780018,0.508232,0.501519,0.496941
2,0.173,0.670573,0.787351,0.710229,0.653977,0.665853
3,0.1085,0.65239,0.79835,0.734206,0.666877,0.683988
4,0.0901,0.699853,0.780018,0.781749,0.704871,0.723697
5,0.0811,0.675038,0.792851,0.784465,0.696817,0.722463


[I 2025-03-27 05:56:00,613] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00012239133611176964, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3088,0.859274,0.726856,0.342461,0.343372,0.319868
2,0.5059,0.697547,0.767186,0.488707,0.457179,0.454539
3,0.2978,0.650554,0.791934,0.580677,0.538162,0.54094
4,0.2035,0.64656,0.788268,0.648662,0.562095,0.58295
5,0.1568,0.648908,0.797434,0.684762,0.628207,0.641011
6,0.1317,0.639798,0.796517,0.752715,0.643859,0.675339
7,0.1158,0.636609,0.799267,0.762955,0.676112,0.701476
8,0.1063,0.638701,0.800183,0.770711,0.677244,0.705169
9,0.0994,0.644222,0.794684,0.75982,0.66901,0.694522
10,0.0938,0.654706,0.793767,0.75266,0.673049,0.696033


[I 2025-03-27 05:58:58,291] Trial 149 pruned. 


In [36]:
print(best_trial_distill_aug)

BestRun(run_id='92', objective=0.7644517643387146, hyperparameters={'learning_rate': 0.0004922578519032032, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 4.0}, run_summary=None)
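
The printed `BestRun` mixes two kinds of settings: optimizer-level values (`learning_rate`, `weight_decay`, `warmup_steps`) and what appear to be distillation-specific values (`lambda_param`, `temperature`). The sketch below shows one possible way to separate them before reuse. It is only an illustration: the `BestRun` class here is a local stand-in with the same four fields as the printed object, the helper `split_hyperparameters` and the constant `DISTILL_KEYS` are hypothetical names, and treating `lambda_param` and `temperature` as distillation-only settings is an assumption about how this notebook's training code consumes them.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class BestRun(NamedTuple):
    """Local stand-in with the same fields as the BestRun printed above."""
    run_id: str
    objective: float
    hyperparameters: Dict[str, Any]
    run_summary: Optional[Any] = None


# Values copied verbatim from the printed output of best_trial_distill_aug.
best_trial_distill_aug = BestRun(
    run_id="92",
    objective=0.7644517643387146,
    hyperparameters={
        "learning_rate": 0.0004922578519032032,
        "weight_decay": 0.008,
        "warmup_steps": 6,
        "lambda_param": 1.0,
        "temperature": 4.0,
    },
)

# Assumption: these keys belong to the distillation setup, not the optimizer.
DISTILL_KEYS = {"lambda_param", "temperature"}


def split_hyperparameters(
    best_run: BestRun,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the best run's hyperparameters into optimizer and distillation groups."""
    params = best_run.hyperparameters
    distill = {k: v for k, v in params.items() if k in DISTILL_KEYS}
    optim = {k: v for k, v in params.items() if k not in DISTILL_KEYS}
    return optim, distill


optimizer_args, distillation_args = split_hyperparameters(best_trial_distill_aug)
print("optimizer:", optimizer_args)
print("distillation:", distillation_args)
```

Keeping the two groups apart makes it easier to pass the optimizer values to the training arguments while handing the remaining values to whatever component computes the distillation loss.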
