# Hyperparameter search for a BiLSTM model on the SST2 dataset

This notebook searches for optimal hyperparameters for a BiLSTM model with an unfrozen embedding layer on the SST2 dataset. Hyperparameters are searched for both the original and the augmented dataset, and for both standard training and distillation.

The search uses the Optuna library with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each variant.

## Library imports and basic setup

In [1]:
from transformers import BasicTokenizer, Trainer
from datasets import concatenate_datasets, load_from_disk
import kagglehub
import optuna
import torch
import math
import base

Reset the random seed so that the results are reproducible.

In [None]:
base.reset_seed()

Check GPU availability.

In [None]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Loading the embeddings.

Loading the dataset and basic preprocessing (tokenization, building a vocabulary of all tokens, building an index for the GloVe embeddings).

In [4]:
my_glove = kagglehub.dataset_download("thanakomsn/glove6b300dtxt")
print(my_glove)

/home/jovyan/.cache/kagglehub/datasets/thanakomsn/glove6b300dtxt/versions/1


In [None]:
GLOVE_FILE = f"{my_glove}/glove.6B.300d.txt"
DATASET = "sst2"

In [6]:
train_data = load_from_disk(f"~/data/{DATASET}/train-logits")
eval_data = load_from_disk(f"~/data/{DATASET}/eval-logits")
test_data = load_from_disk(f"~/data/{DATASET}/test-logits")

all_train_data = load_from_disk(f"~/data/{DATASET}/train-logits-augmented")

# Concatenate every split so that the vocabulary covers all tokens.
all_data = concatenate_datasets([
    load_from_disk(f"~/data/{DATASET}/{split}")
    for split in ["eval-logits", "test-logits", "train-logits-augmented"]
])
tokenizer = BasicTokenizer(do_lower_case=True)

Tokenization.

In [7]:
train_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), train_data))
eval_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), eval_data))
test_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), test_data))

all_train_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), all_train_data))

all_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), all_data))

Collect all unique tokens in the dataset.

In [8]:
vocab = base.get_vocab(all_data_tokens)

Assign an index to each token.

In [None]:
word_index = dict(zip(vocab, range(len(vocab))))
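
The two steps above can be illustrated on toy data. This is only a minimal sketch: `build_vocab` is a hypothetical stand-in for `base.get_vocab`, whose actual implementation (including token ordering) is not shown in this notebook.

```python
def build_vocab(tokenized_sentences):
    """Return unique tokens in order of first appearance (an assumed ordering)."""
    return list(dict.fromkeys(
        token for sentence in tokenized_sentences for token in sentence
    ))

# Toy tokenized sentences, not taken from SST2.
sample_sentences = [["a", "great", "movie"], ["a", "dull", "movie"]]
sample_vocab = build_vocab(sample_sentences)

# Same construction as in the notebook: token -> position in the vocabulary.
sample_word_index = dict(zip(sample_vocab, range(len(sample_vocab))))
print(sample_word_index)
```

Because the index is built over all splits, every token seen later during the index lookup has an entry in `word_index`.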

Build the index of GloVe embeddings.

In [10]:
embeddings_index = base.get_embeddings_indeces(GLOVE_FILE)

Found 400000 word vectors.
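
For orientation, a GloVe text file stores one word per line followed by its space-separated vector components. The sketch below parses that format from an in-memory sample; `load_glove_index` is a hypothetical helper and may differ from what `base.get_embeddings_indeces` does.

```python
import io

def load_glove_index(lines):
    """Map each word to its vector, given lines in the GloVe text format."""
    index = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        index[word] = [float(value) for value in values]
    return index

# Two made-up 3-dimensional vectors standing in for glove.6B.300d.txt.
fake_glove_file = io.StringIO("the 0.1 0.2 0.3\nmovie -0.4 0.5 0.6\n")
toy_embeddings_index = load_glove_index(fake_glove_file)
print(f"Found {len(toy_embeddings_index)} word vectors.")
```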


Define the vocabulary size and the embedding dimension.

In [None]:
print(len(vocab))
num_tokens = len(vocab) + 2
embedding_dim = 300

14621


Create the mapping between tokens (their indices) and embeddings. Some tokens were not found in GloVe, which is not a problem here.

In [None]:
embedding_matrix = base.get_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index)

Converted 14305 words (316) misses
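
The hit and miss counts above add up to the vocabulary size (14305 + 316 = 14621). A minimal sketch of how such a matrix could be assembled follows. It assumes that tokens missing from GloVe keep an all-zero row; `build_embedding_matrix` is a hypothetical name and the real `base.get_embedding_matrix` may behave differently.

```python
def build_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index):
    """Return (matrix, hits, misses); rows of unknown tokens stay zero (assumption)."""
    matrix = [[0.0] * embedding_dim for _ in range(num_tokens)]
    hits, misses = 0, 0
    for word, row in word_index.items():
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[row] = list(vector)
            hits += 1
        else:
            misses += 1
    return matrix, hits, misses

toy_word_index = {"the": 0, "movie": 1, "zzzunknown": 2}
toy_embeddings = {"the": [0.1, 0.2], "movie": [0.3, 0.4]}

# Two extra rows mirror the `len(vocab) + 2` used in the notebook.
toy_matrix, toy_hits, toy_misses = build_embedding_matrix(
    len(toy_word_index) + 2, 2, toy_word_index, toy_embeddings
)
print(f"Converted {toy_hits} words ({toy_misses} misses)")
```

Since the embedding layer is unfrozen in this notebook, rows that start without a pretrained vector can still be updated during training.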


Map the tokens in each part of the dataset to their indices.

In [13]:
train_data_index = [[word_index[token] for token in tokens] for tokens in train_data_tokens]
eval_data_index = [[word_index[token] for token in tokens] for tokens in eval_data_tokens]
test_data_index = [[word_index[token] for token in tokens] for tokens in test_data_tokens]

all_train_data_index = [[word_index[token] for token in tokens] for tokens in all_train_data_tokens]

Align all records to a common length of 60.

In [14]:
train_padded_data = list(map(lambda x: base.padd(x,60), train_data_index))
eval_padded_data = list(map(lambda x: base.padd(x,60), eval_data_index))
test_padded_data = list(map(lambda x: base.padd(x,60), test_data_index))

all_train_padded_data = list(map(lambda x: base.padd(x,60), all_train_data_index))
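
The length alignment can be sketched as follows. `pad_to_length` is a hypothetical helper; the padding value (0 here) and the truncation of longer sequences are assumptions, because the implementation of `base.padd` is not shown.

```python
def pad_to_length(token_ids, length, pad_id=0):
    """Right-pad with pad_id and cut off anything beyond `length` (assumed behavior)."""
    return (list(token_ids) + [pad_id] * length)[:length]

short_example = pad_to_length([5, 6, 7], 5)
long_example = pad_to_length(list(range(10)), 4)
print(short_example, long_example)
```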

Add the token IDs to each part of the dataset.

In [15]:
train_data = train_data.add_column("input_ids", train_padded_data)
eval_data = eval_data.add_column("input_ids", eval_padded_data)
test_data = test_data.add_column("input_ids", test_padded_data)

all_train_data = all_train_data.add_column("input_ids", all_train_padded_data)

Basic training configuration for the search. Optuna is given training steps rather than epochs here, so the values are converted below.

The minimum training length is 3 epochs and the maximum is 10 epochs. The maximum number of warm-up steps is set to 10% of the first epoch.

In [None]:
num_epochs = 10
batch_size = 128

In [None]:
data_length = len(train_data)
min_r = math.ceil(data_length/batch_size)*3
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
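
As a worked example of the conversion, the sketch below applies the same formulas to a hypothetical training set of 10,000 records (not the real size of the SST2 training split). `search_budget` is an illustrative helper that does not exist in the notebook.

```python
import math

def search_budget(data_length, batch_size, min_epochs, max_epochs):
    """Convert epoch limits to optimizer steps, using the notebook's formulas."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    return {
        "min_resource": steps_per_epoch * min_epochs,
        "max_resource": steps_per_epoch * max_epochs,
        # 10% of one epoch, rounded up.
        "max_warmup": math.ceil(data_length / batch_size / 10),
    }

# Hypothetical dataset size; 10,000 / 128 rounds up to 79 steps per epoch.
budget = search_budget(data_length=10_000, batch_size=128, min_epochs=3, max_epochs=10)
print(budget)
```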

## Search with standard training on the original dataset
Definition of the searched hyperparameters and their ranges.

In [None]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
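
The learning rate is drawn with `log=True`, so its range is explored in the log domain. The sketch below illustrates only that scaling with plain log-uniform sampling; it does not reproduce the TPE sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
samples = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(1000)]

# 5e-4 is the geometric midpoint of the range, so about half of the
# samples should fall below it.
share_below_midpoint = sum(s < 5e-4 for s in samples) / len(samples)
print(round(share_below_midpoint, 2))
```

With a linear scale, only about 9% of the draws would land below 5e-4, which is why a log scale is the usual choice for learning rates spanning two orders of magnitude.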

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
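
To get a feel for where pruning can happen, the sketch below estimates the Hyperband brackets and their rungs. It assumes the bracket count `floor(log_eta(max_resource / min_resource)) + 1` and rung positions `min_resource * eta ** (bracket + rung)`; this is an approximation of the pruner's behavior, not a copy of Optuna's code. Resources are given in epochs (3 and 10) instead of steps for readability, and `hyperband_rungs` is a hypothetical helper.

```python
import math

def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [rung positions]} under the assumed formulas."""
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    schedule = {}
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            position = min_resource * reduction_factor ** (bracket + rung)
            if position > max_resource:
                break
            rungs.append(position)
            rung += 1
        schedule[bracket] = rungs
    return schedule

rungs_in_epochs = hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2)
print(rungs_in_epochs)
```

Under these assumptions the pruning checkpoints fall at epochs 3 and 6, which agrees with the log in this notebook, where pruned trials stop after epoch 3 or epoch 6.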



Get a model with the unfrozen embedding layer.

In [None]:
def get_BiLSTM():
    return base.BiLSTMClassifier(
        embedding_matrix=embedding_matrix,
        embedding_dim=embedding_dim,
        fc_dim=400,
        hidden_dim=300,
        output_dim=2,
        freeze_embed=False,  # keep the embedding layer trainable
    )

In [None]:
base.reset_seed()

Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-base-embedd_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Trainer configuration for the individual training runs.

In [23]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init = lambda: get_BiLSTM()
)
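
The search objective is the `eval_f1` value returned by `base.compute_metrics`, whose implementation is not shown here. In the logs the F1 value is sometimes lower than both precision and recall, which suggests macro averaging over the two classes. The sketch below shows macro F1 under that assumption; `macro_f1` is a hypothetical helper, not the notebook's metric function.

```python
def macro_f1(labels, predictions, num_classes=2):
    """Average the per-class F1 scores (macro averaging, an assumption)."""
    f1_scores = []
    for cls in range(num_classes):
        pairs = list(zip(labels, predictions))
        tp = sum(1 for y, p in pairs if y == cls and p == cls)
        fp = sum(1 for y, p in pairs if y != cls and p == cls)
        fn = sum(1 for y, p in pairs if y == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1_scores.append(2 * precision * recall / denominator if denominator else 0.0)
    return sum(f1_scores) / num_classes

# Toy labels and predictions, not taken from SST2.
toy_labels = [1, 1, 0, 0]
toy_predictions = [1, 0, 0, 0]
toy_f1 = macro_f1(toy_labels, toy_predictions)
print(round(toy_f1, 4))
```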
  

Set up and run the search.

In [None]:
best_trial_normal = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-embedd",
    n_trials=150
)

[I 2025-03-24 18:30:08,634] A new study created in memory with name: Base-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3845,0.417013,0.811927,0.818788,0.813253,0.811306
2,0.2429,0.441692,0.822248,0.828598,0.820946,0.820923
3,0.1846,0.464164,0.827982,0.832648,0.826871,0.826991
4,0.1491,0.550418,0.830275,0.837502,0.828913,0.828906
5,0.1234,0.527008,0.829128,0.829148,0.828966,0.829029
6,0.1036,0.579586,0.834862,0.835999,0.834302,0.834514


[I 2025-03-24 18:31:19,812] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3276,0.506878,0.799312,0.817779,0.801497,0.797102
2,0.185,0.438959,0.831422,0.833268,0.830713,0.830931
3,0.1248,0.422435,0.850917,0.851054,0.850699,0.850804


[I 2025-03-24 18:31:53,394] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4813,0.450207,0.790138,0.79016,0.790256,0.790124
2,0.3409,0.437236,0.801606,0.801706,0.801349,0.801442
3,0.3047,0.447074,0.807339,0.809566,0.806517,0.806652
4,0.2793,0.475632,0.801606,0.807456,0.800297,0.800127
5,0.2596,0.426536,0.822248,0.8222,0.822293,0.822219
6,0.244,0.434964,0.821101,0.821288,0.821335,0.8211


[I 2025-03-24 18:33:04,612] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 42}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3191,0.509428,0.817661,0.834853,0.819725,0.815911
2,0.1639,0.418752,0.84289,0.843054,0.842648,0.842761
3,0.1047,0.405241,0.84633,0.846876,0.845942,0.846123


[I 2025-03-24 18:33:38,960] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2926,0.454389,0.819954,0.825091,0.821093,0.819555
2,0.1444,0.436777,0.83945,0.842592,0.838554,0.838785
3,0.0865,0.425882,0.837156,0.83781,0.836722,0.836908
4,0.0527,0.669149,0.832569,0.832618,0.832723,0.832561
5,0.0323,0.960243,0.821101,0.821377,0.821377,0.821101
6,0.0194,1.036092,0.825688,0.825688,0.825798,0.825673


[I 2025-03-24 18:34:47,717] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4382,0.429064,0.792431,0.795617,0.793351,0.792169
2,0.3032,0.444195,0.819954,0.820924,0.819409,0.819593
3,0.258,0.46729,0.821101,0.824949,0.820072,0.820192


[I 2025-03-24 18:35:24,113] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3681,0.416729,0.821101,0.826063,0.822219,0.820724
2,0.2254,0.457935,0.823394,0.830439,0.82203,0.82197
3,0.1658,0.474362,0.831422,0.834617,0.830502,0.830699
4,0.1305,0.565301,0.829128,0.835211,0.827871,0.827922
5,0.1045,0.570624,0.832569,0.833073,0.832176,0.832343
6,0.0855,0.613566,0.848624,0.849835,0.848068,0.848305
7,0.0704,0.632543,0.83945,0.83964,0.83969,0.839449
8,0.0593,0.679648,0.832569,0.833207,0.832975,0.832561
9,0.0499,0.738283,0.829128,0.831204,0.82985,0.829029
10,0.0431,0.771138,0.832569,0.832513,0.832597,0.832537


[I 2025-03-24 18:37:22,119] Trial 6 finished with value: 0.8325370935494054 and parameters: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 26}. Best is trial 6 with value: 0.8325370935494054.


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4482,0.435506,0.792431,0.793388,0.79293,0.792398
2,0.3178,0.444224,0.805046,0.807011,0.804265,0.804403
3,0.2757,0.454228,0.824541,0.827356,0.823661,0.82384


[I 2025-03-24 18:37:58,796] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3592,0.424636,0.81422,0.820689,0.815505,0.813656
2,0.2196,0.457364,0.822248,0.827372,0.821072,0.821128
3,0.1597,0.461914,0.833716,0.836639,0.832839,0.833051


[I 2025-03-24 18:38:32,334] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3432,0.455811,0.808486,0.817391,0.81,0.807605
2,0.2052,0.453104,0.834862,0.838925,0.833839,0.834023
3,0.146,0.431766,0.844037,0.844097,0.843858,0.843937
4,0.1106,0.533242,0.83945,0.842292,0.838596,0.838831
5,0.0847,0.599109,0.838303,0.839756,0.83768,0.837909
6,0.0668,0.653487,0.841743,0.842917,0.841185,0.841409
7,0.0522,0.69875,0.84289,0.842847,0.842816,0.84283
8,0.0411,0.750131,0.841743,0.841749,0.841606,0.84166
9,0.0324,0.823903,0.840596,0.841831,0.841153,0.840561
10,0.0256,0.889494,0.836009,0.835948,0.835975,0.835961


[I 2025-03-24 18:40:29,757] Trial 9 finished with value: 0.8359606345514556 and parameters: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 2}. Best is trial 9 with value: 0.8359606345514556.


Trial 10 with params: {'learning_rate': 0.0026025741521183794, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2956,0.439307,0.825688,0.829974,0.826724,0.82539
2,0.1409,0.430303,0.841743,0.842735,0.841227,0.841442
3,0.0834,0.472908,0.837156,0.838931,0.83647,0.836702
4,0.0504,0.745423,0.834862,0.834806,0.834891,0.834831
5,0.031,0.982434,0.826835,0.826913,0.827008,0.826829
6,0.0189,1.08762,0.837156,0.838499,0.836554,0.836777


[I 2025-03-24 18:41:38,566] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2909,0.453307,0.826835,0.833738,0.828145,0.826286
2,0.1459,0.431288,0.850917,0.852345,0.85032,0.850571
3,0.0882,0.427232,0.84633,0.847343,0.845815,0.846038
4,0.0549,0.617919,0.832569,0.832617,0.832386,0.832462
5,0.0347,0.872408,0.826835,0.827152,0.826503,0.826643
6,0.0215,1.057437,0.838303,0.839193,0.837806,0.838011
7,0.014,1.217165,0.837156,0.838499,0.836554,0.836777
8,0.0081,1.375863,0.836009,0.835962,0.836059,0.835983
9,0.0055,1.530518,0.826835,0.827056,0.826545,0.826669
10,0.0039,1.761411,0.827982,0.828603,0.827545,0.82772


[I 2025-03-24 18:43:37,266] Trial 11 finished with value: 0.82771973636378 and parameters: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 9 with value: 0.8359606345514556.


Trial 12 with params: {'learning_rate': 0.0005510960237541879, 'weight_decay': 0.004, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3482,0.481872,0.806193,0.818904,0.808001,0.804825
2,0.2028,0.439365,0.832569,0.836596,0.831544,0.831718
3,0.1425,0.442505,0.838303,0.838646,0.837975,0.838124
4,0.1081,0.534105,0.837156,0.838931,0.83647,0.836702
5,0.0823,0.60694,0.834862,0.835999,0.834302,0.834514
6,0.0643,0.654188,0.841743,0.843114,0.841143,0.841375


[I 2025-03-24 18:44:45,530] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.00019035618237958822, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4012,0.418422,0.799312,0.805737,0.800613,0.798676
2,0.2691,0.459394,0.821101,0.826012,0.819946,0.820006
3,0.2157,0.481042,0.823394,0.829991,0.822072,0.822043
4,0.1823,0.547251,0.819954,0.827135,0.818567,0.818464
5,0.1563,0.499577,0.824541,0.82449,0.824461,0.824475
6,0.137,0.583938,0.825688,0.827161,0.82504,0.825243


[I 2025-03-24 18:45:57,351] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00023364707944876568, 'weight_decay': 0.004, 'warmup_steps': 43}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3975,0.415112,0.802752,0.809449,0.804075,0.802102
2,0.2558,0.450676,0.817661,0.823482,0.816399,0.816374
3,0.1994,0.481412,0.824541,0.831389,0.823198,0.823162


[I 2025-03-24 18:46:33,435] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0013668811947394382, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3128,0.510776,0.81078,0.826937,0.8128,0.80905
2,0.1622,0.451417,0.84633,0.848175,0.845647,0.845901
3,0.103,0.396894,0.84633,0.847173,0.845858,0.846068
4,0.0678,0.611541,0.834862,0.836398,0.834217,0.834441
5,0.045,0.690096,0.834862,0.834977,0.834638,0.834737
6,0.0299,0.757961,0.834862,0.835999,0.834302,0.834514
7,0.0196,0.897805,0.833716,0.83494,0.833133,0.833347
8,0.0123,1.077309,0.833716,0.833697,0.833596,0.833637
9,0.0081,1.185826,0.829128,0.829106,0.829008,0.829047
10,0.0054,1.363338,0.829128,0.829451,0.828797,0.828939


[I 2025-03-24 18:48:29,694] Trial 15 finished with value: 0.8289392437294532 and parameters: {'learning_rate': 0.0013668811947394382, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 9 with value: 0.8359606345514556.


Trial 16 with params: {'learning_rate': 0.0002670832857772683, 'weight_decay': 0.001, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3859,0.417811,0.81078,0.817842,0.812126,0.81013
2,0.2461,0.441369,0.825688,0.830688,0.824535,0.824622
3,0.188,0.468298,0.827982,0.832648,0.826871,0.826991


[I 2025-03-24 18:49:05,528] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 35}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3045,0.4535,0.829128,0.840201,0.830776,0.828174
2,0.1485,0.411431,0.847477,0.847648,0.847236,0.847352
3,0.0904,0.397514,0.841743,0.842567,0.841269,0.841473
4,0.0565,0.5863,0.831422,0.831499,0.831218,0.831305
5,0.0368,0.800859,0.824541,0.825371,0.82404,0.824225
6,0.0225,0.89281,0.829128,0.829271,0.828881,0.828988


[I 2025-03-24 18:50:12,248] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.00063349301937938, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.334,0.48445,0.808486,0.823853,0.810464,0.806819
2,0.195,0.449916,0.832569,0.836596,0.831544,0.831718
3,0.135,0.418915,0.849771,0.849869,0.849573,0.849666
4,0.1004,0.534991,0.845183,0.846903,0.844521,0.84477
5,0.0756,0.600472,0.834862,0.837372,0.834049,0.834273
6,0.058,0.720487,0.84289,0.844828,0.842184,0.842432
7,0.0443,0.748806,0.84633,0.846302,0.846236,0.846265
8,0.0331,0.814605,0.841743,0.841682,0.841732,0.841702
9,0.0251,0.878576,0.836009,0.837233,0.836564,0.835973
10,0.0189,0.969338,0.845183,0.845277,0.844984,0.845076


[I 2025-03-24 18:52:12,727] Trial 18 finished with value: 0.8450757052332353 and parameters: {'learning_rate': 0.00063349301937938, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 19 with params: {'learning_rate': 8.317868295138088e-05, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4553,0.440739,0.792431,0.793094,0.792845,0.792418
2,0.3245,0.445162,0.807339,0.809099,0.806601,0.806754
3,0.2865,0.455889,0.811927,0.815017,0.810979,0.811091


[I 2025-03-24 18:52:49,270] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0009158446429099127, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3183,0.513869,0.805046,0.822711,0.807169,0.80304
2,0.1784,0.432274,0.834862,0.836191,0.834259,0.834478
3,0.1189,0.419589,0.84289,0.843801,0.842395,0.842607
4,0.0838,0.529455,0.84633,0.847173,0.845858,0.846068
5,0.0604,0.652645,0.831422,0.834033,0.830586,0.830797
6,0.0436,0.728677,0.834862,0.838584,0.833881,0.834077


[I 2025-03-24 18:54:04,999] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00032177916596048777, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3703,0.405676,0.817661,0.822038,0.818715,0.817332
2,0.2346,0.450059,0.822248,0.827765,0.82103,0.821061
3,0.1757,0.468573,0.834862,0.838925,0.833839,0.834023
4,0.1401,0.557538,0.824541,0.830947,0.82324,0.823234
5,0.1143,0.550075,0.826835,0.827056,0.826545,0.826669
6,0.0951,0.607952,0.840596,0.84186,0.840016,0.840243


[I 2025-03-24 18:55:16,400] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0011868088603629804, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3083,0.503218,0.816514,0.83121,0.818431,0.815034
2,0.1675,0.424575,0.847477,0.847467,0.847363,0.847405
3,0.1086,0.421625,0.853211,0.854447,0.852656,0.852901
4,0.0732,0.565127,0.84633,0.847343,0.845815,0.846038
5,0.0503,0.706539,0.826835,0.826974,0.826587,0.826692
6,0.0346,0.785558,0.837156,0.839422,0.836385,0.836619
7,0.0235,0.880772,0.844037,0.844245,0.843774,0.843898
8,0.0146,1.128801,0.834862,0.834806,0.834891,0.834831
9,0.01,1.17804,0.832569,0.832849,0.832849,0.832569
10,0.0065,1.373369,0.833716,0.83395,0.833428,0.833556


[I 2025-03-24 18:57:10,376] Trial 22 finished with value: 0.83355602214163 and parameters: {'learning_rate': 0.0011868088603629804, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 23 with params: {'learning_rate': 0.0009790966942775099, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3202,0.516395,0.807339,0.825865,0.809506,0.805265
2,0.1757,0.423803,0.833716,0.83395,0.833428,0.833556
3,0.1165,0.419963,0.849771,0.850712,0.849278,0.8495
4,0.0807,0.552275,0.845183,0.846104,0.844689,0.844904
5,0.0572,0.644357,0.830275,0.831576,0.829671,0.829881
6,0.0409,0.759937,0.830275,0.833305,0.829376,0.829572


[I 2025-03-24 18:58:19,111] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00047899011094305346, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3472,0.448087,0.815367,0.822973,0.816757,0.814682
2,0.2106,0.458768,0.827982,0.833419,0.826787,0.826866
3,0.1517,0.460762,0.832569,0.834309,0.831881,0.832102
4,0.1162,0.536596,0.838303,0.84128,0.837427,0.837657
5,0.0908,0.596062,0.832569,0.834543,0.831839,0.83206
6,0.0728,0.649118,0.84289,0.84364,0.842437,0.842636
7,0.058,0.690256,0.84289,0.842831,0.842858,0.842843
8,0.0468,0.727915,0.833716,0.833738,0.833849,0.833705
9,0.0382,0.791599,0.830275,0.831393,0.830807,0.830243
10,0.0312,0.853532,0.834862,0.8348,0.834849,0.83482


[I 2025-03-24 19:00:16,125] Trial 24 finished with value: 0.8348198077317717 and parameters: {'learning_rate': 0.00047899011094305346, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 25 with params: {'learning_rate': 0.00041445024656083215, 'weight_decay': 0.006, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3579,0.428587,0.815367,0.822071,0.816673,0.814782
2,0.2189,0.468316,0.822248,0.828174,0.820988,0.820993
3,0.1592,0.468869,0.832569,0.835334,0.831713,0.831924


[I 2025-03-24 19:00:51,360] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0005613023462080405, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3389,0.461591,0.809633,0.819337,0.811211,0.808666
2,0.2023,0.460385,0.826835,0.83245,0.825619,0.825679
3,0.1423,0.433963,0.844037,0.844245,0.843774,0.843898
4,0.1069,0.545739,0.84289,0.844593,0.842227,0.84247
5,0.0817,0.609529,0.836009,0.837892,0.835301,0.835531
6,0.0639,0.673073,0.847477,0.847961,0.84711,0.847284
7,0.0494,0.722005,0.841743,0.841688,0.841774,0.841713
8,0.0385,0.773519,0.845183,0.845125,0.845152,0.845138
9,0.0297,0.863893,0.831422,0.833043,0.83206,0.831358
10,0.023,0.92597,0.837156,0.8371,0.837185,0.837125


[I 2025-03-24 19:02:47,630] Trial 26 finished with value: 0.8371251183836683 and parameters: {'learning_rate': 0.0005613023462080405, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 18 with value: 0.8450757052332353.


Trial 27 with params: {'learning_rate': 0.00026715864232807217, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3788,0.414114,0.809633,0.817355,0.811042,0.8089
2,0.2472,0.443576,0.825688,0.830688,0.824535,0.824622
3,0.1908,0.473864,0.831422,0.835965,0.830334,0.83048


[I 2025-03-24 19:03:23,461] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0011463168487183346, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3095,0.520812,0.809633,0.826088,0.811674,0.80785
2,0.1691,0.423638,0.84289,0.842847,0.842816,0.84283
3,0.1101,0.425524,0.853211,0.854873,0.852572,0.852836
4,0.0749,0.554499,0.850917,0.852141,0.850362,0.850603
5,0.0519,0.656376,0.827982,0.828603,0.827545,0.82772
6,0.0361,0.753882,0.832569,0.837702,0.831418,0.831544


[I 2025-03-24 19:04:30,337] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.003707099022053846, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2859,0.425316,0.838303,0.84095,0.839111,0.83817
2,0.1336,0.451618,0.841743,0.842735,0.841227,0.841442
3,0.0754,0.502028,0.838303,0.840204,0.837596,0.837832
4,0.0426,0.863439,0.826835,0.827674,0.826335,0.826523
5,0.0263,0.920351,0.831422,0.831652,0.831134,0.83126
6,0.0155,1.146311,0.834862,0.834806,0.834891,0.834831


[I 2025-03-24 19:05:42,118] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0005537102935947302, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3395,0.462243,0.813073,0.823683,0.814715,0.812029
2,0.2027,0.452291,0.829128,0.833631,0.828039,0.828174
3,0.1431,0.43,0.840596,0.840536,0.840606,0.840561
4,0.1083,0.52498,0.84289,0.844828,0.842184,0.842432
5,0.0829,0.606618,0.834862,0.837372,0.834049,0.834273
6,0.065,0.672247,0.840596,0.841671,0.840059,0.840277
7,0.0506,0.710768,0.840596,0.840549,0.840648,0.840571
8,0.0393,0.770592,0.841743,0.841708,0.841816,0.841722
9,0.0309,0.834918,0.834862,0.836181,0.835438,0.83482
10,0.0242,0.906519,0.841743,0.841712,0.841648,0.841676


[I 2025-03-24 19:07:44,001] Trial 30 finished with value: 0.8416756571849591 and parameters: {'learning_rate': 0.0005537102935947302, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 31 with params: {'learning_rate': 0.0006094795463684698, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3365,0.471651,0.81078,0.821883,0.812463,0.809656
2,0.1988,0.450608,0.834862,0.838259,0.833923,0.834129
3,0.1389,0.430271,0.837156,0.837123,0.837059,0.837087
4,0.1039,0.544698,0.84289,0.844167,0.842311,0.842542
5,0.0783,0.63101,0.837156,0.84027,0.836259,0.836482
6,0.0608,0.674231,0.84289,0.843977,0.842353,0.842575
7,0.0466,0.740611,0.83945,0.839388,0.839438,0.839408
8,0.0352,0.797547,0.838303,0.838283,0.838396,0.838286
9,0.0267,0.891252,0.833716,0.835571,0.834396,0.833637
10,0.0203,0.972122,0.841743,0.841712,0.841648,0.841676


[I 2025-03-24 19:09:42,827] Trial 31 finished with value: 0.8416756571849591 and parameters: {'learning_rate': 0.0006094795463684698, 'weight_decay': 0.006, 'warmup_steps': 2}. Best is trial 18 with value: 0.8450757052332353.


Trial 32 with params: {'learning_rate': 0.00041847249888242205, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.353,0.425808,0.818807,0.823375,0.819883,0.818463
2,0.2182,0.455724,0.823394,0.828737,0.822198,0.822249
3,0.1588,0.47034,0.826835,0.830281,0.825871,0.826038
4,0.1232,0.561921,0.832569,0.838101,0.831376,0.831483
5,0.0975,0.596805,0.824541,0.825535,0.823998,0.82419
6,0.0792,0.627418,0.840596,0.84186,0.840016,0.840243
7,0.0642,0.673434,0.838303,0.838287,0.838185,0.838226
8,0.0531,0.712706,0.832569,0.832618,0.832723,0.832561
9,0.0441,0.764209,0.827982,0.829485,0.828597,0.827924
10,0.037,0.809082,0.833716,0.833654,0.833681,0.833666


[I 2025-03-24 19:11:49,071] Trial 32 finished with value: 0.8336663776920354 and parameters: {'learning_rate': 0.00041847249888242205, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 33 with params: {'learning_rate': 0.00072524218875279, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3294,0.494741,0.802752,0.821034,0.804917,0.800628
2,0.1888,0.440177,0.836009,0.83867,0.835175,0.835401
3,0.1285,0.425005,0.849771,0.849945,0.849531,0.849647
4,0.0942,0.521394,0.853211,0.853778,0.852825,0.853013
5,0.0696,0.620517,0.837156,0.83969,0.836343,0.836575
6,0.0523,0.757797,0.836009,0.839263,0.835091,0.835305


[I 2025-03-24 19:13:05,986] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0008483624710274836, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.325,0.504819,0.800459,0.819363,0.802665,0.798213
2,0.1821,0.429675,0.834862,0.835999,0.834302,0.834514
3,0.122,0.42546,0.847477,0.848587,0.846942,0.847171
4,0.0865,0.532457,0.83945,0.840611,0.83889,0.839111
5,0.0623,0.644441,0.840596,0.8436,0.839722,0.83996
6,0.0455,0.751759,0.834862,0.838584,0.833881,0.834077


[I 2025-03-24 19:14:17,142] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.000371056014479313, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3627,0.413029,0.818807,0.822685,0.819799,0.818531
2,0.2266,0.465221,0.821101,0.82764,0.819778,0.819732
3,0.1669,0.466766,0.833716,0.837589,0.832712,0.832898


[I 2025-03-24 19:14:55,305] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 9.317856512792363e-05, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4493,0.435884,0.793578,0.794791,0.79414,0.793525
2,0.3189,0.443652,0.806193,0.807835,0.805475,0.805628
3,0.2782,0.452218,0.819954,0.822183,0.819157,0.819336
4,0.2483,0.510578,0.808486,0.816267,0.807011,0.806735
5,0.2264,0.437588,0.821101,0.821727,0.821504,0.821092
6,0.2086,0.45303,0.825688,0.825631,0.825714,0.825655


[I 2025-03-24 19:16:07,520] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00044452284930224104, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3496,0.434146,0.817661,0.823973,0.818925,0.817129
2,0.2146,0.456634,0.823394,0.828737,0.822198,0.822249
3,0.1551,0.469465,0.830275,0.833014,0.829418,0.829622


[I 2025-03-24 19:16:43,391] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0010196863983558202, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3183,0.502657,0.803899,0.820435,0.805959,0.802018
2,0.1744,0.424955,0.830275,0.830905,0.829839,0.830017
3,0.1153,0.427048,0.847477,0.848587,0.846942,0.847171
4,0.0798,0.589209,0.832569,0.833884,0.831965,0.832179
5,0.0565,0.667916,0.829128,0.831194,0.828376,0.828587
6,0.0402,0.749871,0.827982,0.830983,0.827082,0.827269


[I 2025-03-24 19:17:59,349] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 40}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4942,0.455651,0.786697,0.786625,0.786625,0.786625
2,0.3476,0.441306,0.801606,0.801779,0.801307,0.801415
3,0.3135,0.436937,0.805046,0.806021,0.804475,0.804635
4,0.2896,0.474527,0.798165,0.804132,0.796834,0.79662
5,0.271,0.426167,0.817661,0.817864,0.817368,0.817486
6,0.256,0.428999,0.815367,0.815799,0.815705,0.815365


[I 2025-03-24 19:19:05,791] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.004241076779716196, 'weight_decay': 0.003, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2886,0.42003,0.826835,0.827164,0.827134,0.826835
2,0.1316,0.466831,0.83945,0.839744,0.839143,0.839284
3,0.0752,0.509524,0.833716,0.836639,0.832839,0.833051


[I 2025-03-24 19:19:40,822] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4769,0.450378,0.783257,0.783207,0.783289,0.783222
2,0.3415,0.437766,0.801606,0.801647,0.801391,0.801467
3,0.3056,0.448823,0.805046,0.807501,0.80418,0.804295
4,0.2806,0.473643,0.802752,0.808431,0.801465,0.801321
5,0.261,0.425827,0.819954,0.819906,0.819999,0.819925
6,0.2456,0.434134,0.819954,0.820101,0.820167,0.819952


[I 2025-03-24 19:20:47,441] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0008700067416775609, 'weight_decay': 0.006, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3238,0.49752,0.803899,0.821872,0.806043,0.801835
2,0.1814,0.419555,0.83945,0.840112,0.839017,0.839205
3,0.1209,0.426603,0.84289,0.843493,0.842479,0.842665
4,0.0858,0.549207,0.840596,0.841061,0.840227,0.840395
5,0.0616,0.648886,0.837156,0.83969,0.836343,0.836575
6,0.0449,0.730383,0.834862,0.837948,0.833965,0.834178


[I 2025-03-24 19:21:54,412] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0005183001201815199, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3424,0.450096,0.81078,0.818761,0.812211,0.810024
2,0.2069,0.453617,0.827982,0.833026,0.826829,0.826929
3,0.148,0.438954,0.838303,0.83833,0.838143,0.838209
4,0.1124,0.529402,0.838303,0.84128,0.837427,0.837657
5,0.0867,0.59991,0.827982,0.829917,0.82725,0.827459
6,0.0688,0.660544,0.84289,0.843801,0.842395,0.842607
7,0.0538,0.70805,0.840596,0.840552,0.840522,0.840536
8,0.0427,0.752666,0.83945,0.839394,0.83948,0.839419
9,0.0339,0.812678,0.833716,0.835344,0.834354,0.833652
10,0.0272,0.877715,0.836009,0.835963,0.835933,0.835947


[I 2025-03-24 19:23:52,648] Trial 43 finished with value: 0.8359468224366691 and parameters: {'learning_rate': 0.0005183001201815199, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 44 with params: {'learning_rate': 0.0006569508360410568, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3324,0.490373,0.802752,0.821034,0.804917,0.800628
2,0.1926,0.45257,0.833716,0.838298,0.832628,0.832786
3,0.1325,0.424987,0.848624,0.84905,0.848278,0.848444
4,0.0982,0.536413,0.850917,0.852345,0.85032,0.850571
5,0.0736,0.624542,0.833716,0.838298,0.832628,0.832786
6,0.0563,0.76646,0.836009,0.839263,0.835091,0.835305
7,0.043,0.751406,0.847477,0.847418,0.847489,0.847443
8,0.0319,0.821661,0.841743,0.841708,0.841816,0.841722
9,0.024,0.896197,0.831422,0.832636,0.831976,0.831385
10,0.0179,0.998672,0.841743,0.841867,0.841521,0.841623


[I 2025-03-24 19:25:49,564] Trial 44 finished with value: 0.8416231469002695 and parameters: {'learning_rate': 0.0006569508360410568, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 45 with params: {'learning_rate': 0.0007547428810264201, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3275,0.490531,0.806193,0.825034,0.80838,0.804059
2,0.1868,0.434162,0.836009,0.838137,0.835259,0.83549
3,0.1265,0.420479,0.848624,0.849177,0.848236,0.84842
4,0.0921,0.512238,0.84289,0.843361,0.842521,0.842691
5,0.0674,0.61407,0.836009,0.838959,0.835133,0.835354
6,0.0506,0.736667,0.834862,0.840039,0.833712,0.833852


[I 2025-03-24 19:26:54,136] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0007504479305967269, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3275,0.485608,0.805046,0.822711,0.807169,0.80304
2,0.187,0.430467,0.837156,0.838931,0.83647,0.836702
3,0.1269,0.424281,0.849771,0.850547,0.84932,0.849528
4,0.0925,0.509287,0.849771,0.850036,0.849489,0.849626
5,0.0678,0.612672,0.83945,0.841737,0.83868,0.83892
6,0.051,0.732027,0.83945,0.843584,0.838427,0.838634
7,0.0377,0.751462,0.845183,0.845277,0.844984,0.845076
8,0.0266,0.836315,0.841743,0.841708,0.841816,0.841722
9,0.0195,0.925713,0.837156,0.837799,0.837564,0.837148
10,0.0137,1.064632,0.841743,0.841749,0.841606,0.84166


[I 2025-03-24 19:28:50,170] Trial 46 finished with value: 0.8416598244173561 and parameters: {'learning_rate': 0.0007504479305967269, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 47 with params: {'learning_rate': 0.0008460312093305546, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3245,0.504971,0.799312,0.82089,0.801665,0.796691
2,0.1819,0.414157,0.84289,0.843493,0.842479,0.842665
3,0.1217,0.419427,0.850917,0.851137,0.850657,0.850785
4,0.0868,0.540516,0.84633,0.847343,0.845815,0.846038
5,0.063,0.619591,0.840596,0.843909,0.83968,0.839912
6,0.0462,0.708439,0.832569,0.835334,0.831713,0.831924


[I 2025-03-24 19:29:58,513] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0018353067784147546, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2927,0.460516,0.826835,0.837273,0.828439,0.825926
2,0.1494,0.434071,0.844037,0.844715,0.843605,0.843799
3,0.0911,0.414147,0.848624,0.848757,0.848404,0.848509
4,0.0583,0.611963,0.83945,0.840112,0.839017,0.839205
5,0.0372,0.848954,0.830275,0.83024,0.830344,0.830253
6,0.024,0.962387,0.830275,0.832477,0.829502,0.829715


[I 2025-03-24 19:31:05,230] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0005179356089709051, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3453,0.442184,0.811927,0.821692,0.813505,0.810971
2,0.2067,0.445869,0.833716,0.837257,0.832754,0.832951
3,0.1469,0.441998,0.838303,0.838761,0.837933,0.838098
4,0.1116,0.535192,0.838303,0.842597,0.837259,0.837454
5,0.0858,0.617314,0.830275,0.833611,0.829334,0.829521
6,0.0677,0.648216,0.83945,0.840264,0.838974,0.839175
7,0.0531,0.711526,0.836009,0.835948,0.836017,0.835973
8,0.0419,0.767107,0.837156,0.837094,0.837143,0.837114
9,0.0333,0.827804,0.832569,0.833354,0.833018,0.832555
10,0.0265,0.89751,0.836009,0.836034,0.835849,0.835914


[I 2025-03-24 19:33:04,602] Trial 49 finished with value: 0.8359140093401742 and parameters: {'learning_rate': 0.0005179356089709051, 'weight_decay': 0.003, 'warmup_steps': 4}. Best is trial 18 with value: 0.8450757052332353.


Trial 50 with params: {'learning_rate': 0.0003144555766003982, 'weight_decay': 0.007, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3773,0.412911,0.81422,0.819474,0.815378,0.813788
2,0.2351,0.454707,0.822248,0.829962,0.82082,0.820701
3,0.1761,0.464375,0.837156,0.840911,0.836175,0.836381


[I 2025-03-24 19:33:39,289] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00044122315760318433, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3516,0.434052,0.81422,0.822037,0.815631,0.813505
2,0.2145,0.456992,0.825688,0.830688,0.824535,0.824622
3,0.1553,0.473462,0.829128,0.831715,0.828292,0.828495
4,0.1199,0.553185,0.831422,0.834617,0.830502,0.830699
5,0.0942,0.600276,0.831422,0.83245,0.830881,0.831084
6,0.0765,0.649,0.840596,0.841337,0.840143,0.840339
7,0.0611,0.679949,0.838303,0.838258,0.838227,0.838241
8,0.0499,0.705249,0.832569,0.832618,0.832723,0.832561
9,0.0411,0.779316,0.831422,0.832453,0.831934,0.831395
10,0.0342,0.830467,0.836009,0.835963,0.835933,0.835947


[I 2025-03-24 19:35:41,083] Trial 51 finished with value: 0.8359468224366691 and parameters: {'learning_rate': 0.00044122315760318433, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 52 with params: {'learning_rate': 0.0006757172864525417, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3312,0.484777,0.803899,0.820435,0.805959,0.802018
2,0.1914,0.447386,0.832569,0.836596,0.831544,0.831718
3,0.1316,0.418897,0.848624,0.849177,0.848236,0.84842
4,0.097,0.531365,0.853211,0.854257,0.852698,0.852932
5,0.0725,0.626609,0.838303,0.84128,0.837427,0.837657
6,0.0551,0.748883,0.834862,0.838259,0.833923,0.834129


[I 2025-03-24 19:36:54,028] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0007155978748623091, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3307,0.493684,0.802752,0.820301,0.804875,0.800723
2,0.1902,0.437566,0.829128,0.830732,0.82846,0.828672
3,0.1303,0.426399,0.841743,0.841947,0.841479,0.841602
4,0.0956,0.50674,0.850917,0.851779,0.850446,0.850663
5,0.0708,0.63246,0.838303,0.840204,0.837596,0.837832
6,0.0534,0.699456,0.84289,0.844593,0.842227,0.84247
7,0.04,0.769283,0.84633,0.846522,0.846573,0.846329
8,0.0286,0.848877,0.844037,0.843981,0.844068,0.844007
9,0.0212,0.939048,0.834862,0.835503,0.83527,0.834855
10,0.0153,1.036948,0.841743,0.841749,0.841606,0.84166


[I 2025-03-24 19:38:59,535] Trial 53 finished with value: 0.8416598244173561 and parameters: {'learning_rate': 0.0007155978748623091, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 18 with value: 0.8450757052332353.


Trial 54 with params: {'learning_rate': 0.000870270136838561, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3213,0.515747,0.800459,0.821704,0.802791,0.797906
2,0.18,0.424016,0.837156,0.838499,0.836554,0.836777
3,0.1204,0.428015,0.84633,0.847173,0.845858,0.846068
4,0.0853,0.536796,0.845183,0.845942,0.844731,0.844934
5,0.0614,0.63687,0.831422,0.834318,0.830544,0.830749
6,0.0443,0.706415,0.833716,0.83608,0.832923,0.833145


[I 2025-03-24 19:40:12,424] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0009387502330983644, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3178,0.480213,0.811927,0.827787,0.813926,0.810249
2,0.1773,0.431039,0.837156,0.83781,0.836722,0.836908
3,0.1183,0.432266,0.847477,0.848587,0.846942,0.847171
4,0.083,0.546814,0.84289,0.844167,0.842311,0.842542
5,0.0585,0.63006,0.830275,0.831576,0.829671,0.829881
6,0.0419,0.731725,0.827982,0.831604,0.826998,0.827163


[I 2025-03-24 19:41:23,065] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0009430129366163565, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3193,0.513189,0.805046,0.827408,0.807422,0.802448
2,0.1776,0.423074,0.831422,0.83175,0.831092,0.831235
3,0.1179,0.441917,0.84289,0.843801,0.842395,0.842607
4,0.0824,0.578525,0.844037,0.845636,0.843395,0.843639
5,0.0588,0.673682,0.833716,0.83558,0.833007,0.833231
6,0.0419,0.749051,0.834862,0.836856,0.834133,0.83436


[I 2025-03-24 19:42:45,232] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00023208452182587144, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.386,0.425362,0.805046,0.815697,0.806706,0.803923
2,0.2559,0.443397,0.830275,0.833931,0.829292,0.829468
3,0.1998,0.480388,0.832569,0.838101,0.831376,0.831483
4,0.1655,0.554149,0.823394,0.832391,0.821862,0.82166
5,0.1396,0.512456,0.816514,0.81647,0.81641,0.816436
6,0.1196,0.588864,0.830275,0.83223,0.829544,0.829759


[I 2025-03-24 19:43:58,661] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0019403519985629898, 'weight_decay': 0.003, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3062,0.450357,0.830275,0.838423,0.831692,0.829622
2,0.1502,0.426355,0.845183,0.846474,0.844605,0.84484
3,0.0921,0.412261,0.853211,0.853534,0.852909,0.85306
4,0.0582,0.614362,0.836009,0.83659,0.835596,0.835774
5,0.036,0.912831,0.830275,0.83024,0.830344,0.830253
6,0.0231,0.903422,0.830275,0.834266,0.82925,0.829413


[I 2025-03-24 19:45:10,051] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0002471824952041614, 'weight_decay': 0.001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3863,0.420133,0.807339,0.815969,0.808832,0.806483
2,0.2513,0.449227,0.825688,0.831484,0.824451,0.824492
3,0.1945,0.473537,0.827982,0.833419,0.826787,0.826866


[I 2025-03-24 19:45:46,623] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.004014238616142541, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2861,0.414458,0.830275,0.831393,0.830807,0.830243
2,0.1319,0.462974,0.832569,0.833884,0.831965,0.832179
3,0.0745,0.436447,0.847477,0.847557,0.847657,0.847472
4,0.0424,0.766292,0.847477,0.847714,0.847741,0.847477
5,0.0264,0.90311,0.826835,0.827152,0.826503,0.826643
6,0.015,1.087933,0.838303,0.838283,0.838396,0.838286
7,0.0087,1.401858,0.832569,0.833207,0.832134,0.832314
8,0.0053,1.628978,0.832569,0.832617,0.832386,0.832462
9,0.0038,1.850382,0.827982,0.828252,0.827671,0.827804
10,0.0031,1.96571,0.827982,0.828472,0.827587,0.82775


[I 2025-03-24 19:47:53,015] Trial 60 finished with value: 0.8277496839443742 and parameters: {'learning_rate': 0.004014238616142541, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 18 with value: 0.8450757052332353.


Trial 61 with params: {'learning_rate': 0.0005955999792763465, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3382,0.471833,0.809633,0.821566,0.811379,0.808398
2,0.1996,0.443887,0.833716,0.836639,0.832839,0.833051
3,0.139,0.437688,0.837156,0.837273,0.836933,0.837033


[I 2025-03-24 19:48:30,839] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0002855860885158068, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3761,0.410525,0.81422,0.820689,0.815505,0.813656
2,0.2429,0.445109,0.826835,0.832051,0.825661,0.825744
3,0.1849,0.476634,0.827982,0.833828,0.826745,0.826801
4,0.1491,0.558121,0.823394,0.831383,0.821946,0.821819
5,0.1237,0.525471,0.833716,0.833795,0.833512,0.8336
6,0.1039,0.581182,0.844037,0.84487,0.843563,0.84377
7,0.0888,0.57619,0.830275,0.830464,0.830513,0.830274
8,0.0778,0.615594,0.833716,0.834286,0.834102,0.83371
9,0.0684,0.66715,0.824541,0.827387,0.825387,0.824373
10,0.0618,0.677831,0.823394,0.823443,0.823546,0.823386


[I 2025-03-24 19:50:36,303] Trial 62 finished with value: 0.8233861337177186 and parameters: {'learning_rate': 0.0002855860885158068, 'weight_decay': 0.008, 'warmup_steps': 5}. Best is trial 18 with value: 0.8450757052332353.


Trial 63 with params: {'learning_rate': 0.0004072984600075311, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3597,0.421834,0.818807,0.824122,0.819967,0.818386
2,0.2203,0.460985,0.821101,0.826796,0.819862,0.819873
3,0.1602,0.463189,0.829128,0.832294,0.828208,0.828395


[I 2025-03-24 19:51:13,310] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00011912397327149118, 'weight_decay': 0.006, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4349,0.428237,0.793578,0.796919,0.794519,0.7933
2,0.3014,0.445054,0.818807,0.820038,0.818199,0.818386
3,0.2559,0.4666,0.823394,0.827279,0.822367,0.822497
4,0.2245,0.500678,0.817661,0.825242,0.816231,0.816074
5,0.2006,0.451938,0.817661,0.818353,0.818083,0.817649
6,0.1824,0.498614,0.831422,0.83175,0.831092,0.831235


[I 2025-03-24 19:52:25,603] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.00010546468583372021, 'weight_decay': 0.008, 'warmup_steps': 43}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.45,0.433799,0.797018,0.799428,0.797813,0.796851
2,0.3096,0.445259,0.811927,0.813725,0.81119,0.811355
3,0.2661,0.463682,0.822248,0.82563,0.821283,0.82143


[I 2025-03-24 19:53:02,059] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.000573043392411109, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3378,0.472114,0.809633,0.821566,0.811379,0.808398
2,0.2008,0.450075,0.827982,0.833026,0.826829,0.826929
3,0.1409,0.432367,0.84289,0.842831,0.842858,0.842843
4,0.1059,0.520724,0.838303,0.839756,0.83768,0.837909
5,0.0808,0.611091,0.831422,0.833764,0.830629,0.830843
6,0.063,0.684962,0.838303,0.839193,0.837806,0.838011
7,0.0486,0.727093,0.83945,0.839415,0.839522,0.839428
8,0.0375,0.787089,0.845183,0.845137,0.845237,0.845159
9,0.0291,0.854639,0.836009,0.837049,0.836522,0.835983
10,0.0225,0.921633,0.841743,0.841749,0.841606,0.84166


[I 2025-03-24 19:55:08,285] Trial 66 finished with value: 0.8416598244173561 and parameters: {'learning_rate': 0.000573043392411109, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 18 with value: 0.8450757052332353.


Trial 67 with params: {'learning_rate': 0.0008290047333322148, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3244,0.504553,0.798165,0.817699,0.800413,0.795792
2,0.182,0.435195,0.838303,0.839193,0.837806,0.838011
3,0.1222,0.425846,0.848624,0.84905,0.848278,0.848444
4,0.0879,0.541289,0.848624,0.84905,0.848278,0.848444
5,0.0633,0.63906,0.838303,0.84128,0.837427,0.837657
6,0.0466,0.718116,0.840596,0.841671,0.840059,0.840277
7,0.0336,0.786468,0.847477,0.847458,0.847573,0.847461
8,0.0229,0.879037,0.83945,0.839562,0.839648,0.839446
9,0.0162,1.014876,0.832569,0.833882,0.833144,0.832526
10,0.0113,1.130169,0.832569,0.832506,0.832555,0.832526


[I 2025-03-24 19:57:15,624] Trial 67 finished with value: 0.8325256383947128 and parameters: {'learning_rate': 0.0008290047333322148, 'weight_decay': 0.003, 'warmup_steps': 4}. Best is trial 18 with value: 0.8450757052332353.


Trial 68 with params: {'learning_rate': 0.00044220548251500884, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3517,0.439695,0.813073,0.821103,0.814505,0.812327
2,0.2143,0.453542,0.825688,0.830313,0.824577,0.824684
3,0.1554,0.472578,0.829128,0.831715,0.828292,0.828495


[I 2025-03-24 19:57:54,528] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00079946755168091, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.327,0.487919,0.803899,0.821872,0.806043,0.801835
2,0.185,0.414694,0.833716,0.834162,0.833344,0.833505
3,0.1245,0.417194,0.844037,0.844576,0.843647,0.843826
4,0.0894,0.524984,0.84633,0.847173,0.845858,0.846068
5,0.0651,0.621355,0.831422,0.833508,0.830671,0.830888
6,0.0483,0.715135,0.827982,0.830162,0.827208,0.827414


[I 2025-03-24 19:59:09,103] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00447987319523807, 'weight_decay': 0.006, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2843,0.416442,0.836009,0.836455,0.836354,0.836007
2,0.1311,0.448349,0.838303,0.838646,0.837975,0.838124
3,0.0733,0.51423,0.838303,0.839756,0.83768,0.837909
4,0.0412,0.728531,0.837156,0.839422,0.836385,0.836619
5,0.0246,0.923768,0.83945,0.839395,0.839395,0.839395
6,0.0146,1.134323,0.833716,0.833654,0.833681,0.833666


[I 2025-03-24 20:00:20,306] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0006342434465905245, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3345,0.48232,0.806193,0.822141,0.808211,0.804421
2,0.1958,0.44909,0.832569,0.835334,0.831713,0.831924
3,0.1359,0.432961,0.837156,0.837159,0.837017,0.83707
4,0.101,0.531951,0.844037,0.84487,0.843563,0.84377
5,0.076,0.619354,0.838303,0.840451,0.837554,0.837791
6,0.0583,0.701594,0.844037,0.844715,0.843605,0.843799
7,0.0444,0.760282,0.84289,0.842843,0.842942,0.842865
8,0.0333,0.822762,0.841743,0.841708,0.841816,0.841722
9,0.025,0.928776,0.834862,0.83599,0.835396,0.834831
10,0.0188,0.997373,0.84633,0.84627,0.846321,0.846291


[I 2025-03-24 20:02:23,853] Trial 71 finished with value: 0.8462906544170652 and parameters: {'learning_rate': 0.0006342434465905245, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 72 with params: {'learning_rate': 0.0018933032453204138, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2956,0.438604,0.827982,0.833028,0.829102,0.827619
2,0.1492,0.434317,0.847477,0.848095,0.847068,0.847258
3,0.0911,0.417254,0.83945,0.841017,0.838806,0.83904
4,0.057,0.631782,0.833716,0.834586,0.833218,0.833416
5,0.0365,0.891777,0.824541,0.824494,0.824587,0.824513
6,0.0224,1.042763,0.827982,0.829917,0.82725,0.827459


[I 2025-03-24 20:03:36,924] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0006510787496273821, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3339,0.490678,0.798165,0.817699,0.800413,0.795792
2,0.1949,0.439874,0.836009,0.837892,0.835301,0.835531
3,0.1344,0.424406,0.844037,0.844007,0.843942,0.84397
4,0.0993,0.533347,0.84633,0.847017,0.8459,0.846096
5,0.0744,0.631585,0.834862,0.837107,0.834091,0.834318
6,0.0567,0.709065,0.84289,0.844828,0.842184,0.842432
7,0.043,0.748655,0.84633,0.846275,0.846363,0.846301
8,0.0318,0.814353,0.840596,0.840552,0.840522,0.840536
9,0.0237,0.913108,0.833716,0.834427,0.834144,0.833705
10,0.0175,0.997807,0.841743,0.841712,0.841648,0.841676


[I 2025-03-24 20:05:40,060] Trial 73 finished with value: 0.8416756571849591 and parameters: {'learning_rate': 0.0006510787496273821, 'weight_decay': 0.005, 'warmup_steps': 2}. Best is trial 71 with value: 0.8462906544170652.


Trial 74 with params: {'learning_rate': 0.0010891522167171316, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.311,0.494319,0.809633,0.827532,0.811758,0.807674
2,0.1712,0.426919,0.844037,0.844245,0.843774,0.843898
3,0.1126,0.425177,0.855505,0.856224,0.855077,0.855285
4,0.0767,0.582915,0.83945,0.842007,0.838638,0.838877
5,0.0534,0.651155,0.830275,0.83055,0.829965,0.8301
6,0.0372,0.762457,0.831422,0.835965,0.830334,0.83048


[I 2025-03-24 20:06:50,162] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0004253081652942732, 'weight_decay': 0.005, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3575,0.431884,0.817661,0.823973,0.818925,0.817129
2,0.217,0.459945,0.823394,0.828737,0.822198,0.822249
3,0.1573,0.470489,0.831422,0.834318,0.830544,0.830749


[I 2025-03-24 20:07:27,021] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0004456131051372049, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3495,0.432065,0.817661,0.823973,0.818925,0.817129
2,0.2145,0.454839,0.825688,0.830688,0.824535,0.824622
3,0.1551,0.475856,0.829128,0.831997,0.82825,0.828446
4,0.1199,0.551493,0.832569,0.835627,0.83167,0.831875
5,0.0941,0.594022,0.829128,0.829978,0.828629,0.82882
6,0.0761,0.645026,0.83945,0.84043,0.838932,0.839144
7,0.061,0.691995,0.837156,0.837123,0.837059,0.837087
8,0.05,0.719156,0.836009,0.836032,0.836143,0.835999
9,0.0411,0.777787,0.826835,0.828038,0.827387,0.826796
10,0.0341,0.826729,0.834862,0.834807,0.834807,0.834807


[I 2025-03-24 20:09:29,056] Trial 76 finished with value: 0.8348067693862087 and parameters: {'learning_rate': 0.0004456131051372049, 'weight_decay': 0.003, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 77 with params: {'learning_rate': 0.001007761125954244, 'weight_decay': 0.01, 'warmup_steps': 42}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.33,0.5267,0.803899,0.827459,0.806338,0.801124
2,0.1756,0.418745,0.836009,0.837061,0.83547,0.83568
3,0.1152,0.424947,0.850917,0.851349,0.850573,0.850741
4,0.0792,0.565787,0.840596,0.840844,0.840311,0.840443
5,0.0555,0.691877,0.824541,0.825713,0.823956,0.824153
6,0.0389,0.756278,0.832569,0.835627,0.83167,0.831875


[I 2025-03-24 20:10:39,293] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0006540064112309553, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3325,0.485993,0.801606,0.819459,0.803749,0.799517
2,0.193,0.454026,0.831422,0.835605,0.830376,0.830537
3,0.1328,0.42158,0.847477,0.847648,0.847236,0.847352
4,0.0984,0.539014,0.84633,0.847729,0.845731,0.845973
5,0.0736,0.619791,0.837156,0.839972,0.836301,0.836529
6,0.0564,0.739894,0.840596,0.843028,0.839806,0.840049
7,0.0429,0.768753,0.850917,0.850985,0.850741,0.850822
8,0.0319,0.83226,0.83945,0.839394,0.83948,0.839419
9,0.024,0.880317,0.834862,0.836387,0.83548,0.834807
10,0.0179,0.989865,0.840596,0.840626,0.840437,0.840504


[I 2025-03-24 20:12:38,782] Trial 78 finished with value: 0.8405038272607288 and parameters: {'learning_rate': 0.0006540064112309553, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 79 with params: {'learning_rate': 0.0005194018662380114, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3442,0.443572,0.811927,0.819696,0.813337,0.811203
2,0.2063,0.44695,0.832569,0.835935,0.831628,0.831825
3,0.1467,0.441181,0.84289,0.843054,0.842648,0.842761
4,0.1116,0.534682,0.845183,0.848241,0.84431,0.844565
5,0.0859,0.618314,0.829128,0.831997,0.82825,0.828446
6,0.068,0.652951,0.84289,0.843801,0.842395,0.842607
7,0.0533,0.710263,0.83945,0.839505,0.839269,0.839347
8,0.0422,0.752219,0.838303,0.838243,0.838269,0.838255
9,0.0335,0.815109,0.836009,0.837432,0.836606,0.835961
10,0.0266,0.883088,0.834862,0.8348,0.834849,0.83482


[I 2025-03-24 20:14:40,017] Trial 79 finished with value: 0.8348198077317717 and parameters: {'learning_rate': 0.0005194018662380114, 'weight_decay': 0.004, 'warmup_steps': 2}. Best is trial 71 with value: 0.8462906544170652.


Trial 80 with params: {'learning_rate': 0.0006092283910208092, 'weight_decay': 0.005, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3394,0.484213,0.807339,0.821023,0.809211,0.805865
2,0.1973,0.435734,0.836009,0.838959,0.835133,0.835354
3,0.1373,0.434135,0.834862,0.835253,0.834512,0.834667


[I 2025-03-24 20:15:16,705] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0006967302171779035, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3322,0.516851,0.798165,0.817699,0.800413,0.795792
2,0.1917,0.430199,0.838303,0.839756,0.83768,0.837909
3,0.1307,0.421691,0.841743,0.841947,0.841479,0.841602
4,0.0963,0.526948,0.850917,0.851779,0.850446,0.850663
5,0.0715,0.60966,0.832569,0.834792,0.831797,0.832017
6,0.054,0.690226,0.845183,0.84714,0.844479,0.844732
7,0.0405,0.724888,0.84289,0.842871,0.842984,0.842873
8,0.029,0.826593,0.840596,0.840549,0.840648,0.840571
9,0.0214,0.919733,0.832569,0.833516,0.83306,0.832547
10,0.0156,1.006349,0.833716,0.833697,0.833596,0.833637


[I 2025-03-24 20:17:15,158] Trial 81 finished with value: 0.833636613628798 and parameters: {'learning_rate': 0.0006967302171779035, 'weight_decay': 0.005, 'warmup_steps': 4}. Best is trial 71 with value: 0.8462906544170652.


Trial 82 with params: {'learning_rate': 0.0004972657620708597, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3452,0.450518,0.811927,0.820172,0.813379,0.811148
2,0.2085,0.457402,0.829128,0.834793,0.827913,0.827988
3,0.1494,0.442392,0.83945,0.840112,0.839017,0.839205
4,0.1139,0.528858,0.84289,0.845921,0.842016,0.842262
5,0.0887,0.609144,0.830275,0.831998,0.829587,0.829802
6,0.0706,0.651753,0.844037,0.844715,0.843605,0.843799
7,0.056,0.696915,0.841743,0.841688,0.841774,0.841713
8,0.0446,0.733887,0.836009,0.83599,0.836101,0.835992
9,0.0361,0.798372,0.836009,0.837049,0.836522,0.835983
10,0.0293,0.865709,0.834862,0.8348,0.834849,0.83482


[I 2025-03-24 20:19:14,535] Trial 82 finished with value: 0.8348198077317717 and parameters: {'learning_rate': 0.0004972657620708597, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 83 with params: {'learning_rate': 0.0010325211318544124, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.316,0.540274,0.799312,0.821713,0.801707,0.796583
2,0.1736,0.403004,0.845183,0.845543,0.844858,0.845012
3,0.114,0.410004,0.854358,0.854999,0.853951,0.854149
4,0.0789,0.536757,0.848624,0.848937,0.84832,0.848468
5,0.0553,0.644892,0.833716,0.83558,0.833007,0.833231
6,0.0387,0.715942,0.838303,0.84128,0.837427,0.837657
7,0.0266,0.78416,0.849771,0.849711,0.849783,0.849737
8,0.0173,0.968797,0.841743,0.841947,0.841479,0.841602
9,0.012,1.016081,0.840596,0.841173,0.840985,0.840591
10,0.008,1.207589,0.84289,0.843361,0.842521,0.842691


[I 2025-03-24 20:21:15,046] Trial 83 finished with value: 0.842691095739792 and parameters: {'learning_rate': 0.0010325211318544124, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 84 with params: {'learning_rate': 0.0004485120744847567, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3568,0.437756,0.81422,0.821123,0.815547,0.813607
2,0.2143,0.452864,0.825688,0.831078,0.824493,0.824558
3,0.1542,0.45852,0.827982,0.829917,0.82725,0.827459


[I 2025-03-24 20:21:50,026] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0017312357186939505, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2969,0.484989,0.823394,0.837697,0.825272,0.822043
2,0.1526,0.418231,0.845183,0.846282,0.844647,0.844873
3,0.0941,0.412956,0.84633,0.847529,0.845773,0.846006
4,0.0606,0.58757,0.841743,0.843114,0.841143,0.841375
5,0.0392,0.72362,0.83945,0.839852,0.839101,0.839259
6,0.0255,0.881297,0.841743,0.844914,0.840848,0.841088
7,0.016,0.985404,0.830275,0.831387,0.829713,0.829917
8,0.0098,1.08593,0.837156,0.837273,0.836933,0.837033
9,0.0064,1.285234,0.830275,0.830654,0.829923,0.830074
10,0.0043,1.499256,0.827982,0.828748,0.827503,0.827688


[I 2025-03-24 20:23:53,394] Trial 85 finished with value: 0.8276879623969817 and parameters: {'learning_rate': 0.0017312357186939505, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 86 with params: {'learning_rate': 0.001264238884199789, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3098,0.51972,0.809633,0.826801,0.811716,0.807763
2,0.1655,0.423604,0.844037,0.845422,0.843437,0.843674
3,0.1063,0.410646,0.850917,0.851621,0.850488,0.85069
4,0.0721,0.574019,0.845183,0.845661,0.844816,0.844988
5,0.0489,0.667099,0.830275,0.830772,0.829881,0.830046
6,0.0331,0.759287,0.837156,0.84027,0.836259,0.836482


[I 2025-03-24 20:25:00,190] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0009944190574613513, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3163,0.498909,0.807339,0.826626,0.809548,0.805171
2,0.1749,0.419953,0.838303,0.839756,0.83768,0.837909
3,0.1154,0.411038,0.850917,0.851477,0.85053,0.850716
4,0.0799,0.55342,0.84289,0.84364,0.842437,0.842636
5,0.0558,0.663064,0.831422,0.833268,0.830713,0.830931
6,0.0396,0.757036,0.829128,0.833631,0.828039,0.828174


[I 2025-03-24 20:26:10,390] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005116210284867555, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3446,0.441755,0.811927,0.821692,0.813505,0.810971
2,0.2072,0.44737,0.832569,0.836596,0.831544,0.831718
3,0.1474,0.438016,0.83945,0.839744,0.839143,0.839284
4,0.1119,0.549916,0.837156,0.841254,0.836133,0.836329
5,0.0868,0.604379,0.833716,0.835352,0.833049,0.833272
6,0.0687,0.638606,0.840596,0.841192,0.840185,0.840368
7,0.0541,0.702162,0.840596,0.840582,0.840479,0.840521
8,0.0432,0.750972,0.840596,0.840577,0.84069,0.840579
9,0.0345,0.811975,0.832569,0.833882,0.833144,0.832526
10,0.0276,0.880218,0.834862,0.834806,0.834891,0.834831


[I 2025-03-24 20:28:16,915] Trial 88 finished with value: 0.8348311059665369 and parameters: {'learning_rate': 0.0005116210284867555, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 89 with params: {'learning_rate': 0.0009393790535598593, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3177,0.480231,0.813073,0.82864,0.815052,0.811446
2,0.1775,0.437411,0.834862,0.836191,0.834259,0.834478
3,0.1183,0.435761,0.847477,0.848781,0.846899,0.847139
4,0.0829,0.533618,0.841743,0.843114,0.841143,0.841375
5,0.0582,0.640711,0.829128,0.830522,0.828503,0.828712
6,0.0422,0.706131,0.829128,0.831997,0.82825,0.828446


[I 2025-03-24 20:29:31,934] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0012704953525811175, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3111,0.554856,0.802752,0.826665,0.805212,0.799906
2,0.1646,0.413651,0.845183,0.845794,0.844774,0.844961
3,0.1052,0.421442,0.850917,0.851779,0.850446,0.850663
4,0.0709,0.595432,0.847477,0.848781,0.846899,0.847139
5,0.0481,0.727557,0.832569,0.832848,0.83226,0.832396
6,0.0328,0.740303,0.837156,0.84027,0.836259,0.836482
7,0.0218,0.833174,0.84289,0.843142,0.842605,0.842739
8,0.0135,1.047112,0.84633,0.84675,0.845984,0.846148
9,0.0092,1.166168,0.836009,0.83599,0.836101,0.835992
10,0.0061,1.349889,0.840596,0.841192,0.840185,0.840368


[I 2025-03-24 20:31:33,569] Trial 90 finished with value: 0.8403677095200153 and parameters: {'learning_rate': 0.0012704953525811175, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 71 with value: 0.8462906544170652.


Trial 91 with params: {'learning_rate': 0.0003230836665453217, 'weight_decay': 0.005, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3698,0.405699,0.819954,0.824356,0.821009,0.819629
2,0.2345,0.452326,0.819954,0.826248,0.818652,0.818612
3,0.1756,0.471035,0.836009,0.840632,0.834923,0.835093
4,0.1401,0.560506,0.822248,0.828598,0.820946,0.820923
5,0.114,0.55626,0.830275,0.830383,0.83005,0.830147
6,0.0948,0.60809,0.836009,0.837447,0.835386,0.835609
7,0.0794,0.617911,0.837156,0.837121,0.837227,0.837135
8,0.0688,0.654724,0.830275,0.830555,0.830555,0.830275
9,0.0593,0.703824,0.819954,0.822235,0.820714,0.819829
10,0.0526,0.736289,0.822248,0.82227,0.822377,0.822236


[I 2025-03-24 20:33:34,924] Trial 91 finished with value: 0.8222362511261483 and parameters: {'learning_rate': 0.0003230836665453217, 'weight_decay': 0.005, 'warmup_steps': 5}. Best is trial 71 with value: 0.8462906544170652.


Trial 92 with params: {'learning_rate': 0.001012057537279114, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3167,0.530645,0.800459,0.824209,0.802917,0.797579
2,0.1748,0.430212,0.838303,0.838891,0.83789,0.838071
3,0.1152,0.429015,0.855505,0.856079,0.855119,0.85531
4,0.0796,0.56495,0.847477,0.848781,0.846899,0.847139
5,0.0557,0.672172,0.832569,0.833207,0.832134,0.832314
6,0.0394,0.75022,0.829128,0.831997,0.82825,0.828446


[I 2025-03-24 20:34:45,281] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0006872827710493167, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3305,0.486197,0.802752,0.820301,0.804875,0.800723
2,0.1903,0.446647,0.827982,0.832285,0.826913,0.82705
3,0.1297,0.422606,0.849771,0.850712,0.849278,0.8495
4,0.0956,0.532647,0.853211,0.854873,0.852572,0.852836
5,0.0714,0.614135,0.836009,0.838396,0.835217,0.835446
6,0.0543,0.751203,0.840596,0.843909,0.83968,0.839912
7,0.0411,0.753117,0.847477,0.847436,0.847405,0.847419
8,0.0297,0.83686,0.841743,0.841743,0.841858,0.84173
9,0.0223,0.894367,0.831422,0.832636,0.831976,0.831385
10,0.0163,1.006864,0.834862,0.834913,0.83468,0.834757


[I 2025-03-24 20:36:42,763] Trial 93 finished with value: 0.834757204895381 and parameters: {'learning_rate': 0.0006872827710493167, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 94 with params: {'learning_rate': 0.0004642372495304151, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3484,0.44255,0.81078,0.818294,0.812168,0.810078
2,0.2124,0.453846,0.827982,0.833419,0.826787,0.826866
3,0.1534,0.474731,0.833716,0.83558,0.833007,0.833231


[I 2025-03-24 20:37:19,103] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.000774619979693519, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.326,0.490276,0.806193,0.825801,0.808422,0.803963
2,0.1858,0.431578,0.834862,0.83662,0.834175,0.834402
3,0.1252,0.422873,0.849771,0.850141,0.849446,0.849604
4,0.091,0.519483,0.84633,0.847017,0.8459,0.846096
5,0.0661,0.623843,0.833716,0.835352,0.833049,0.833272
6,0.0494,0.737969,0.834862,0.837653,0.834007,0.834227


[I 2025-03-24 20:38:28,265] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.000202279886156917, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3957,0.418681,0.803899,0.811276,0.805285,0.803172
2,0.2643,0.449423,0.824541,0.829712,0.823367,0.823436
3,0.21,0.483079,0.824541,0.832321,0.823114,0.823014


[I 2025-03-24 20:39:03,173] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0009403176330541005, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3203,0.530521,0.800459,0.819363,0.802665,0.798213
2,0.1784,0.415077,0.841743,0.841867,0.841521,0.841623
3,0.1183,0.417876,0.852064,0.85244,0.851741,0.8519
4,0.0831,0.547437,0.841743,0.841749,0.841606,0.84166
5,0.0594,0.643109,0.836009,0.837447,0.835386,0.835609
6,0.0431,0.720872,0.840596,0.842516,0.83989,0.840132
7,0.03,0.755315,0.841743,0.841708,0.841816,0.841722
8,0.0206,0.91369,0.834862,0.834827,0.834933,0.834841
9,0.0143,1.008089,0.836009,0.836723,0.836438,0.835999
10,0.0097,1.137182,0.838303,0.83833,0.838143,0.838209


[I 2025-03-24 20:41:03,751] Trial 97 finished with value: 0.8382089183004515 and parameters: {'learning_rate': 0.0009403176330541005, 'weight_decay': 0.005, 'warmup_steps': 7}. Best is trial 71 with value: 0.8462906544170652.


Trial 98 with params: {'learning_rate': 0.001335501850774799, 'weight_decay': 0.006, 'warmup_steps': 40}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3186,0.479208,0.824541,0.837277,0.826313,0.82337
2,0.1646,0.417889,0.844037,0.844245,0.843774,0.843898
3,0.105,0.414572,0.847477,0.848781,0.846899,0.847139
4,0.07,0.607788,0.844037,0.845422,0.843437,0.843674
5,0.0469,0.720657,0.838303,0.838388,0.838101,0.83819
6,0.0315,0.799222,0.83945,0.840112,0.839017,0.839205
7,0.0207,0.934207,0.84633,0.846876,0.845942,0.846123
8,0.0134,1.110767,0.834862,0.834913,0.83468,0.834757
9,0.0087,1.185928,0.830275,0.83046,0.830008,0.830124
10,0.0059,1.416692,0.829128,0.829825,0.828671,0.828853


[I 2025-03-24 20:43:00,249] Trial 98 finished with value: 0.8288527172832042 and parameters: {'learning_rate': 0.001335501850774799, 'weight_decay': 0.006, 'warmup_steps': 40}. Best is trial 71 with value: 0.8462906544170652.


Trial 99 with params: {'learning_rate': 0.0004805104254499945, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3513,0.438185,0.813073,0.821103,0.814505,0.812327
2,0.2104,0.447331,0.829128,0.833275,0.828082,0.828232
3,0.1502,0.445232,0.833716,0.835139,0.833091,0.83331


[I 2025-03-24 20:43:36,495] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.001121026528146478, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3122,0.526755,0.801606,0.82252,0.803917,0.79912
2,0.1701,0.417493,0.84289,0.842831,0.842858,0.842843
3,0.1101,0.408551,0.856651,0.856836,0.856414,0.856533
4,0.0752,0.567189,0.844037,0.845039,0.843521,0.84374
5,0.0514,0.664688,0.827982,0.828603,0.827545,0.82772
6,0.0352,0.791405,0.833716,0.83608,0.832923,0.833145


[I 2025-03-24 20:44:46,002] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0004842717273193513, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3468,0.437826,0.811927,0.819696,0.813337,0.811203
2,0.2099,0.448259,0.837156,0.841987,0.836049,0.836218
3,0.1501,0.452226,0.834862,0.836191,0.834259,0.834478
4,0.115,0.545772,0.83945,0.843584,0.838427,0.838634
5,0.0891,0.609342,0.834862,0.836398,0.834217,0.834441
6,0.0712,0.644876,0.844037,0.844715,0.843605,0.843799
7,0.0564,0.685564,0.84289,0.842921,0.842732,0.842799
8,0.0456,0.726373,0.83945,0.839394,0.83948,0.839419
9,0.0367,0.788561,0.83945,0.840587,0.839985,0.839419
10,0.0298,0.849425,0.837156,0.837094,0.837143,0.837114


[I 2025-03-24 20:46:45,581] Trial 101 finished with value: 0.8371139770688304 and parameters: {'learning_rate': 0.0004842717273193513, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 102 with params: {'learning_rate': 0.0006945643203235844, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3317,0.505733,0.805046,0.821287,0.807085,0.80322
2,0.1915,0.438648,0.831422,0.833268,0.830713,0.830931
3,0.1311,0.427634,0.84289,0.843244,0.842563,0.842716
4,0.0969,0.50859,0.853211,0.85335,0.852993,0.8531
5,0.072,0.616698,0.841743,0.844611,0.84089,0.841134
6,0.0547,0.704957,0.845183,0.846474,0.844605,0.84484
7,0.0409,0.759537,0.841743,0.841708,0.841816,0.841722
8,0.0297,0.829816,0.844037,0.843976,0.844026,0.843996
9,0.022,0.918575,0.844037,0.844321,0.844321,0.844037
10,0.0161,1.021918,0.841743,0.841867,0.841521,0.841623


[I 2025-03-24 20:48:43,162] Trial 102 finished with value: 0.8416231469002695 and parameters: {'learning_rate': 0.0006945643203235844, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 103 with params: {'learning_rate': 0.0007765846360430897, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3282,0.524634,0.799312,0.819299,0.801581,0.796901
2,0.186,0.430354,0.838303,0.839193,0.837806,0.838011
3,0.1254,0.425995,0.84633,0.847173,0.845858,0.846068
4,0.0906,0.517181,0.84633,0.84675,0.845984,0.846148
5,0.0659,0.62581,0.832569,0.836596,0.831544,0.831718
6,0.0494,0.706317,0.83945,0.842592,0.838554,0.838785
7,0.036,0.754847,0.848624,0.84891,0.84891,0.848624
8,0.0253,0.829578,0.837156,0.8371,0.837185,0.837125
9,0.0183,0.955536,0.832569,0.833692,0.833102,0.832537
10,0.0127,1.056132,0.836009,0.836091,0.835807,0.835895


[I 2025-03-24 20:50:36,981] Trial 103 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 0.0007765846360430897, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 71 with value: 0.8462906544170652.


Trial 104 with params: {'learning_rate': 6.119956273045214e-05, 'weight_decay': 0.006, 'warmup_steps': 34}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4884,0.452363,0.786697,0.786637,0.78671,0.786657
2,0.344,0.438512,0.802752,0.802823,0.802517,0.802603
3,0.3091,0.439656,0.807339,0.808686,0.806685,0.806848


[I 2025-03-24 20:51:10,610] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0005090231735865456, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3444,0.456677,0.813073,0.822605,0.814631,0.812154
2,0.2076,0.455174,0.825688,0.831078,0.824493,0.824558
3,0.1489,0.444595,0.841743,0.841801,0.841564,0.841642
4,0.1137,0.533421,0.83945,0.842592,0.838554,0.838785
5,0.0879,0.606981,0.829128,0.830956,0.828418,0.828631
6,0.0701,0.655434,0.841743,0.842414,0.841311,0.841502
7,0.055,0.700814,0.841743,0.841712,0.841648,0.841676
8,0.044,0.743666,0.836009,0.83599,0.836101,0.835992
9,0.0352,0.815292,0.830275,0.831786,0.830892,0.830218
10,0.0285,0.878234,0.837156,0.837094,0.837143,0.837114


[I 2025-03-24 20:53:11,778] Trial 105 finished with value: 0.8371139770688304 and parameters: {'learning_rate': 0.0005090231735865456, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 106 with params: {'learning_rate': 0.0011592895639949924, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3088,0.505422,0.81422,0.830889,0.816263,0.81248
2,0.1683,0.420298,0.841743,0.841712,0.841648,0.841676
3,0.1094,0.424317,0.848624,0.850037,0.848026,0.848272
4,0.0744,0.568914,0.84289,0.843361,0.842521,0.842691
5,0.0508,0.684479,0.827982,0.828252,0.827671,0.827804
6,0.0351,0.781957,0.838303,0.840204,0.837596,0.837832
7,0.024,0.879554,0.83945,0.839505,0.839269,0.839347
8,0.0153,1.028635,0.833716,0.833697,0.833596,0.833637
9,0.0103,1.105678,0.834862,0.835503,0.83527,0.834855
10,0.0068,1.294993,0.833716,0.834049,0.833386,0.833531


[I 2025-03-24 20:55:12,001] Trial 106 finished with value: 0.8335314787971189 and parameters: {'learning_rate': 0.0011592895639949924, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 107 with params: {'learning_rate': 0.0014616406430306304, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3032,0.54407,0.81422,0.833112,0.816389,0.81222
2,0.1582,0.411312,0.844037,0.84487,0.843563,0.84377
3,0.0995,0.402867,0.850917,0.851349,0.850573,0.850741
4,0.0648,0.569611,0.849771,0.850141,0.849446,0.849604
5,0.0423,0.71215,0.829128,0.829151,0.82926,0.829117
6,0.0281,0.782528,0.84633,0.847729,0.845731,0.845973
7,0.019,0.854754,0.853211,0.853162,0.853162,0.853162
8,0.0119,1.057995,0.84289,0.84298,0.84269,0.842781
9,0.0078,1.266937,0.84289,0.842831,0.842858,0.842843
10,0.0052,1.434748,0.838303,0.838646,0.837975,0.838124


[I 2025-03-24 20:57:08,380] Trial 107 finished with value: 0.8381237138647845 and parameters: {'learning_rate': 0.0014616406430306304, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 108 with params: {'learning_rate': 0.0005779583069298079, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3415,0.480791,0.806193,0.819518,0.808043,0.804748
2,0.2004,0.437519,0.836009,0.838396,0.835217,0.835446
3,0.14,0.435401,0.83945,0.83965,0.839185,0.839307
4,0.1052,0.522884,0.841743,0.842917,0.841185,0.841409
5,0.0803,0.647158,0.832569,0.836949,0.831502,0.831662
6,0.0627,0.667124,0.844037,0.845039,0.843521,0.84374
7,0.0484,0.712385,0.845183,0.845137,0.845237,0.845159
8,0.0373,0.775455,0.834862,0.834806,0.834891,0.834831
9,0.0289,0.839038,0.830275,0.830911,0.830681,0.830267
10,0.0223,0.920095,0.833716,0.833668,0.833639,0.833652


[I 2025-03-24 20:59:07,596] Trial 108 finished with value: 0.8336523724008182 and parameters: {'learning_rate': 0.0005779583069298079, 'weight_decay': 0.006, 'warmup_steps': 10}. Best is trial 71 with value: 0.8462906544170652.


Trial 109 with params: {'learning_rate': 0.002139751907835521, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2897,0.439998,0.826835,0.836183,0.828355,0.826038
2,0.1439,0.434628,0.849771,0.851765,0.849068,0.849333
3,0.0868,0.402291,0.840596,0.842516,0.83989,0.840132
4,0.0539,0.674633,0.838303,0.838646,0.837975,0.838124
5,0.0342,0.801721,0.840596,0.840536,0.840606,0.840561
6,0.0219,0.865991,0.83945,0.839454,0.839311,0.839365
7,0.0145,1.0811,0.823394,0.823335,0.823335,0.823335
8,0.0087,1.175506,0.825688,0.825729,0.825503,0.825577
9,0.0057,1.368735,0.825688,0.826171,0.825293,0.825453
10,0.004,1.544154,0.824541,0.825535,0.823998,0.82419


[I 2025-03-24 21:01:09,566] Trial 109 finished with value: 0.8241896099823025 and parameters: {'learning_rate': 0.002139751907835521, 'weight_decay': 0.003, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 110 with params: {'learning_rate': 0.0005243018149062744, 'weight_decay': 0.005, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3448,0.444365,0.809633,0.819337,0.811211,0.808666
2,0.2054,0.445676,0.830275,0.834617,0.829208,0.829356
3,0.145,0.435419,0.837156,0.837445,0.836849,0.836988


[I 2025-03-24 21:01:46,994] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0006037156313354205, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3357,0.483511,0.808486,0.821898,0.810337,0.807059
2,0.198,0.451928,0.830275,0.835364,0.829124,0.829237
3,0.1381,0.432366,0.840596,0.840552,0.840522,0.840536
4,0.1032,0.527677,0.840596,0.84186,0.840016,0.840243
5,0.0782,0.618215,0.829128,0.831997,0.82825,0.828446
6,0.0608,0.690991,0.844037,0.845223,0.843479,0.843708
7,0.0466,0.745743,0.83945,0.839388,0.839438,0.839408
8,0.0356,0.817137,0.84289,0.842831,0.842858,0.842843
9,0.0271,0.879897,0.830275,0.831582,0.83085,0.830231
10,0.0208,0.967133,0.840596,0.840552,0.840522,0.840536


[I 2025-03-24 21:03:44,671] Trial 111 finished with value: 0.8405357225083707 and parameters: {'learning_rate': 0.0006037156313354205, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 112 with params: {'learning_rate': 0.0010234036744130955, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3132,0.506462,0.805046,0.823449,0.807211,0.802947
2,0.1737,0.429302,0.834862,0.835821,0.834344,0.834548
3,0.1148,0.428353,0.848624,0.849476,0.848152,0.848365
4,0.0791,0.54777,0.845183,0.846681,0.844563,0.844806
5,0.0554,0.645849,0.829128,0.829686,0.828713,0.828883
6,0.0393,0.74718,0.832569,0.835935,0.831628,0.831825


[I 2025-03-24 21:04:50,370] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0006140292550362502, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3352,0.490888,0.806193,0.822141,0.808211,0.804421
2,0.1974,0.442451,0.836009,0.839917,0.835007,0.835203
3,0.1373,0.427254,0.84633,0.84627,0.846321,0.846291
4,0.1028,0.529273,0.84289,0.844828,0.842184,0.842432
5,0.0777,0.614319,0.838303,0.839554,0.837722,0.837944
6,0.0601,0.680813,0.84633,0.847945,0.845689,0.845938
7,0.046,0.739211,0.840596,0.840536,0.840606,0.840561
8,0.0349,0.794689,0.841743,0.841708,0.841816,0.841722
9,0.0266,0.866219,0.836009,0.837049,0.836522,0.835983
10,0.0203,0.952119,0.838303,0.83833,0.838143,0.838209


[I 2025-03-24 21:06:46,134] Trial 113 finished with value: 0.8382089183004515 and parameters: {'learning_rate': 0.0006140292550362502, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 114 with params: {'learning_rate': 0.00039855775770016326, 'weight_decay': 0.006, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3592,0.42084,0.816514,0.822191,0.817715,0.816045
2,0.2223,0.464207,0.816514,0.822515,0.815231,0.815183
3,0.1626,0.464389,0.831422,0.833764,0.830629,0.830843


[I 2025-03-24 21:07:20,687] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0004848410928349712, 'weight_decay': 0.002, 'warmup_steps': 38}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.358,0.449545,0.806193,0.816604,0.807832,0.80511
2,0.2089,0.44624,0.830275,0.836172,0.829039,0.82911
3,0.1483,0.435449,0.83945,0.839852,0.839101,0.839259
4,0.1132,0.526025,0.838303,0.841908,0.837343,0.837559
5,0.0877,0.592971,0.833716,0.835823,0.832965,0.833189
6,0.0692,0.635401,0.841743,0.842567,0.841269,0.841473
7,0.0544,0.676147,0.84633,0.846275,0.846363,0.846301
8,0.0432,0.736213,0.84289,0.842843,0.842942,0.842865
9,0.0345,0.831903,0.837156,0.838481,0.837733,0.837114
10,0.0278,0.874998,0.83945,0.839395,0.839395,0.839395


[I 2025-03-24 21:09:18,457] Trial 115 finished with value: 0.8393954702365918 and parameters: {'learning_rate': 0.0004848410928349712, 'weight_decay': 0.002, 'warmup_steps': 38}. Best is trial 71 with value: 0.8462906544170652.


Trial 116 with params: {'learning_rate': 0.0010621434934449268, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3129,0.511631,0.811927,0.827104,0.813884,0.81033
2,0.1718,0.418061,0.837156,0.837159,0.837017,0.83707
3,0.1136,0.42528,0.847477,0.848408,0.846984,0.847202
4,0.0773,0.557681,0.848624,0.849835,0.848068,0.848305
5,0.0539,0.658638,0.833716,0.834756,0.833175,0.833382
6,0.0383,0.730215,0.831422,0.834932,0.83046,0.830647


[I 2025-03-24 21:10:29,937] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0027121193476131807, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2971,0.454662,0.826835,0.830621,0.827808,0.826587
2,0.1408,0.432141,0.852064,0.852334,0.851783,0.851922
3,0.0818,0.420123,0.841743,0.841743,0.841858,0.84173
4,0.0499,0.649405,0.832569,0.832757,0.832302,0.83242
5,0.0313,0.902003,0.832569,0.832506,0.832555,0.832526
6,0.0201,0.980039,0.827982,0.828252,0.827671,0.827804


[I 2025-03-24 21:11:39,742] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0006996305476919972, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3305,0.49021,0.808486,0.823853,0.810464,0.806819
2,0.1899,0.4551,0.832569,0.836949,0.831502,0.831662
3,0.1298,0.41978,0.849771,0.849762,0.849657,0.849699
4,0.0955,0.529866,0.858945,0.860315,0.858371,0.858632
5,0.0712,0.636317,0.834862,0.838584,0.833881,0.834077
6,0.054,0.758789,0.840596,0.844573,0.839595,0.839812
7,0.0405,0.773939,0.844037,0.844007,0.843942,0.84397
8,0.0294,0.845369,0.84289,0.842831,0.842858,0.842843
9,0.0219,0.919877,0.833716,0.834751,0.834228,0.833689
10,0.0159,1.040915,0.84289,0.843054,0.842648,0.842761


[I 2025-03-24 21:13:45,289] Trial 118 finished with value: 0.8427606648950523 and parameters: {'learning_rate': 0.0006996305476919972, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 119 with params: {'learning_rate': 0.0009100109835856238, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3226,0.512201,0.803899,0.824157,0.806169,0.801544
2,0.1787,0.412813,0.841743,0.841947,0.841479,0.841602
3,0.1181,0.437584,0.847477,0.848781,0.846899,0.847139
4,0.0828,0.549883,0.841743,0.842151,0.841395,0.841556
5,0.0588,0.662763,0.831422,0.834318,0.830544,0.830749
6,0.0426,0.737352,0.837156,0.839422,0.836385,0.836619
7,0.0302,0.799415,0.837156,0.837156,0.83727,0.837142
8,0.0203,0.928206,0.838303,0.838287,0.838185,0.838226
9,0.0142,1.012224,0.833716,0.834047,0.834017,0.833715
10,0.0094,1.154782,0.838303,0.838891,0.83789,0.838071


[I 2025-03-24 21:15:39,950] Trial 119 finished with value: 0.8380708420310947 and parameters: {'learning_rate': 0.0009100109835856238, 'weight_decay': 0.003, 'warmup_steps': 12}. Best is trial 71 with value: 0.8462906544170652.


Trial 120 with params: {'learning_rate': 0.001096315032466265, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3141,0.525985,0.802752,0.820301,0.804875,0.800723
2,0.1717,0.425635,0.845183,0.846104,0.844689,0.844904
3,0.1125,0.416054,0.848624,0.849476,0.848152,0.848365
4,0.0774,0.549143,0.849771,0.850141,0.849446,0.849604
5,0.0532,0.660344,0.834862,0.836398,0.834217,0.834441
6,0.0368,0.698031,0.836009,0.839263,0.835091,0.835305


[I 2025-03-24 21:16:47,861] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0004513249206773796, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3499,0.437057,0.815367,0.822071,0.816673,0.814782
2,0.2137,0.459544,0.823394,0.828737,0.822198,0.822249
3,0.1545,0.477736,0.830275,0.832477,0.829502,0.829715


[I 2025-03-24 21:17:25,733] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.000696693060306976, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3304,0.493247,0.805046,0.821991,0.807127,0.803131
2,0.1905,0.451503,0.832569,0.836258,0.831586,0.831772
3,0.1299,0.426676,0.844037,0.844576,0.843647,0.843826
4,0.0957,0.538881,0.848624,0.850734,0.847899,0.848164
5,0.0711,0.620477,0.83945,0.842292,0.838596,0.838831
6,0.0541,0.755644,0.836009,0.839263,0.835091,0.835305


[I 2025-03-24 21:18:37,081] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0002971869536503394, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3722,0.409226,0.813073,0.819312,0.814337,0.812529
2,0.2398,0.442421,0.825688,0.831484,0.824451,0.824492
3,0.1819,0.480928,0.833716,0.840343,0.832418,0.832476


[I 2025-03-24 21:19:14,322] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0005363790361907136, 'weight_decay': 0.004, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3438,0.462726,0.81078,0.821324,0.812421,0.809723
2,0.2046,0.451501,0.827982,0.832648,0.826871,0.826991
3,0.1445,0.440964,0.84289,0.842877,0.842774,0.842815
4,0.1096,0.536992,0.841743,0.844611,0.84089,0.841134
5,0.0839,0.629904,0.830275,0.833305,0.829376,0.829572
6,0.0663,0.666806,0.841743,0.843326,0.8411,0.841339
7,0.0517,0.70419,0.83945,0.839394,0.83948,0.839419
8,0.0405,0.766083,0.837156,0.8371,0.837185,0.837125
9,0.0319,0.840999,0.833716,0.835571,0.834396,0.833637
10,0.0253,0.910814,0.832569,0.832568,0.832428,0.832481


[I 2025-03-24 21:21:15,171] Trial 124 finished with value: 0.8324806838038695 and parameters: {'learning_rate': 0.0005363790361907136, 'weight_decay': 0.004, 'warmup_steps': 6}. Best is trial 71 with value: 0.8462906544170652.


Trial 125 with params: {'learning_rate': 0.0006645513128932002, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3326,0.505217,0.802752,0.821034,0.804917,0.800628
2,0.1931,0.429995,0.836009,0.837061,0.83547,0.83568
3,0.1329,0.424197,0.840596,0.840684,0.840395,0.840485
4,0.0982,0.528001,0.845183,0.845942,0.844731,0.844934
5,0.0736,0.61105,0.837156,0.838499,0.836554,0.836777
6,0.0561,0.704389,0.844037,0.845636,0.843395,0.843639
7,0.0424,0.74223,0.841743,0.84169,0.84169,0.84169
8,0.031,0.843577,0.841743,0.841688,0.841774,0.841713
9,0.0232,0.925552,0.841743,0.842391,0.842153,0.841736
10,0.017,1.010448,0.840596,0.840582,0.840479,0.840521


[I 2025-03-24 21:23:17,756] Trial 125 finished with value: 0.8405206158234686 and parameters: {'learning_rate': 0.0006645513128932002, 'weight_decay': 0.003, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 126 with params: {'learning_rate': 0.0006190017548889218, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3351,0.472587,0.811927,0.823351,0.813631,0.810776
2,0.1974,0.459092,0.826835,0.831298,0.825745,0.825867
3,0.1373,0.42795,0.841743,0.841801,0.841564,0.841642
4,0.1022,0.544335,0.840596,0.841671,0.840059,0.840277
5,0.0772,0.623364,0.834862,0.836191,0.834259,0.834478
6,0.0597,0.687393,0.841743,0.842414,0.841311,0.841502
7,0.0457,0.749765,0.845183,0.845164,0.845279,0.845167
8,0.0349,0.797123,0.840596,0.840577,0.84069,0.840579
9,0.0264,0.893861,0.832569,0.834086,0.833186,0.832512
10,0.0202,0.962582,0.840596,0.840536,0.840606,0.840561


[I 2025-03-24 21:25:23,224] Trial 126 finished with value: 0.8405608939576303 and parameters: {'learning_rate': 0.0006190017548889218, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 127 with params: {'learning_rate': 0.00046444615934274026, 'weight_decay': 0.006, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3514,0.435708,0.811927,0.820172,0.813379,0.811148
2,0.2123,0.457554,0.822248,0.827765,0.82103,0.821061
3,0.1523,0.460517,0.830275,0.833014,0.829418,0.829622


[I 2025-03-24 21:26:01,883] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0009390221650144338, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3213,0.513271,0.800459,0.820906,0.802749,0.798011
2,0.1778,0.426168,0.836009,0.83659,0.835596,0.835774
3,0.118,0.418929,0.853211,0.854082,0.852741,0.85296
4,0.0827,0.55133,0.84633,0.84675,0.845984,0.846148
5,0.0596,0.637369,0.834862,0.837107,0.834091,0.834318
6,0.0427,0.751797,0.837156,0.84027,0.836259,0.836482
7,0.0298,0.796772,0.83945,0.839418,0.839353,0.839381
8,0.0205,0.893315,0.838303,0.838646,0.837975,0.838124
9,0.014,1.021073,0.831422,0.832131,0.831849,0.831411
10,0.0094,1.166317,0.840596,0.841061,0.840227,0.840395


[I 2025-03-24 21:28:02,174] Trial 128 finished with value: 0.8403946153856283 and parameters: {'learning_rate': 0.0009390221650144338, 'weight_decay': 0.005, 'warmup_steps': 6}. Best is trial 71 with value: 0.8462906544170652.


Trial 129 with params: {'learning_rate': 0.0007810262825560762, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3279,0.498681,0.801606,0.82252,0.803917,0.79912
2,0.1862,0.419196,0.836009,0.836889,0.835512,0.835713
3,0.1259,0.428255,0.850917,0.851349,0.850573,0.850741
4,0.0911,0.533392,0.847477,0.848408,0.846984,0.847202
5,0.0663,0.651591,0.838303,0.840204,0.837596,0.837832
6,0.0495,0.709657,0.845183,0.846903,0.844521,0.84477
7,0.0361,0.759576,0.84633,0.84638,0.846489,0.846323
8,0.0252,0.857656,0.847477,0.847419,0.847447,0.847432
9,0.0181,0.989713,0.83945,0.840587,0.839985,0.839419
10,0.0126,1.09646,0.84289,0.842831,0.842858,0.842843


[I 2025-03-24 21:30:04,616] Trial 129 finished with value: 0.8428434051297162 and parameters: {'learning_rate': 0.0007810262825560762, 'weight_decay': 0.004, 'warmup_steps': 2}. Best is trial 71 with value: 0.8462906544170652.


Trial 130 with params: {'learning_rate': 0.0009492108503558592, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3172,0.492757,0.811927,0.828488,0.813968,0.810165
2,0.1767,0.439869,0.833716,0.83443,0.83326,0.833447
3,0.1178,0.434115,0.84633,0.847729,0.845731,0.845973
4,0.0824,0.535035,0.841743,0.843795,0.841016,0.841262
5,0.0582,0.639627,0.829128,0.831448,0.828334,0.828542
6,0.0415,0.729708,0.826835,0.830944,0.825787,0.825926


[I 2025-03-24 21:31:18,430] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00054225110436658, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.341,0.438598,0.813073,0.820633,0.814463,0.81238
2,0.2041,0.446981,0.831422,0.835605,0.830376,0.830537
3,0.1447,0.442345,0.838303,0.838761,0.837933,0.838098
4,0.1095,0.55591,0.838303,0.842245,0.837301,0.837508
5,0.0836,0.610058,0.836009,0.837447,0.835386,0.835609
6,0.0659,0.662783,0.838303,0.839035,0.837848,0.838042
7,0.0515,0.714931,0.840596,0.840549,0.840648,0.840571
8,0.0404,0.772747,0.841743,0.841682,0.841732,0.841702
9,0.0317,0.859437,0.836009,0.837432,0.836606,0.835961
10,0.0249,0.922357,0.836009,0.835948,0.836017,0.835973


[I 2025-03-24 21:33:21,448] Trial 131 finished with value: 0.835972718244181 and parameters: {'learning_rate': 0.00054225110436658, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 132 with params: {'learning_rate': 0.00021974483274299196, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.39,0.423281,0.800459,0.80992,0.802033,0.799445
2,0.2591,0.445171,0.827982,0.832648,0.826871,0.826991
3,0.2041,0.483269,0.827982,0.834692,0.826661,0.826665
4,0.1698,0.565606,0.81422,0.824497,0.812558,0.812129
5,0.1441,0.510476,0.822248,0.822261,0.822083,0.822145
6,0.1242,0.583571,0.833716,0.835823,0.832965,0.833189


[I 2025-03-24 21:34:34,780] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0012818727493764648, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3104,0.510472,0.808486,0.828225,0.810716,0.806283
2,0.1644,0.421594,0.84289,0.844373,0.842269,0.842507
3,0.1047,0.414595,0.848624,0.849835,0.848068,0.848305
4,0.0705,0.586742,0.848624,0.849648,0.84811,0.848336
5,0.0476,0.72056,0.831422,0.831987,0.831007,0.83118
6,0.0326,0.732114,0.83945,0.841242,0.838764,0.839002
7,0.0214,0.859376,0.850917,0.851952,0.850404,0.850634
8,0.0136,0.998358,0.840596,0.840757,0.840353,0.840465
9,0.0093,1.120497,0.832569,0.832568,0.832428,0.832481
10,0.0062,1.315097,0.83945,0.840264,0.838974,0.839175


[I 2025-03-24 21:36:43,336] Trial 133 finished with value: 0.8391754315705162 and parameters: {'learning_rate': 0.0012818727493764648, 'weight_decay': 0.004, 'warmup_steps': 9}. Best is trial 71 with value: 0.8462906544170652.


Trial 134 with params: {'learning_rate': 0.0006575546521331842, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3323,0.490697,0.803899,0.822616,0.806085,0.80174
2,0.1931,0.450438,0.831422,0.835965,0.830334,0.83048
3,0.1325,0.419888,0.850917,0.851054,0.850699,0.850804
4,0.098,0.543167,0.850917,0.853047,0.850194,0.850464
5,0.0734,0.615653,0.834862,0.837948,0.833965,0.834178
6,0.0562,0.74814,0.844037,0.846931,0.843184,0.843436
7,0.0427,0.751557,0.853211,0.853281,0.853035,0.853118
8,0.0317,0.834958,0.83945,0.839388,0.839438,0.839408
9,0.0237,0.894477,0.833716,0.834751,0.834228,0.833689
10,0.0177,0.999873,0.840596,0.840757,0.840353,0.840465


[I 2025-03-24 21:38:44,287] Trial 134 finished with value: 0.8404652001489946 and parameters: {'learning_rate': 0.0006575546521331842, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 135 with params: {'learning_rate': 0.0005697181756283703, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3417,0.464063,0.808486,0.82009,0.810211,0.80728
2,0.2008,0.43259,0.831422,0.834617,0.830502,0.830699
3,0.1403,0.438269,0.841743,0.841947,0.841479,0.841602
4,0.1056,0.53514,0.840596,0.843307,0.839764,0.840005
5,0.0808,0.628915,0.837156,0.83969,0.836343,0.836575
6,0.0634,0.680079,0.838303,0.840204,0.837596,0.837832
7,0.0491,0.702015,0.840596,0.840577,0.84069,0.840579
8,0.0381,0.768698,0.83945,0.839395,0.839395,0.839395
9,0.0295,0.852168,0.831422,0.832453,0.831934,0.831395
10,0.023,0.925026,0.834862,0.834828,0.834765,0.834792


[I 2025-03-24 21:40:46,174] Trial 135 finished with value: 0.8347919901060443 and parameters: {'learning_rate': 0.0005697181756283703, 'weight_decay': 0.004, 'warmup_steps': 8}. Best is trial 71 with value: 0.8462906544170652.


Trial 136 with params: {'learning_rate': 0.003322474574291809, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2837,0.467756,0.832569,0.835323,0.833396,0.83242
2,0.1332,0.467483,0.838303,0.838546,0.838017,0.838148
3,0.076,0.458572,0.833716,0.83494,0.833133,0.833347
4,0.0432,0.765191,0.834862,0.835374,0.83447,0.83464
5,0.0263,0.889272,0.832569,0.832568,0.832428,0.832481
6,0.0165,0.9577,0.832569,0.83268,0.832344,0.832442


[I 2025-03-24 21:41:55,697] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0006780731234519245, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.331,0.4864,0.802752,0.821034,0.804917,0.800628
2,0.1912,0.447866,0.830275,0.834617,0.829208,0.829356
3,0.1309,0.41966,0.849771,0.850261,0.849404,0.849581
4,0.0966,0.546014,0.847477,0.849976,0.846689,0.846954
5,0.0721,0.621652,0.837156,0.839422,0.836385,0.836619
6,0.055,0.747159,0.83945,0.842292,0.838596,0.838831
7,0.0414,0.769472,0.847477,0.847512,0.84732,0.847389
8,0.0303,0.847798,0.841743,0.841682,0.841732,0.841702
9,0.0227,0.917414,0.837156,0.837948,0.837606,0.837142
10,0.0167,1.033232,0.834862,0.835055,0.834596,0.834715


[I 2025-03-24 21:43:55,525] Trial 137 finished with value: 0.8347154433019002 and parameters: {'learning_rate': 0.0006780731234519245, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 138 with params: {'learning_rate': 0.0011727803737929055, 'weight_decay': 0.006, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3126,0.504265,0.81078,0.828368,0.812884,0.808877
2,0.1677,0.412037,0.837156,0.837552,0.836806,0.836963
3,0.1088,0.400948,0.850917,0.851137,0.850657,0.850785
4,0.0741,0.552984,0.850917,0.851477,0.85053,0.850716
5,0.0508,0.661891,0.838303,0.839035,0.837848,0.838042
6,0.0349,0.758973,0.834862,0.838584,0.833881,0.834077


[I 2025-03-24 21:45:06,804] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0009375024069308997, 'weight_decay': 0.004, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3207,0.516474,0.803899,0.822616,0.806085,0.80174
2,0.1779,0.432227,0.834862,0.835821,0.834344,0.834548
3,0.1178,0.417703,0.84633,0.846876,0.845942,0.846123
4,0.0827,0.566006,0.837156,0.838499,0.836554,0.836777
5,0.0593,0.648808,0.832569,0.835055,0.831755,0.831971
6,0.0427,0.732502,0.834862,0.837372,0.834049,0.834273


[I 2025-03-24 21:46:21,226] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.000412289255091539, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3538,0.421711,0.821101,0.825694,0.822177,0.820761
2,0.2195,0.460628,0.822248,0.828174,0.820988,0.820993
3,0.1599,0.467561,0.829128,0.832933,0.828124,0.828288


[I 2025-03-24 21:46:58,357] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0013566647975542167, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3054,0.506778,0.816514,0.833289,0.818557,0.814795
2,0.1625,0.436738,0.844037,0.845223,0.843479,0.843708
3,0.1035,0.433944,0.848624,0.850037,0.848026,0.848272
4,0.069,0.567593,0.838303,0.839366,0.837764,0.837979
5,0.0463,0.6952,0.832569,0.833073,0.832176,0.832343
6,0.0312,0.808658,0.833716,0.83558,0.833007,0.833231


[I 2025-03-24 21:48:10,944] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0003239693705202621, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3681,0.405328,0.819954,0.823679,0.820925,0.819696
2,0.2348,0.448332,0.821101,0.82721,0.81982,0.819803
3,0.176,0.466428,0.831422,0.83634,0.830292,0.830421


[I 2025-03-24 21:48:45,592] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0010896126526999258, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3148,0.509234,0.801606,0.817345,0.803623,0.799792
2,0.1716,0.428275,0.841743,0.842275,0.841353,0.84153
3,0.1125,0.40538,0.853211,0.853778,0.852825,0.853013
4,0.077,0.553835,0.850917,0.852564,0.850278,0.850537
5,0.0531,0.645721,0.826835,0.82784,0.826292,0.826488
6,0.0366,0.738069,0.834862,0.838925,0.833839,0.834023


[I 2025-03-24 21:49:59,617] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0006264065678714925, 'weight_decay': 0.005, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3359,0.484575,0.808486,0.823185,0.810421,0.806901
2,0.1961,0.435538,0.832569,0.835334,0.831713,0.831924
3,0.1357,0.442765,0.840596,0.840684,0.840395,0.840485
4,0.1009,0.54056,0.83945,0.841482,0.838722,0.838962
5,0.0762,0.622756,0.834862,0.837372,0.834049,0.834273
6,0.0587,0.688798,0.845183,0.846903,0.844521,0.84477
7,0.0453,0.716902,0.845183,0.845124,0.845194,0.845149
8,0.0339,0.767867,0.840596,0.840552,0.840522,0.840536
9,0.0258,0.879521,0.834862,0.83599,0.835396,0.834831
10,0.0197,0.942625,0.840596,0.840582,0.840479,0.840521


[I 2025-03-24 21:51:57,826] Trial 144 finished with value: 0.8405206158234686 and parameters: {'learning_rate': 0.0006264065678714925, 'weight_decay': 0.005, 'warmup_steps': 5}. Best is trial 71 with value: 0.8462906544170652.


Trial 145 with params: {'learning_rate': 0.0004926143118156184, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3459,0.437017,0.807339,0.815484,0.80879,0.806542
2,0.2088,0.448784,0.836009,0.840632,0.834923,0.835093
3,0.1491,0.450866,0.833716,0.83494,0.833133,0.833347
4,0.1138,0.543607,0.838303,0.841586,0.837385,0.837609
5,0.0885,0.614419,0.836009,0.837892,0.835301,0.835531
6,0.0706,0.647967,0.845183,0.846104,0.844689,0.844904
7,0.0558,0.696063,0.841743,0.841749,0.841606,0.84166
8,0.0447,0.745337,0.834862,0.834806,0.834891,0.834831
9,0.0359,0.805711,0.837156,0.838289,0.83769,0.837125
10,0.0291,0.874914,0.833716,0.833654,0.833723,0.833679


[I 2025-03-24 21:54:00,870] Trial 145 finished with value: 0.8336786303874562 and parameters: {'learning_rate': 0.0004926143118156184, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 71 with value: 0.8462906544170652.


Trial 146 with params: {'learning_rate': 0.00098335850470726, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3155,0.524192,0.807339,0.825865,0.809506,0.805265
2,0.1754,0.424677,0.840596,0.841671,0.840059,0.840277
3,0.1162,0.415072,0.853211,0.854447,0.852656,0.852901
4,0.081,0.514069,0.856651,0.856929,0.856372,0.856514
5,0.0574,0.635502,0.832569,0.834309,0.831881,0.832102
6,0.0412,0.724292,0.840596,0.8436,0.839722,0.83996
7,0.029,0.752727,0.844037,0.844037,0.844153,0.844024
8,0.0193,0.926727,0.84633,0.846302,0.846236,0.846265
9,0.0132,1.040847,0.84289,0.843126,0.843153,0.84289
10,0.0089,1.172434,0.840596,0.840684,0.840395,0.840485


[I 2025-03-24 21:56:01,915] Trial 146 finished with value: 0.8404853557586645 and parameters: {'learning_rate': 0.00098335850470726, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 147 with params: {'learning_rate': 0.00019295969439747166, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4009,0.417031,0.801606,0.808496,0.802949,0.800925
2,0.2682,0.457179,0.823394,0.82914,0.822156,0.822182
3,0.2144,0.485109,0.825688,0.833261,0.824282,0.824208
4,0.1815,0.553643,0.821101,0.828085,0.819736,0.819658
5,0.1557,0.498727,0.824541,0.824516,0.824419,0.824458
6,0.1362,0.581915,0.829128,0.830732,0.82846,0.828672


[I 2025-03-24 21:57:11,863] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0004708293252621487, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3481,0.441757,0.815367,0.822973,0.816757,0.814682
2,0.2116,0.458589,0.827982,0.833026,0.826829,0.826929
3,0.1528,0.460299,0.833716,0.83494,0.833133,0.833347
4,0.1172,0.542018,0.834862,0.837107,0.834091,0.834318
5,0.0919,0.59979,0.833716,0.835352,0.833049,0.833272
6,0.074,0.649204,0.84633,0.847173,0.845858,0.846068
7,0.0588,0.689745,0.840596,0.840536,0.840606,0.840561
8,0.0477,0.725697,0.833716,0.833795,0.833891,0.83371
9,0.0389,0.790143,0.827982,0.829095,0.828513,0.827949
10,0.032,0.85241,0.833716,0.833654,0.833723,0.833679


[I 2025-03-24 21:59:11,833] Trial 148 finished with value: 0.8336786303874562 and parameters: {'learning_rate': 0.0004708293252621487, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 71 with value: 0.8462906544170652.


Trial 149 with params: {'learning_rate': 0.0008753621830154515, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3206,0.524938,0.802752,0.821784,0.804959,0.800532
2,0.1807,0.423844,0.837156,0.838126,0.836638,0.836846
3,0.1214,0.413718,0.849771,0.850261,0.849404,0.849581
4,0.0862,0.536922,0.848624,0.850037,0.848026,0.848272
5,0.0622,0.631359,0.83945,0.842292,0.838596,0.838831
6,0.0454,0.734328,0.831422,0.835605,0.830376,0.830537


[I 2025-03-24 22:00:25,901] Trial 149 pruned. 


In [25]:
print(best_trial_normal)

BestRun(run_id='71', objective=0.8462906544170652, hyperparameters={'learning_rate': 0.0006342434465905245, 'weight_decay': 0.005, 'warmup_steps': 1}, run_summary=None)
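
The printed result is a named tuple, so the winning configuration can be read field by field and reused for a final training run. The sketch below is only illustrative: `BestRun` here is a local stand-in with the same four fields as the object printed above, and `apply_hyperparameters` and the `SimpleNamespace` placeholder are hypothetical helpers, not part of this notebook or of `base`.

```python
from types import SimpleNamespace
from typing import Any, Dict, NamedTuple, Optional


class BestRun(NamedTuple):
    """Local stand-in with the same fields as the BestRun printed above."""
    run_id: str
    objective: float
    hyperparameters: Dict[str, Any]
    run_summary: Optional[Any] = None


def apply_hyperparameters(args, best_run):
    """Copy every tuned value onto an args-like object and return it."""
    for name, value in best_run.hyperparameters.items():
        setattr(args, name, value)
    return args


# Values copied from the output of the search on the original dataset.
best_trial_normal = BestRun(
    run_id="71",
    objective=0.8462906544170652,
    hyperparameters={
        "learning_rate": 0.0006342434465905245,
        "weight_decay": 0.005,
        "warmup_steps": 1,
    },
)

# Hypothetical placeholder for the real training-arguments object.
final_args = apply_hyperparameters(SimpleNamespace(), best_trial_normal)
print(final_args.learning_rate, final_args.weight_decay, final_args.warmup_steps)
```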


In [26]:
base.reset_seed()

## Hyperparameter search with distillation on the original dataset
Configuration of the individual training runs.

In [27]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.


In [28]:
def hp_space(trial):
    # Search space: the three optimizer hyperparameters used for normal training
    # plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
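
To make the roles of `lambda_param` and `temperature` concrete, here is a minimal sketch of one common (Hinton-style) distillation loss for a single example. It is an assumption for illustration only: how `base.DistilTrainer` actually combines the two terms, and which term `lambda_param` weights, is not shown in this notebook. In this sketch `lambda_param` weights the hard-label cross-entropy and `1 - lambda_param` weights the temperature-softened KL term.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    Assumed convention (not confirmed by the notebook): lambda_param weights
    the hard-label term, and the soft term is scaled by temperature ** 2.
    """
    # Cross-entropy of the student against the ground-truth label.
    hard_loss = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) on distributions softened by the temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(
        p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0
    )
    soft_loss *= temperature ** 2

    return lambda_param * hard_loss + (1.0 - lambda_param) * soft_loss


# Example with made-up logits for a two-class problem.
print(distillation_loss([2.0, -1.0], [1.5, -0.5], label=0, lambda_param=0.2, temperature=2.0))
```

Under this convention, `lambda_param=1.0` reduces to ordinary supervised training, while `lambda_param=0.0` trains on the teacher's softened outputs only.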

Optuna configuration.

In [29]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
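
As a rough guide to what the pruner settings imply, the sketch below computes the number of Hyperband brackets and the steps at which a bracket checks trials for pruning. It is a simplified picture of successive halving, not Optuna's implementation, and the values `min_resource=3` and `max_resource=10` are illustrative assumptions (the actual `min_r` and `max_r` are defined elsewhere in the notebook). The helper names are hypothetical.

```python
import math


def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Number of brackets: floor(log_rf(max_resource / min_resource)) + 1."""
    return math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1


def rung_steps(min_resource, max_resource, reduction_factor, bracket=0):
    """Steps at which a bracket compares trials (simplified successive halving).

    Bracket i starts at min_resource * reduction_factor ** i and multiplies
    the step by reduction_factor at every further rung.
    """
    steps = []
    step = min_resource * reduction_factor ** bracket
    while step <= max_resource:
        steps.append(step)
        step *= reduction_factor
    return steps


# Illustrative values only; the notebook's min_r and max_r may differ.
n_brackets = hyperband_brackets(min_resource=3, max_resource=10, reduction_factor=2)
for bracket in range(n_brackets):
    print(bracket, rung_steps(3, 10, 2, bracket))
```

With these assumed values, the most aggressive bracket checks trials after epochs 3 and 6, which would match the points at which trials are pruned in the logs.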



Configuration of the distillation trainer for the individual training runs.

In [30]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [31]:
best_trial_distill = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-embedd",
    n_trials=150
)

[I 2025-03-25 09:28:19,571] A new study created in memory with name: Distill-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7191,1.487449,0.807339,0.81111,0.808327,0.807046
2,0.9273,1.317457,0.838303,0.839193,0.837806,0.838011
3,0.6619,1.419669,0.832569,0.837702,0.831418,0.831544


[I 2025-03-25 09:28:55,161] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 38, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1661,1.585032,0.798165,0.801545,0.799108,0.797893
2,1.321,1.481447,0.813073,0.813128,0.812863,0.812943
3,1.1124,1.412749,0.819954,0.819965,0.819788,0.81985
4,0.9663,1.56993,0.821101,0.828085,0.819736,0.819658
5,0.8453,1.406827,0.827982,0.827944,0.827882,0.827908
6,0.7637,1.435682,0.832569,0.833882,0.833144,0.832526


[I 2025-03-25 09:30:05,190] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 36, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5145,1.690814,0.78555,0.785964,0.785878,0.785548
2,1.562,1.585188,0.801606,0.801586,0.801686,0.801584
3,1.3561,1.513302,0.807339,0.807271,0.807317,0.80729


[I 2025-03-25 09:30:38,366] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0765,1.586931,0.799312,0.801034,0.799981,0.799217
2,1.2758,1.487513,0.811927,0.812368,0.811526,0.811673
3,1.0685,1.452018,0.823394,0.825532,0.822619,0.822812
4,0.9178,1.527867,0.827982,0.832648,0.826871,0.826991
5,0.7946,1.439833,0.822248,0.822575,0.822546,0.822247
6,0.7132,1.436842,0.831422,0.832131,0.831849,0.831411


[I 2025-03-25 09:31:46,392] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.346,1.388725,0.825688,0.831094,0.82685,0.825283
2,0.578,1.162006,0.84633,0.84675,0.845984,0.846148
3,0.3453,1.173077,0.84633,0.847173,0.845858,0.846068
4,0.2297,1.163229,0.857798,0.858526,0.857371,0.857582
5,0.1658,1.313861,0.852064,0.852561,0.851699,0.851877
6,0.1261,1.23595,0.853211,0.854447,0.852656,0.852901


[I 2025-03-25 09:32:54,787] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1915,1.231748,0.850917,0.852962,0.851625,0.850839
2,0.4395,1.064896,0.864679,0.865146,0.864339,0.864519
3,0.2368,1.10693,0.858945,0.859927,0.858455,0.858691
4,0.1556,1.202881,0.852064,0.853394,0.851488,0.851736
5,0.1067,1.240988,0.857798,0.857776,0.857708,0.857738
6,0.0793,1.209617,0.853211,0.853778,0.852825,0.853013
7,0.061,1.239619,0.858945,0.859227,0.858666,0.85881
8,0.0498,1.256427,0.852064,0.853016,0.851572,0.851797
9,0.0418,1.244365,0.863532,0.863532,0.863423,0.863467
10,0.0369,1.25635,0.858945,0.859133,0.858708,0.858829


[I 2025-03-25 09:34:48,800] Trial 5 finished with value: 0.8588289181174557 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.8588289181174557.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3257,1.671946,0.818807,0.831044,0.820557,0.817632
2,0.5905,1.219547,0.844037,0.843984,0.843984,0.843984
3,0.3532,1.21112,0.850917,0.851952,0.850404,0.850634
4,0.2346,1.231015,0.855505,0.85696,0.854909,0.855169
5,0.1698,1.285482,0.853211,0.853534,0.852909,0.85306
6,0.1278,1.307082,0.849771,0.851524,0.84911,0.849369


[I 2025-03-25 09:35:57,950] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.133,1.329103,0.83945,0.845458,0.840658,0.83904
2,0.4243,1.075793,0.849771,0.851524,0.84911,0.849369
3,0.2261,1.159989,0.850917,0.851349,0.850573,0.850741
4,0.1424,1.247464,0.847477,0.849707,0.846731,0.846994
5,0.0987,1.306211,0.837156,0.837445,0.836849,0.836988
6,0.0735,1.241427,0.848624,0.849835,0.848068,0.848305
7,0.0557,1.229618,0.852064,0.852698,0.851657,0.851852
8,0.0449,1.209165,0.862385,0.862407,0.862255,0.862313
9,0.0379,1.23258,0.852064,0.852561,0.851699,0.851877
10,0.0336,1.234316,0.848624,0.849319,0.848194,0.848393


[I 2025-03-25 09:37:52,001] Trial 7 finished with value: 0.8483933680001264 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 5 with value: 0.8588289181174557.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2105,1.617539,0.795872,0.798151,0.796645,0.795717
2,1.3762,1.516333,0.811927,0.811863,0.811863,0.811863
3,1.1828,1.457527,0.821101,0.821377,0.821377,0.821101
4,1.042,1.607412,0.81422,0.820587,0.812895,0.812798
5,0.9328,1.41842,0.827982,0.828472,0.827587,0.82775
6,0.845,1.494167,0.822248,0.825079,0.823093,0.822077


[I 2025-03-25 09:39:03,807] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.302,1.478051,0.831422,0.839348,0.832818,0.830797
2,0.5316,1.145829,0.855505,0.855732,0.855245,0.855376
3,0.3092,1.152417,0.856651,0.858454,0.855993,0.856269
4,0.2024,1.147699,0.858945,0.860113,0.858413,0.858662
5,0.1435,1.277138,0.857798,0.859059,0.857245,0.857498
6,0.1092,1.166008,0.861239,0.861636,0.860918,0.861085
7,0.0839,1.199558,0.862385,0.862537,0.862171,0.862281
8,0.0675,1.193429,0.860092,0.86024,0.859876,0.859986
9,0.0577,1.236497,0.860092,0.860092,0.860213,0.86008
10,0.0501,1.249335,0.858945,0.859054,0.85875,0.858847


[I 2025-03-25 09:41:04,788] Trial 9 finished with value: 0.8588467536569477 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 9 with value: 0.8588467536569477.


Trial 10 with params: {'learning_rate': 0.0019688396221773483, 'weight_decay': 0.004, 'warmup_steps': 36, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.209,1.285063,0.845183,0.848472,0.846079,0.845012
2,0.4418,1.065803,0.854358,0.85532,0.853867,0.854095
3,0.2341,1.020119,0.852064,0.852025,0.851993,0.852008
4,0.1457,1.11898,0.861239,0.862231,0.86075,0.860988
5,0.1008,1.163995,0.865826,0.866024,0.865591,0.865715
6,0.072,1.179222,0.854358,0.854999,0.853951,0.854149
7,0.0566,1.175635,0.858945,0.859337,0.858624,0.858789
8,0.0461,1.128124,0.863532,0.863647,0.863339,0.863437
9,0.0388,1.154362,0.860092,0.86024,0.859876,0.859986
10,0.0346,1.147241,0.858945,0.859133,0.858708,0.858829


[I 2025-03-25 09:43:09,987] Trial 10 finished with value: 0.8588289181174557 and parameters: {'learning_rate': 0.0019688396221773483, 'weight_decay': 0.004, 'warmup_steps': 36, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 9 with value: 0.8588467536569477.


Trial 11 with params: {'learning_rate': 0.0010628916654939495, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2908,1.630178,0.81422,0.822037,0.815631,0.813505
2,0.5341,1.118245,0.860092,0.86024,0.859876,0.859986
3,0.3031,1.08662,0.862385,0.862846,0.862044,0.862222
4,0.196,1.159331,0.865826,0.867029,0.865296,0.865557
5,0.1402,1.262465,0.863532,0.863647,0.863339,0.863437
6,0.1061,1.189186,0.862385,0.862625,0.862128,0.862263
7,0.0814,1.228851,0.861239,0.861184,0.861213,0.861197
8,0.0663,1.203016,0.863532,0.863582,0.863381,0.863453
9,0.0561,1.209288,0.862385,0.8625,0.862592,0.862382
10,0.0496,1.22555,0.860092,0.860071,0.860003,0.860032


[I 2025-03-25 09:45:10,404] Trial 11 finished with value: 0.8600321027287319 and parameters: {'learning_rate': 0.0010628916654939495, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 11 with value: 0.8600321027287319.


Trial 12 with params: {'learning_rate': 0.0006655563257430926, 'weight_decay': 0.005, 'warmup_steps': 33, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.453,1.947164,0.808486,0.823185,0.810421,0.806901
2,0.6452,1.265363,0.84289,0.843977,0.842353,0.842575
3,0.3935,1.226893,0.854358,0.854462,0.854161,0.854256
4,0.2664,1.16731,0.860092,0.86068,0.859708,0.859903
5,0.1931,1.379,0.852064,0.852849,0.851614,0.851826
6,0.1483,1.256741,0.858945,0.860315,0.858371,0.858632
7,0.1187,1.286708,0.866972,0.866998,0.866843,0.866902
8,0.098,1.257353,0.868119,0.868087,0.868054,0.868069
9,0.082,1.289086,0.865826,0.865773,0.865802,0.865786
10,0.0725,1.313177,0.858945,0.859133,0.858708,0.858829


[I 2025-03-25 09:47:13,323] Trial 12 finished with value: 0.8588289181174557 and parameters: {'learning_rate': 0.0006655563257430926, 'weight_decay': 0.005, 'warmup_steps': 33, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 11 with value: 0.8600321027287319.


Trial 13 with params: {'learning_rate': 0.004593131171570443, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1058,1.162847,0.856651,0.857243,0.857045,0.856647
2,0.368,1.136858,0.854358,0.855915,0.85374,0.854003
3,0.1936,1.150533,0.861239,0.862059,0.860792,0.861015
4,0.1218,1.239494,0.856651,0.8573,0.856245,0.856446
5,0.0813,1.189298,0.856651,0.857038,0.856329,0.856493
6,0.0594,1.165081,0.858945,0.859054,0.85875,0.858847


[I 2025-03-25 09:48:26,841] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0029303028816080995, 'weight_decay': 0.008, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1679,1.245638,0.841743,0.844003,0.84249,0.841642
2,0.3934,1.107263,0.860092,0.860991,0.859624,0.859853
3,0.2106,1.124962,0.857798,0.858526,0.857371,0.857582
4,0.1327,1.203723,0.858945,0.858909,0.858876,0.858891
5,0.0891,1.254983,0.848624,0.851879,0.847731,0.847997
6,0.0658,1.16189,0.857798,0.857873,0.857624,0.857708
7,0.0514,1.218243,0.850917,0.851236,0.850615,0.850764
8,0.041,1.230578,0.854358,0.854861,0.853993,0.854173
9,0.0347,1.233883,0.853211,0.853226,0.853077,0.853134
10,0.0307,1.241304,0.852064,0.852242,0.851825,0.851943


[I 2025-03-25 09:50:25,865] Trial 14 finished with value: 0.8519425238792828 and parameters: {'learning_rate': 0.0029303028816080995, 'weight_decay': 0.008, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 11 with value: 0.8600321027287319.


Trial 15 with params: {'learning_rate': 0.0007196916154021696, 'weight_decay': 0.0, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4131,1.988099,0.800459,0.819363,0.802665,0.798213
2,0.6287,1.224964,0.841743,0.841867,0.841521,0.841623
3,0.3802,1.198214,0.845183,0.845351,0.844942,0.845056


[I 2025-03-25 09:50:58,993] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0029766690529847615, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1446,1.19379,0.857798,0.858197,0.858129,0.857797
2,0.3992,1.06575,0.863532,0.864724,0.863002,0.863259
3,0.2074,1.146379,0.857798,0.857944,0.857582,0.85769
4,0.1318,1.276287,0.854358,0.854631,0.854077,0.854218
5,0.0887,1.192018,0.862385,0.862366,0.862297,0.862327
6,0.0635,1.224685,0.857798,0.858526,0.857371,0.857582
7,0.0491,1.203908,0.863532,0.863823,0.863255,0.863401
8,0.04,1.17794,0.866972,0.866928,0.866928,0.866928
9,0.0339,1.193205,0.864679,0.86466,0.864591,0.864621
10,0.0304,1.196852,0.860092,0.860327,0.859834,0.859967


[I 2025-03-25 09:53:02,090] Trial 16 finished with value: 0.8599672505752209 and parameters: {'learning_rate': 0.0029766690529847615, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 11 with value: 0.8600321027287319.


Trial 17 with params: {'learning_rate': 0.001474855607740676, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2379,1.496565,0.830275,0.837938,0.831649,0.82967
2,0.4777,1.120193,0.854358,0.854311,0.854414,0.854335
3,0.2618,1.190464,0.860092,0.860547,0.85975,0.859926
4,0.1664,1.259232,0.854358,0.857216,0.85353,0.853818
5,0.1192,1.286251,0.858945,0.85889,0.858918,0.858903
6,0.0865,1.251712,0.853211,0.854257,0.852698,0.852932


[I 2025-03-25 09:54:08,653] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004440229784088377, 'weight_decay': 0.001, 'warmup_steps': 36, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1231,1.20858,0.845183,0.84733,0.84591,0.845094
2,0.3748,1.269916,0.847477,0.850261,0.846647,0.846912
3,0.1986,1.193138,0.863532,0.863582,0.863381,0.863453
4,0.1243,1.23414,0.858945,0.858942,0.858834,0.858878
5,0.0831,1.299687,0.862385,0.86258,0.862634,0.862385
6,0.0608,1.179102,0.855505,0.855832,0.855203,0.855355


[I 2025-03-25 09:55:16,483] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.002925871769731485, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.134,1.322991,0.840596,0.844855,0.841616,0.840339
2,0.395,1.115392,0.858945,0.858909,0.858876,0.858891
3,0.2101,1.093306,0.864679,0.865431,0.864254,0.864473
4,0.1317,1.173088,0.864679,0.865977,0.864128,0.864394
5,0.088,1.179093,0.861239,0.861219,0.861339,0.861224
6,0.0643,1.177044,0.858945,0.860113,0.858413,0.858662
7,0.0492,1.11895,0.862385,0.86298,0.862002,0.8622
8,0.0402,1.096969,0.863532,0.863479,0.863507,0.863492
9,0.0341,1.124319,0.854358,0.854539,0.854119,0.854238
10,0.0305,1.124017,0.856651,0.857161,0.856287,0.85647


[I 2025-03-25 09:57:15,191] Trial 19 finished with value: 0.8564699778647737 and parameters: {'learning_rate': 0.002925871769731485, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 11 with value: 0.8600321027287319.


Trial 20 with params: {'learning_rate': 0.00014766637242423952, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9813,1.559447,0.793578,0.797861,0.794645,0.793185
2,1.185,1.490946,0.818807,0.820901,0.818031,0.81821
3,0.9412,1.423747,0.822248,0.82402,0.821535,0.82173


[I 2025-03-25 09:57:50,091] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.000570155485205159, 'weight_decay': 0.004, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5046,1.803734,0.798165,0.820077,0.800539,0.795475
2,0.6874,1.292786,0.84633,0.849557,0.845437,0.845694
3,0.4252,1.301143,0.852064,0.852104,0.851909,0.851978
4,0.2934,1.291986,0.854358,0.856389,0.853656,0.853933
5,0.2184,1.381769,0.84633,0.846876,0.845942,0.846123
6,0.1697,1.266261,0.864679,0.865026,0.864381,0.864539
7,0.1366,1.336474,0.861239,0.861902,0.860834,0.86104
8,0.1138,1.3109,0.863532,0.863532,0.863423,0.863467
9,0.0967,1.336215,0.870413,0.870567,0.870643,0.870411
10,0.086,1.345457,0.860092,0.860071,0.860003,0.860032


[I 2025-03-25 09:59:52,737] Trial 21 finished with value: 0.8600321027287319 and parameters: {'learning_rate': 0.000570155485205159, 'weight_decay': 0.004, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 11 with value: 0.8600321027287319.


Trial 22 with params: {'learning_rate': 0.0005910964729173038, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4693,1.687365,0.803899,0.817765,0.805791,0.802358
2,0.6716,1.269735,0.844037,0.846641,0.843226,0.84348
3,0.4195,1.243608,0.856651,0.857808,0.856119,0.856364
4,0.2897,1.265061,0.852064,0.85586,0.851109,0.851384
5,0.2114,1.359017,0.847477,0.848408,0.846984,0.847202
6,0.1642,1.281235,0.862385,0.86411,0.86175,0.862034
7,0.1319,1.281788,0.870413,0.870567,0.870643,0.870411
8,0.1101,1.264433,0.866972,0.866998,0.866843,0.866902
9,0.093,1.290722,0.865826,0.865906,0.866012,0.865821
10,0.0826,1.303448,0.864679,0.86476,0.864507,0.864593


[I 2025-03-25 10:01:49,005] Trial 22 finished with value: 0.8645927095670483 and parameters: {'learning_rate': 0.0005910964729173038, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 22 with value: 0.8645927095670483.


Trial 23 with params: {'learning_rate': 0.0007192557770680521, 'weight_decay': 0.006, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4406,1.529059,0.817661,0.82628,0.819136,0.816878
2,0.6097,1.224113,0.84633,0.847529,0.845773,0.846006
3,0.3672,1.154062,0.853211,0.853281,0.853035,0.853118
4,0.2465,1.178415,0.855505,0.85696,0.854909,0.855169
5,0.18,1.255798,0.849771,0.850892,0.849236,0.84947
6,0.1383,1.209365,0.858945,0.859601,0.85854,0.858743
7,0.1096,1.23158,0.861239,0.861761,0.860876,0.861063
8,0.0903,1.233914,0.855505,0.855577,0.85533,0.855413
9,0.0759,1.241742,0.862385,0.862351,0.862465,0.862367
10,0.0672,1.252813,0.862385,0.862464,0.862213,0.862298


[I 2025-03-25 10:03:50,032] Trial 23 finished with value: 0.8622976707461507 and parameters: {'learning_rate': 0.0007192557770680521, 'weight_decay': 0.006, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 22 with value: 0.8645927095670483.


Trial 24 with params: {'learning_rate': 0.0016630677666141112, 'weight_decay': 0.007, 'warmup_steps': 30, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2316,1.393172,0.84633,0.851208,0.847415,0.846038
2,0.4518,1.065352,0.860092,0.860429,0.859792,0.859947
3,0.2415,1.088756,0.850917,0.851137,0.850657,0.850785


[I 2025-03-25 10:04:37,367] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0012116203017432747, 'weight_decay': 0.01, 'warmup_steps': 42, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3304,1.408046,0.830275,0.834972,0.831355,0.829952
2,0.5158,1.124574,0.854358,0.854739,0.854035,0.854197
3,0.2864,1.128112,0.863532,0.864535,0.863044,0.863286
4,0.1842,1.235572,0.856651,0.860169,0.85574,0.856036
5,0.1341,1.285968,0.852064,0.852561,0.851699,0.851877
6,0.0988,1.240362,0.852064,0.853394,0.851488,0.851736


[I 2025-03-25 10:05:50,135] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.00020485077903130487, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8439,1.499304,0.794725,0.800277,0.79594,0.794177
2,1.0558,1.339885,0.831422,0.831501,0.831597,0.831416
3,0.796,1.464419,0.819954,0.826684,0.818609,0.818539


[I 2025-03-25 10:06:26,348] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.000510433379457142, 'weight_decay': 0.006, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5358,1.776279,0.799312,0.820085,0.801623,0.796797
2,0.7272,1.330848,0.83945,0.843238,0.838469,0.838686
3,0.4605,1.298356,0.850917,0.851779,0.850446,0.850663
4,0.3232,1.49074,0.832569,0.841869,0.831039,0.830924
5,0.2424,1.380109,0.847477,0.847842,0.847152,0.847308
6,0.19,1.354743,0.856651,0.859241,0.855866,0.856159
7,0.1547,1.335837,0.857798,0.858247,0.857456,0.85763
8,0.1283,1.37922,0.852064,0.85244,0.851741,0.8519
9,0.1096,1.376972,0.858945,0.859025,0.859129,0.85894
10,0.0981,1.384685,0.861239,0.861286,0.861087,0.861158


[I 2025-03-25 10:08:22,391] Trial 27 finished with value: 0.8611580079032244 and parameters: {'learning_rate': 0.000510433379457142, 'weight_decay': 0.006, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 22 with value: 0.8645927095670483.


Trial 28 with params: {'learning_rate': 0.00024450764704243307, 'weight_decay': 0.008, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7886,1.545873,0.801606,0.808936,0.802991,0.80087
2,0.9967,1.37543,0.829128,0.829825,0.828671,0.828853
3,0.7179,1.439868,0.826835,0.83245,0.825619,0.825679
4,0.5625,1.404507,0.831422,0.831568,0.831176,0.831283
5,0.4527,1.460303,0.831422,0.832128,0.830965,0.83115
6,0.3777,1.522418,0.844037,0.845422,0.843437,0.843674


[I 2025-03-25 10:09:29,822] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0004414683896288992, 'weight_decay': 0.006, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5948,1.508546,0.807339,0.815014,0.808748,0.806598
2,0.7835,1.333796,0.825688,0.828661,0.824787,0.824966
3,0.5085,1.411036,0.848624,0.852901,0.847605,0.847855


[I 2025-03-25 10:10:05,191] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0002942834823169273, 'weight_decay': 0.001, 'warmup_steps': 41, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7105,1.489297,0.816514,0.819742,0.81742,0.816296
2,0.9126,1.381091,0.830275,0.833611,0.829334,0.829521
3,0.643,1.415557,0.830275,0.8366,0.828997,0.829044


[I 2025-03-25 10:10:40,573] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0008391756838229455, 'weight_decay': 0.007, 'warmup_steps': 43, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4094,1.759565,0.811927,0.827104,0.813884,0.81033
2,0.5856,1.196609,0.852064,0.853834,0.851404,0.851669
3,0.3464,1.167599,0.855505,0.855446,0.855498,0.855467
4,0.2282,1.170598,0.861239,0.861636,0.860918,0.861085
5,0.1631,1.236708,0.860092,0.86068,0.859708,0.859903
6,0.126,1.264704,0.852064,0.853606,0.851446,0.851704


[I 2025-03-25 10:11:52,723] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0003533267074099703, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6172,1.535743,0.818807,0.821758,0.819672,0.81862
2,0.852,1.361244,0.841743,0.843326,0.8411,0.841339
3,0.5754,1.391872,0.83945,0.845548,0.838217,0.838347


[I 2025-03-25 10:12:30,285] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0003595048395645195, 'weight_decay': 0.005, 'warmup_steps': 34, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6571,1.444849,0.823394,0.825823,0.824177,0.823261
2,0.8482,1.35643,0.838303,0.838761,0.837933,0.838098
3,0.5613,1.373124,0.840596,0.843909,0.83968,0.839912
4,0.4179,1.493342,0.84289,0.848021,0.841763,0.841957
5,0.3227,1.444126,0.84289,0.843493,0.842479,0.842665
6,0.2596,1.400163,0.849771,0.850397,0.849362,0.849555


[I 2025-03-25 10:13:42,498] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0013159291558804682, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2649,1.431711,0.832569,0.841269,0.834028,0.831875
2,0.4944,1.073533,0.858945,0.858909,0.858876,0.858891
3,0.2756,1.089109,0.862385,0.862846,0.862044,0.862222
4,0.1759,1.088315,0.868119,0.869334,0.867591,0.867855
5,0.126,1.184484,0.864679,0.864633,0.864633,0.864633
6,0.093,1.115705,0.860092,0.86117,0.859582,0.859826
7,0.071,1.147553,0.858945,0.859927,0.858455,0.858691
8,0.0574,1.15104,0.854358,0.854999,0.853951,0.854149
9,0.0483,1.139,0.860092,0.860112,0.85996,0.860018
10,0.0427,1.160258,0.860092,0.860327,0.859834,0.859967


[I 2025-03-25 10:15:46,114] Trial 34 finished with value: 0.8599672505752209 and parameters: {'learning_rate': 0.0013159291558804682, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 22 with value: 0.8645927095670483.


Trial 35 with params: {'learning_rate': 5.817102176211476e-05, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4474,1.682428,0.787844,0.787921,0.788004,0.787837
2,1.537,1.579192,0.799312,0.799335,0.799434,0.799299
3,1.3374,1.521754,0.805046,0.805585,0.804601,0.804749
4,1.2172,1.552848,0.816514,0.819375,0.81561,0.815754
5,1.1197,1.448864,0.81422,0.814162,0.814242,0.814185
6,1.0543,1.457093,0.821101,0.821871,0.821546,0.821086


[I 2025-03-25 10:16:57,788] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.004198014799920335, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1026,1.12253,0.84289,0.843943,0.843405,0.842865
2,0.3725,1.097568,0.860092,0.860991,0.859624,0.859853
3,0.1975,1.180539,0.848624,0.851569,0.847773,0.848041


[I 2025-03-25 10:17:33,293] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0003114584293983801, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6395,1.463161,0.815367,0.817882,0.816168,0.815215
2,0.9017,1.383696,0.831422,0.834033,0.830586,0.830797
3,0.6267,1.40877,0.831422,0.835965,0.830334,0.83048
4,0.4756,1.465347,0.823394,0.826056,0.822535,0.822715
5,0.3709,1.443703,0.844037,0.844451,0.843689,0.843852
6,0.3009,1.43358,0.848624,0.848937,0.84832,0.848468


[I 2025-03-25 10:18:47,680] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9325,1.530375,0.795872,0.798665,0.796729,0.795661
2,1.1822,1.39761,0.823394,0.823394,0.823503,0.82338
3,0.9406,1.409881,0.826835,0.828421,0.826166,0.826372
4,0.7896,1.548462,0.825688,0.830313,0.824577,0.824684
5,0.6648,1.43825,0.832569,0.832758,0.832807,0.832568
6,0.5814,1.407892,0.826835,0.826816,0.826924,0.826816


[I 2025-03-25 10:19:59,176] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0004903944054956613, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5235,1.593986,0.800459,0.81044,0.802075,0.799378
2,0.7325,1.308784,0.841743,0.845231,0.840806,0.84104
3,0.4673,1.353464,0.853211,0.858332,0.852109,0.852365
4,0.3311,1.445836,0.838303,0.845504,0.836964,0.837032
5,0.2506,1.350396,0.848624,0.849476,0.848152,0.848365
6,0.1954,1.29257,0.860092,0.860991,0.859624,0.859853
7,0.1585,1.322394,0.864679,0.864922,0.864423,0.864558
8,0.1319,1.36297,0.865826,0.865943,0.865633,0.865732
9,0.1128,1.381994,0.862385,0.8625,0.862592,0.862382
10,0.101,1.385433,0.860092,0.860169,0.859918,0.860003


[I 2025-03-25 10:22:00,183] Trial 39 finished with value: 0.8600026319252534 and parameters: {'learning_rate': 0.0004903944054956613, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 22 with value: 0.8645927095670483.


Trial 40 with params: {'learning_rate': 0.0010607423900408743, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3138,1.643073,0.819954,0.830763,0.821598,0.818948
2,0.5283,1.135657,0.856651,0.857038,0.856329,0.856493
3,0.301,1.148893,0.854358,0.856649,0.853614,0.853896
4,0.1936,1.199533,0.848624,0.850254,0.847983,0.848237
5,0.1389,1.236363,0.860092,0.86024,0.859876,0.859986
6,0.1054,1.197906,0.855505,0.857673,0.854782,0.855065
7,0.0818,1.195352,0.870413,0.870356,0.870432,0.870384
8,0.0665,1.190549,0.864679,0.864922,0.864423,0.864558
9,0.0559,1.196056,0.861239,0.861261,0.861381,0.86123
10,0.0492,1.209475,0.862385,0.862625,0.862128,0.862263


[I 2025-03-25 10:24:04,208] Trial 40 finished with value: 0.8622628694182501 and parameters: {'learning_rate': 0.0010607423900408743, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 22 with value: 0.8645927095670483.


Trial 41 with params: {'learning_rate': 0.0011926524546854275, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2829,1.434598,0.830275,0.837468,0.831607,0.829715
2,0.5073,1.106325,0.861239,0.861203,0.861171,0.861186
3,0.2873,1.081651,0.853211,0.854082,0.852741,0.85296
4,0.1821,1.201975,0.857798,0.860847,0.85695,0.857251
5,0.1306,1.255848,0.850917,0.851477,0.85053,0.850716
6,0.0972,1.186389,0.852064,0.853394,0.851488,0.851736


[I 2025-03-25 10:25:16,030] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.001526200511034575, 'weight_decay': 0.0, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2851,1.407276,0.832569,0.838067,0.833733,0.832179
2,0.4669,1.08716,0.857798,0.858131,0.857498,0.857651
3,0.256,1.120433,0.860092,0.86068,0.859708,0.859903
4,0.1625,1.176853,0.861239,0.862418,0.860708,0.86096
5,0.1146,1.277381,0.853211,0.854873,0.852572,0.852836
6,0.0838,1.165413,0.856651,0.856929,0.856372,0.856514
7,0.0637,1.195072,0.862385,0.862339,0.862339,0.862339
8,0.0516,1.228328,0.861239,0.861761,0.860876,0.861063
9,0.0443,1.205102,0.860092,0.860038,0.860129,0.860065
10,0.0392,1.2264,0.860092,0.86024,0.859876,0.859986


[I 2025-03-25 10:27:24,867] Trial 42 finished with value: 0.859985680592992 and parameters: {'learning_rate': 0.001526200511034575, 'weight_decay': 0.0, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 22 with value: 0.8645927095670483.


Trial 43 with params: {'learning_rate': 0.000681158103930191, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4404,1.734072,0.808486,0.822533,0.810379,0.806981
2,0.632,1.28621,0.844037,0.846108,0.843311,0.843563
3,0.389,1.208938,0.856651,0.857038,0.856329,0.856493
4,0.2626,1.253121,0.853211,0.855109,0.85253,0.852801
5,0.1917,1.311263,0.852064,0.852698,0.851657,0.851852
6,0.1479,1.313612,0.858945,0.861013,0.858245,0.858534


[I 2025-03-25 10:28:37,069] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0014691315499909523, 'weight_decay': 0.009000000000000001, 'warmup_steps': 39, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2632,1.641811,0.832569,0.842327,0.834112,0.831772
2,0.4771,1.080532,0.858945,0.859756,0.858498,0.858717
3,0.265,1.177676,0.853211,0.854652,0.852614,0.85287
4,0.1685,1.245815,0.84633,0.847945,0.845689,0.845938
5,0.1203,1.300976,0.856651,0.8573,0.856245,0.856446
6,0.088,1.296175,0.833716,0.834586,0.833218,0.833416


[I 2025-03-25 10:29:48,584] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0003929309505239134, 'weight_decay': 0.006, 'warmup_steps': 40, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6504,1.471924,0.821101,0.824068,0.821967,0.820916
2,0.8188,1.327414,0.832569,0.834089,0.831923,0.832141
3,0.541,1.396423,0.845183,0.85119,0.843974,0.84415


[I 2025-03-25 10:30:25,336] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.001047564697535078, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2993,1.613459,0.818807,0.827204,0.820262,0.818057
2,0.5347,1.116593,0.852064,0.852334,0.851783,0.851922
3,0.3059,1.132457,0.849771,0.851298,0.849152,0.849404
4,0.1993,1.172441,0.855505,0.85742,0.854824,0.855101
5,0.1406,1.225773,0.863532,0.863935,0.863213,0.863381
6,0.1064,1.191756,0.862385,0.862846,0.862044,0.862222
7,0.0825,1.203893,0.863532,0.863498,0.863465,0.86348
8,0.0669,1.200439,0.865826,0.865943,0.865633,0.865732
9,0.0562,1.194099,0.868119,0.868067,0.868096,0.86808
10,0.0495,1.220572,0.858945,0.859227,0.858666,0.85881


[I 2025-03-25 10:32:28,478] Trial 46 finished with value: 0.8588095911960034 and parameters: {'learning_rate': 0.001047564697535078, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 22 with value: 0.8645927095670483.


Trial 47 with params: {'learning_rate': 5.232252858049981e-05, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5125,1.690793,0.783257,0.783783,0.783626,0.78325
2,1.5982,1.604299,0.795872,0.795872,0.795971,0.795854
3,1.3742,1.544071,0.806193,0.806675,0.80577,0.805915


[I 2025-03-25 10:33:11,042] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00025918133949346, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7292,1.561744,0.801606,0.80986,0.803075,0.800754
2,0.9686,1.32149,0.841743,0.841749,0.841606,0.84166
3,0.6892,1.452979,0.824541,0.831847,0.823156,0.823089


[I 2025-03-25 10:33:54,940] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0006768623616635087, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4055,1.551851,0.819954,0.824716,0.821051,0.819593
2,0.6395,1.283686,0.83945,0.842907,0.838511,0.838736
3,0.3904,1.264448,0.840596,0.840945,0.840269,0.84042
4,0.2672,1.301868,0.854358,0.858541,0.853362,0.853642
5,0.1972,1.34408,0.855505,0.856224,0.855077,0.855285
6,0.1521,1.273173,0.860092,0.860991,0.859624,0.859853
7,0.1214,1.352757,0.868119,0.868122,0.868012,0.868057
8,0.1005,1.303294,0.858945,0.858909,0.858876,0.858891
9,0.0844,1.320126,0.855505,0.855505,0.855624,0.855492
10,0.0746,1.352484,0.860092,0.860169,0.859918,0.860003


[I 2025-03-25 10:36:02,336] Trial 49 finished with value: 0.8600026319252534 and parameters: {'learning_rate': 0.0006768623616635087, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 22 with value: 0.8645927095670483.


Trial 50 with params: {'learning_rate': 0.0021133792752108674, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1732,1.231526,0.853211,0.857058,0.854172,0.853013
2,0.4262,1.079449,0.852064,0.853394,0.851488,0.851736
3,0.2268,1.120978,0.857798,0.857764,0.857877,0.857779
4,0.1454,1.149828,0.856651,0.857808,0.856119,0.856364
5,0.1013,1.229678,0.84633,0.846876,0.845942,0.846123
6,0.0726,1.20863,0.850917,0.851137,0.850657,0.850785


[I 2025-03-25 10:37:14,479] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0010475249065763457, 'weight_decay': 0.004, 'warmup_steps': 36, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3602,1.535334,0.826835,0.83672,0.828397,0.825983
2,0.5448,1.143428,0.855505,0.855948,0.855161,0.855333
3,0.3103,1.143998,0.864679,0.865026,0.864381,0.864539
4,0.1983,1.219204,0.863532,0.865149,0.862918,0.863199
5,0.1441,1.319381,0.849771,0.850892,0.849236,0.84947
6,0.1091,1.23625,0.856651,0.857623,0.856161,0.856393


[I 2025-03-25 10:38:31,730] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0003581357483238571, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6428,1.469377,0.819954,0.823059,0.820841,0.819755
2,0.8467,1.358636,0.841743,0.841947,0.841479,0.841602
3,0.5687,1.445142,0.832569,0.839394,0.831249,0.831287
4,0.4205,1.497609,0.836009,0.841408,0.834838,0.834976
5,0.323,1.468447,0.844037,0.844245,0.843774,0.843898
6,0.2599,1.386656,0.850917,0.850858,0.850909,0.850879


[I 2025-03-25 10:39:43,903] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.000961291185728391, 'weight_decay': 0.005, 'warmup_steps': 42, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3617,1.71434,0.811927,0.825787,0.8138,0.810487
2,0.5577,1.145814,0.849771,0.849945,0.849531,0.849647
3,0.3245,1.132453,0.849771,0.850036,0.849489,0.849626


[I 2025-03-25 10:40:21,136] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.00036695283889299476, 'weight_decay': 0.003, 'warmup_steps': 36, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6452,1.470928,0.819954,0.822496,0.820757,0.819806
2,0.8427,1.319845,0.840596,0.841671,0.840059,0.840277
3,0.5658,1.395492,0.847477,0.850878,0.846563,0.846823
4,0.4175,1.539338,0.844037,0.851147,0.842721,0.842843
5,0.3249,1.422553,0.847477,0.847436,0.847405,0.847419
6,0.259,1.374062,0.858945,0.85889,0.858918,0.858903
7,0.2152,1.387536,0.860092,0.860206,0.860297,0.860089
8,0.1816,1.394395,0.856651,0.85711,0.857003,0.85665
9,0.159,1.427459,0.855505,0.855902,0.855835,0.855504
10,0.1426,1.421978,0.858945,0.858968,0.859087,0.858936


[I 2025-03-25 10:42:22,032] Trial 54 finished with value: 0.858935863796879 and parameters: {'learning_rate': 0.00036695283889299476, 'weight_decay': 0.003, 'warmup_steps': 36, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 22 with value: 0.8645927095670483.


Trial 55 with params: {'learning_rate': 7.242888062473813e-05, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3293,1.642945,0.786697,0.787059,0.787004,0.786696
2,1.4522,1.529267,0.809633,0.809569,0.809569,0.809569
3,1.2476,1.471457,0.815367,0.81539,0.815494,0.815355


[I 2025-03-25 10:42:57,742] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0006896980792033153, 'weight_decay': 0.005, 'warmup_steps': 43, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4593,1.694799,0.811927,0.827104,0.813884,0.81033
2,0.6341,1.297933,0.847477,0.851209,0.846521,0.846776
3,0.3848,1.199659,0.849771,0.849945,0.849531,0.849647
4,0.2595,1.195989,0.856651,0.857808,0.856119,0.856364
5,0.1889,1.312868,0.853211,0.854257,0.852698,0.852932
6,0.1444,1.296351,0.860092,0.861365,0.859539,0.859797
7,0.1155,1.264901,0.868119,0.868122,0.868012,0.868057
8,0.0949,1.237132,0.868119,0.868142,0.868264,0.868111
9,0.0798,1.292241,0.868119,0.868466,0.868433,0.868119
10,0.071,1.273618,0.866972,0.866955,0.866886,0.866916


[I 2025-03-25 10:44:57,882] Trial 56 finished with value: 0.8669157698076467 and parameters: {'learning_rate': 0.0006896980792033153, 'weight_decay': 0.005, 'warmup_steps': 43, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 56 with value: 0.8669157698076467.


Trial 57 with params: {'learning_rate': 0.000702115186243623, 'weight_decay': 0.004, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4497,1.680315,0.81078,0.819243,0.812253,0.809968
2,0.63,1.239782,0.845183,0.847943,0.844353,0.844609
3,0.3811,1.172156,0.856651,0.857161,0.856287,0.85647
4,0.2559,1.192178,0.861239,0.862418,0.860708,0.86096
5,0.184,1.298561,0.854358,0.854539,0.854119,0.854238
6,0.1426,1.22481,0.858945,0.860113,0.858413,0.858662
7,0.1131,1.289667,0.864679,0.864633,0.864633,0.864633
8,0.0928,1.24301,0.864679,0.864834,0.864465,0.864576
9,0.0787,1.265877,0.862385,0.862351,0.862465,0.862367
10,0.0691,1.280811,0.864679,0.86476,0.864507,0.864593


[I 2025-03-25 10:47:00,836] Trial 57 finished with value: 0.8645927095670483 and parameters: {'learning_rate': 0.000702115186243623, 'weight_decay': 0.004, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 56 with value: 0.8669157698076467.


Trial 58 with params: {'learning_rate': 0.00145615931479987, 'weight_decay': 0.005, 'warmup_steps': 38, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2742,1.489271,0.825688,0.836376,0.827313,0.824744
2,0.4762,1.086775,0.855505,0.85547,0.855582,0.855486
3,0.2604,1.100013,0.855505,0.855832,0.855203,0.855355
4,0.1643,1.31024,0.850917,0.854871,0.849941,0.850208
5,0.1178,1.306273,0.850917,0.851952,0.850404,0.850634
6,0.0854,1.252842,0.853211,0.85335,0.852993,0.8531


[I 2025-03-25 10:48:19,077] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0006641224855938141, 'weight_decay': 0.006, 'warmup_steps': 41, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4629,1.632026,0.809633,0.821566,0.811379,0.808398
2,0.6437,1.333774,0.845183,0.849589,0.844142,0.844371
3,0.3919,1.235958,0.857798,0.858247,0.857456,0.85763
4,0.2659,1.254489,0.847477,0.850261,0.846647,0.846912
5,0.1936,1.342346,0.852064,0.852849,0.851614,0.851826
6,0.1491,1.206221,0.865826,0.866024,0.865591,0.865715
7,0.1189,1.285791,0.863532,0.863727,0.863297,0.86342
8,0.0983,1.241737,0.865826,0.865769,0.865844,0.865796
9,0.0833,1.265845,0.865826,0.865906,0.866012,0.865821
10,0.0737,1.281375,0.862385,0.862366,0.862297,0.862327


[I 2025-03-25 10:50:44,251] Trial 59 finished with value: 0.8623266584217035 and parameters: {'learning_rate': 0.0006641224855938141, 'weight_decay': 0.006, 'warmup_steps': 41, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 56 with value: 0.8669157698076467.


Trial 60 with params: {'learning_rate': 0.00046762991988506683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.496,1.633931,0.817661,0.822768,0.818799,0.817257
2,0.7559,1.385098,0.83945,0.843238,0.838469,0.838686
3,0.4863,1.402433,0.847477,0.853533,0.846268,0.846459


[I 2025-03-25 10:51:19,974] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0013129789774030292, 'weight_decay': 0.004, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3046,1.337303,0.831422,0.835591,0.832439,0.83115
2,0.5025,1.091465,0.861239,0.861351,0.861044,0.861142
3,0.2803,1.062623,0.870413,0.870831,0.870096,0.870269
4,0.1792,1.144639,0.863532,0.866489,0.862707,0.863026
5,0.1287,1.224239,0.862385,0.863294,0.861918,0.86215
6,0.0932,1.172242,0.858945,0.860764,0.858287,0.858568
7,0.0724,1.225205,0.864679,0.865026,0.864381,0.864539
8,0.0581,1.237237,0.860092,0.860828,0.859666,0.859879
9,0.049,1.2341,0.857798,0.857944,0.857582,0.85769
10,0.0431,1.260789,0.856651,0.857161,0.856287,0.85647


[I 2025-03-25 10:53:20,888] Trial 61 finished with value: 0.8564699778647737 and parameters: {'learning_rate': 0.0013129789774030292, 'weight_decay': 0.004, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 56 with value: 0.8669157698076467.


Trial 62 with params: {'learning_rate': 0.0004646507912177721, 'weight_decay': 0.006, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5728,1.503768,0.817661,0.821365,0.818631,0.817399
2,0.7556,1.292336,0.844037,0.844097,0.843858,0.843937
3,0.4834,1.409832,0.844037,0.848609,0.842974,0.843192


[I 2025-03-25 10:53:55,505] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0008679355207893437, 'weight_decay': 0.003, 'warmup_steps': 38, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3736,1.595242,0.822248,0.832561,0.823851,0.821315
2,0.5808,1.166787,0.848624,0.849476,0.848152,0.848365
3,0.3408,1.128066,0.864679,0.864729,0.864844,0.864672
4,0.2276,1.190876,0.854358,0.855915,0.85374,0.854003
5,0.1608,1.294066,0.861239,0.862418,0.860708,0.86096
6,0.1225,1.225054,0.861239,0.861902,0.860834,0.86104
7,0.0959,1.205463,0.87156,0.871589,0.871432,0.871492
8,0.0779,1.225604,0.864679,0.864622,0.864675,0.864644
9,0.0662,1.254665,0.865826,0.865906,0.866012,0.865821
10,0.0576,1.267328,0.863532,0.863647,0.863339,0.863437


[I 2025-03-25 10:55:56,956] Trial 63 finished with value: 0.8634371031315184 and parameters: {'learning_rate': 0.0008679355207893437, 'weight_decay': 0.003, 'warmup_steps': 38, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 56 with value: 0.8669157698076467.


Trial 64 with params: {'learning_rate': 0.0005917558885545052, 'weight_decay': 0.003, 'warmup_steps': 36, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.487,1.749889,0.801606,0.816674,0.80358,0.799879
2,0.6745,1.355311,0.83945,0.843238,0.838469,0.838686
3,0.4205,1.271314,0.849771,0.850712,0.849278,0.8495


[I 2025-03-25 10:56:30,185] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0009935611886988007, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3254,1.43285,0.824541,0.83233,0.825935,0.823891
2,0.5484,1.186259,0.848624,0.848569,0.848657,0.848595
3,0.3187,1.145369,0.857798,0.858131,0.857498,0.857651
4,0.2063,1.2485,0.854358,0.856649,0.853614,0.853896
5,0.1487,1.298908,0.858945,0.859756,0.858498,0.858717
6,0.1111,1.251496,0.857798,0.858866,0.857287,0.857528
7,0.0874,1.262101,0.861239,0.861525,0.86096,0.861105
8,0.0707,1.232935,0.861239,0.861184,0.861213,0.861197
9,0.0593,1.243412,0.863532,0.863486,0.863591,0.86351
10,0.0524,1.247737,0.862385,0.862464,0.862213,0.862298


[I 2025-03-25 10:58:34,416] Trial 65 finished with value: 0.8622976707461507 and parameters: {'learning_rate': 0.0009935611886988007, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 56 with value: 0.8669157698076467.


Trial 66 with params: {'learning_rate': 0.00040967595721905713, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5707,1.552286,0.815367,0.82083,0.816547,0.814917
2,0.8005,1.325182,0.836009,0.837662,0.835344,0.835571
3,0.5243,1.511797,0.840596,0.849356,0.839132,0.839139


[I 2025-03-25 10:59:18,236] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0017703464675254024, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1828,1.437268,0.836009,0.839231,0.836901,0.835828
2,0.4512,1.054541,0.866972,0.868498,0.86638,0.866663
3,0.2431,1.026525,0.866972,0.867056,0.866801,0.866888
4,0.1576,1.154071,0.857798,0.857873,0.857624,0.857708
5,0.1099,1.250069,0.854358,0.854861,0.853993,0.854173
6,0.0796,1.199812,0.847477,0.848587,0.846942,0.847171


[I 2025-03-25 11:00:26,519] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.001157323292918207, 'weight_decay': 0.005, 'warmup_steps': 43, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3309,1.66783,0.825688,0.831916,0.826934,0.825202
2,0.5234,1.154348,0.853211,0.853649,0.852867,0.853037
3,0.2963,1.11725,0.861239,0.861391,0.861465,0.861237
4,0.1865,1.201836,0.860092,0.860429,0.859792,0.859947
5,0.1345,1.320247,0.84633,0.846542,0.846068,0.846194
6,0.1012,1.308402,0.853211,0.854082,0.852741,0.85296


[I 2025-03-25 11:01:39,291] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0009248416186520873, 'weight_decay': 0.004, 'warmup_steps': 38, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3711,1.599346,0.821101,0.831109,0.822682,0.820192
2,0.5575,1.185962,0.849771,0.849808,0.849615,0.849683
3,0.3232,1.106987,0.853211,0.853649,0.852867,0.853037
4,0.2121,1.17595,0.862385,0.864613,0.861665,0.861967
5,0.1516,1.283724,0.848624,0.849835,0.848068,0.848305
6,0.1156,1.214321,0.850917,0.852141,0.850362,0.850603


[I 2025-03-25 11:02:48,680] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0005598088672445005, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4748,1.588155,0.81422,0.819098,0.815336,0.813828
2,0.7081,1.334732,0.840596,0.842516,0.83989,0.840132
3,0.4427,1.37506,0.84633,0.847529,0.845773,0.846006


[I 2025-03-25 11:03:23,002] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0013999570054561079, 'weight_decay': 0.009000000000000001, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2398,1.55249,0.827982,0.833419,0.829145,0.827582
2,0.4768,1.10315,0.855505,0.856079,0.855119,0.85531
3,0.2621,1.148304,0.857798,0.858247,0.857456,0.85763
4,0.1661,1.268005,0.84633,0.850218,0.845352,0.845599
5,0.1203,1.256359,0.855505,0.855948,0.855161,0.855333
6,0.0877,1.20186,0.856651,0.858008,0.856077,0.856334
7,0.0678,1.198974,0.857798,0.85803,0.85754,0.857672
8,0.0549,1.202214,0.858945,0.859227,0.858666,0.85881
9,0.0466,1.203362,0.857798,0.857817,0.857666,0.857723
10,0.0411,1.213346,0.858945,0.859337,0.858624,0.858789


[I 2025-03-25 11:05:32,759] Trial 71 finished with value: 0.8587887716692801 and parameters: {'learning_rate': 0.0013999570054561079, 'weight_decay': 0.009000000000000001, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 56 with value: 0.8669157698076467.


Trial 72 with params: {'learning_rate': 0.0009405894847851332, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3253,1.716743,0.816514,0.829909,0.818346,0.815183
2,0.56,1.234267,0.849771,0.849713,0.849741,0.849726
3,0.3297,1.207404,0.852064,0.852018,0.85212,0.852041
4,0.2169,1.182361,0.861239,0.861761,0.860876,0.861063
5,0.1536,1.262547,0.847477,0.84899,0.846857,0.847105
6,0.1181,1.218957,0.863532,0.865149,0.862918,0.863199
7,0.0909,1.227887,0.870413,0.870469,0.870264,0.870338
8,0.0743,1.201712,0.869266,0.870035,0.868843,0.869067
9,0.0623,1.222948,0.863532,0.863479,0.863507,0.863492
10,0.0548,1.237152,0.868119,0.868322,0.867885,0.868011


[I 2025-03-25 11:07:35,957] Trial 72 finished with value: 0.8680107771016862 and parameters: {'learning_rate': 0.0009405894847851332, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 6.0}. Best is trial 72 with value: 0.8680107771016862.


Trial 73 with params: {'learning_rate': 0.000764811093538467, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3524,1.731291,0.811927,0.82117,0.813463,0.811032
2,0.612,1.26263,0.849771,0.851524,0.84911,0.849369
3,0.3696,1.214504,0.850917,0.851349,0.850573,0.850741
4,0.245,1.31948,0.852064,0.853834,0.851404,0.851669
5,0.1795,1.352953,0.84633,0.847173,0.845858,0.846068
6,0.1367,1.283038,0.853211,0.855109,0.85253,0.852801


[I 2025-03-25 11:08:45,454] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.000347802741623925, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6121,1.4717,0.818807,0.820283,0.81942,0.818746
2,0.8626,1.346741,0.833716,0.83494,0.833133,0.833347
3,0.5857,1.447028,0.833716,0.84176,0.832291,0.832268


[I 2025-03-25 11:09:25,933] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0011140110794410035, 'weight_decay': 0.007, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2922,1.601976,0.821101,0.832812,0.822809,0.820006
2,0.5288,1.118092,0.856651,0.857161,0.856287,0.85647
3,0.3021,1.126724,0.860092,0.861575,0.859497,0.859766
4,0.1918,1.173906,0.853211,0.854257,0.852698,0.852932
5,0.1392,1.298499,0.855505,0.856561,0.854993,0.85523
6,0.104,1.23314,0.854358,0.856925,0.853572,0.853858


[I 2025-03-25 11:10:33,265] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0007746438582999517, 'weight_decay': 0.005, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3964,1.630847,0.819954,0.833157,0.821767,0.818683
2,0.6168,1.230398,0.850917,0.851952,0.850404,0.850634
3,0.3679,1.191614,0.854358,0.854739,0.854035,0.854197
4,0.2451,1.273687,0.848624,0.852544,0.847647,0.847904
5,0.1797,1.323843,0.854358,0.855701,0.853783,0.854035
6,0.1353,1.271727,0.853211,0.854447,0.852656,0.852901


[I 2025-03-25 11:11:41,983] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0006542734991989972, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4508,1.811526,0.803899,0.819067,0.805875,0.802192
2,0.6484,1.310009,0.840596,0.842283,0.839932,0.840171
3,0.3958,1.236836,0.854358,0.855503,0.853825,0.854066
4,0.2687,1.234625,0.853211,0.856208,0.852362,0.852646
5,0.1961,1.318827,0.856651,0.857623,0.856161,0.856393
6,0.1516,1.304528,0.856651,0.859241,0.855866,0.856159
7,0.1209,1.312564,0.857798,0.858526,0.857371,0.857582
8,0.0995,1.284492,0.862385,0.862625,0.862128,0.862263
9,0.0843,1.303048,0.860092,0.860034,0.860087,0.860056
10,0.0747,1.312464,0.858945,0.859337,0.858624,0.858789


[I 2025-03-25 11:13:39,845] Trial 77 finished with value: 0.8587887716692801 and parameters: {'learning_rate': 0.0006542734991989972, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 72 with value: 0.8680107771016862.


Trial 78 with params: {'learning_rate': 0.0004453830982089004, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5076,1.482236,0.819954,0.824716,0.821051,0.819593
2,0.777,1.374664,0.84289,0.844373,0.842269,0.842507
3,0.5,1.333079,0.845183,0.84766,0.844395,0.844652
4,0.3611,1.500923,0.836009,0.845696,0.834459,0.83436
5,0.2768,1.493844,0.84289,0.84283,0.8429,0.842855
6,0.2192,1.36787,0.858945,0.858968,0.859087,0.858936
7,0.1808,1.400935,0.864679,0.864622,0.864675,0.864644
8,0.1505,1.430228,0.853211,0.853186,0.853119,0.853148
9,0.1293,1.45629,0.84633,0.846723,0.846657,0.846329
10,0.1158,1.452035,0.860092,0.860169,0.859918,0.860003


[I 2025-03-25 11:15:36,341] Trial 78 finished with value: 0.8600026319252534 and parameters: {'learning_rate': 0.0004453830982089004, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 72 with value: 0.8680107771016862.


Trial 79 with params: {'learning_rate': 0.000669852030187182, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4121,1.65258,0.803899,0.811733,0.805328,0.803116
2,0.6405,1.259879,0.838303,0.839035,0.837848,0.838042
3,0.3917,1.222336,0.852064,0.85244,0.851741,0.8519
4,0.2669,1.253289,0.860092,0.863166,0.859245,0.859553
5,0.1968,1.307614,0.860092,0.860828,0.859666,0.859879
6,0.1507,1.255495,0.864679,0.86619,0.864086,0.864364
7,0.1211,1.301537,0.862385,0.862846,0.862044,0.862222
8,0.0996,1.25156,0.866972,0.86713,0.866759,0.866872
9,0.0836,1.265344,0.863532,0.863555,0.863676,0.863523
10,0.0739,1.297548,0.858945,0.859133,0.858708,0.858829


[I 2025-03-25 11:17:31,720] Trial 79 finished with value: 0.8588289181174557 and parameters: {'learning_rate': 0.000669852030187182, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 72 with value: 0.8680107771016862.


Trial 80 with params: {'learning_rate': 0.002556615522049405, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1229,1.308571,0.845183,0.847871,0.845994,0.845056
2,0.4046,1.079008,0.852064,0.853016,0.851572,0.851797
3,0.2137,1.094012,0.862385,0.862328,0.862381,0.86235
4,0.1347,1.202857,0.850917,0.851236,0.850615,0.850764
5,0.0935,1.184204,0.856651,0.856695,0.856498,0.856568
6,0.0691,1.174492,0.850917,0.851137,0.850657,0.850785


[I 2025-03-25 11:18:41,972] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.002660643112262213, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.173,1.216845,0.84633,0.849169,0.847163,0.846194
2,0.4012,1.069793,0.862385,0.863883,0.861792,0.862065
3,0.2125,1.172632,0.84633,0.847729,0.845731,0.845973


[I 2025-03-25 11:19:15,237] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0005074978386422862, 'weight_decay': 0.002, 'warmup_steps': 42, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5341,1.730089,0.821101,0.825694,0.822177,0.820761
2,0.7265,1.279516,0.837156,0.839422,0.836385,0.836619
3,0.4631,1.36652,0.852064,0.854077,0.851362,0.851633
4,0.3247,1.360982,0.840596,0.845685,0.839469,0.83965
5,0.2438,1.380543,0.849771,0.849762,0.849657,0.849699
6,0.191,1.33566,0.857798,0.858247,0.857456,0.85763
7,0.1546,1.360095,0.861239,0.861193,0.861297,0.861216
8,0.1289,1.360419,0.857798,0.85803,0.85754,0.857672
9,0.1096,1.372401,0.856651,0.856891,0.856919,0.856651
10,0.0972,1.376037,0.858945,0.858991,0.858792,0.858863


[I 2025-03-25 11:21:15,025] Trial 82 finished with value: 0.8588630989429471 and parameters: {'learning_rate': 0.0005074978386422862, 'weight_decay': 0.002, 'warmup_steps': 42, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 72 with value: 0.8680107771016862.


Trial 83 with params: {'learning_rate': 0.0009763299499503336, 'weight_decay': 0.007, 'warmup_steps': 36, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3591,1.380826,0.826835,0.834196,0.828187,0.82624
2,0.5557,1.172205,0.854358,0.854311,0.854414,0.854335
3,0.323,1.184001,0.858945,0.860532,0.858329,0.858601
4,0.2115,1.192421,0.861239,0.862418,0.860708,0.86096
5,0.1527,1.29198,0.856651,0.856929,0.856372,0.856514
6,0.1153,1.231095,0.857798,0.859267,0.857203,0.857468


[I 2025-03-25 11:22:26,847] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 5.286423289644344e-05, 'weight_decay': 0.008, 'warmup_steps': 25, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5117,1.687667,0.784404,0.785124,0.784836,0.784386
2,1.5911,1.596817,0.791284,0.791225,0.791298,0.791245
3,1.3741,1.561681,0.801606,0.803211,0.800886,0.801027
4,1.2525,1.602697,0.806193,0.811368,0.80497,0.804899
5,1.1567,1.461173,0.811927,0.811859,0.811905,0.811878
6,1.0919,1.466097,0.817661,0.818217,0.818041,0.817655


[I 2025-03-25 11:23:33,949] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0006704971009641271, 'weight_decay': 0.004, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4588,1.746739,0.800459,0.819363,0.802665,0.798213
2,0.6412,1.297475,0.84633,0.848682,0.845563,0.845823
3,0.3885,1.203556,0.855505,0.855732,0.855245,0.855376
4,0.2645,1.194938,0.856651,0.858223,0.856035,0.856302
5,0.1912,1.322098,0.855505,0.855577,0.85533,0.855413
6,0.1481,1.212841,0.858945,0.860764,0.858287,0.858568
7,0.1177,1.293782,0.862385,0.86298,0.862002,0.8622
8,0.0968,1.239086,0.868119,0.868087,0.868054,0.868069
9,0.0817,1.267285,0.864679,0.864625,0.864718,0.864653
10,0.0726,1.293666,0.864679,0.86476,0.864507,0.864593


[I 2025-03-25 11:25:32,422] Trial 85 finished with value: 0.8645927095670483 and parameters: {'learning_rate': 0.0006704971009641271, 'weight_decay': 0.004, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 72 with value: 0.8680107771016862.


Trial 86 with params: {'learning_rate': 0.0011168242910589775, 'weight_decay': 0.001, 'warmup_steps': 43, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.353,1.317463,0.833716,0.837563,0.834691,0.833477
2,0.5197,1.107394,0.856651,0.857808,0.856119,0.856364
3,0.2946,1.093719,0.863532,0.863935,0.863213,0.863381
4,0.186,1.212212,0.861239,0.864171,0.860413,0.860724
5,0.1334,1.255899,0.860092,0.86117,0.859582,0.859826
6,0.1008,1.242454,0.862385,0.865179,0.861581,0.861894
7,0.0782,1.265073,0.858945,0.858899,0.859003,0.858923
8,0.0633,1.229318,0.863532,0.863647,0.863339,0.863437
9,0.0533,1.228141,0.860092,0.860057,0.860171,0.860073
10,0.0468,1.243837,0.861239,0.86143,0.861002,0.861124


[I 2025-03-25 11:27:31,268] Trial 86 finished with value: 0.8611243828635133 and parameters: {'learning_rate': 0.0011168242910589775, 'weight_decay': 0.001, 'warmup_steps': 43, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 72 with value: 0.8680107771016862.


Trial 87 with params: {'learning_rate': 0.00046366581015033564, 'weight_decay': 0.003, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5665,1.638124,0.815367,0.822071,0.816673,0.814782
2,0.7463,1.306387,0.838303,0.840712,0.837512,0.837748
3,0.4829,1.337882,0.853211,0.85756,0.852193,0.852465
4,0.3421,1.390329,0.837156,0.839972,0.836301,0.836529
5,0.2614,1.376249,0.852064,0.852561,0.851699,0.851877
6,0.2055,1.344523,0.858945,0.860113,0.858413,0.858662
7,0.1677,1.338997,0.866972,0.866919,0.867012,0.866947
8,0.1389,1.315654,0.869266,0.869222,0.869222,0.869222
9,0.1197,1.351664,0.866972,0.867022,0.867138,0.866966
10,0.1069,1.353081,0.866972,0.866955,0.866886,0.866916


[I 2025-03-25 11:29:30,110] Trial 87 finished with value: 0.8669157698076467 and parameters: {'learning_rate': 0.00046366581015033564, 'weight_decay': 0.003, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 72 with value: 0.8680107771016862.


Trial 88 with params: {'learning_rate': 0.00020756123715311674, 'weight_decay': 0.005, 'warmup_steps': 43, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8479,1.52859,0.797018,0.801862,0.79815,0.796569
2,1.0538,1.364305,0.827982,0.827944,0.827882,0.827908
3,0.7816,1.455729,0.829128,0.83439,0.827955,0.828052


[I 2025-03-25 11:30:03,738] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0007385546836584568, 'weight_decay': 0.005, 'warmup_steps': 37, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4184,1.632776,0.816514,0.823458,0.817841,0.815909
2,0.6181,1.2299,0.845183,0.846681,0.844563,0.844806
3,0.3737,1.157354,0.858945,0.859227,0.858666,0.85881
4,0.251,1.196064,0.855505,0.858227,0.854698,0.854989
5,0.1801,1.305302,0.861239,0.861902,0.860834,0.86104
6,0.1378,1.242969,0.862385,0.862846,0.862044,0.862222
7,0.1095,1.277678,0.868119,0.868062,0.868138,0.86809
8,0.0893,1.230989,0.865826,0.865877,0.865675,0.865748
9,0.0758,1.254622,0.87156,0.87161,0.871727,0.871554
10,0.0668,1.279167,0.861239,0.861525,0.86096,0.861105


[I 2025-03-25 11:32:02,393] Trial 89 finished with value: 0.8611053702009465 and parameters: {'learning_rate': 0.0007385546836584568, 'weight_decay': 0.005, 'warmup_steps': 37, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 72 with value: 0.8680107771016862.


Trial 90 with params: {'learning_rate': 0.0017119459081741386, 'weight_decay': 0.006, 'warmup_steps': 43, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2531,1.358595,0.832569,0.838478,0.833775,0.832141
2,0.4501,1.041669,0.863532,0.864203,0.863128,0.863336
3,0.2484,1.079064,0.860092,0.86117,0.859582,0.859826
4,0.1562,1.205044,0.853211,0.855627,0.852446,0.852727
5,0.1092,1.235309,0.860092,0.860034,0.860087,0.860056
6,0.0801,1.270812,0.840596,0.842283,0.839932,0.840171


[I 2025-03-25 11:33:08,568] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0007930405527281093, 'weight_decay': 0.003, 'warmup_steps': 34, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3968,1.485966,0.818807,0.823375,0.819883,0.818463
2,0.6014,1.190359,0.849771,0.850397,0.849362,0.849555
3,0.3574,1.135404,0.861239,0.861193,0.861297,0.861216
4,0.2389,1.21078,0.857798,0.860544,0.856993,0.857291
5,0.1713,1.303134,0.855505,0.857673,0.854782,0.855065
6,0.1295,1.220121,0.857798,0.859492,0.857161,0.857435


[I 2025-03-25 11:34:18,690] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0017797606136130458, 'weight_decay': 0.003, 'warmup_steps': 38, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2315,1.376883,0.83945,0.845039,0.840616,0.839076
2,0.444,1.075275,0.856651,0.857808,0.856119,0.856364
3,0.2374,1.150857,0.848624,0.850997,0.847857,0.848125
4,0.1509,1.200091,0.858945,0.859054,0.85875,0.858847
5,0.1072,1.309356,0.848624,0.849476,0.848152,0.848365
6,0.0791,1.215733,0.852064,0.853197,0.85153,0.851768


[I 2025-03-25 11:35:26,047] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0005023108074181485, 'weight_decay': 0.004, 'warmup_steps': 39, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5406,1.587957,0.822248,0.826327,0.823261,0.821961
2,0.7311,1.393456,0.830275,0.837043,0.828955,0.828976
3,0.4648,1.389231,0.845183,0.84714,0.844479,0.844732


[I 2025-03-25 11:36:01,315] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0004767950355458847, 'weight_decay': 0.002, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5646,1.667647,0.805046,0.819931,0.807001,0.803391
2,0.7376,1.289977,0.841743,0.845231,0.840806,0.84104
3,0.4702,1.333579,0.847477,0.851209,0.846521,0.846776
4,0.3338,1.356938,0.847477,0.849707,0.846731,0.846994
5,0.2523,1.323131,0.849771,0.849869,0.849573,0.849666
6,0.1985,1.283589,0.865826,0.866361,0.865465,0.865656
7,0.1614,1.292881,0.863532,0.863823,0.863255,0.863401
8,0.1347,1.32266,0.870413,0.870619,0.87018,0.870306
9,0.1157,1.334903,0.864679,0.864874,0.864928,0.864678
10,0.1033,1.342181,0.860092,0.860071,0.860003,0.860032


[I 2025-03-25 11:37:59,511] Trial 94 finished with value: 0.8600321027287319 and parameters: {'learning_rate': 0.0004767950355458847, 'weight_decay': 0.002, 'warmup_steps': 41, 'lambda_param': 0.1, 'temperature': 3.0}. Best is trial 72 with value: 0.8680107771016862.


Trial 95 with params: {'learning_rate': 0.0006428951138976216, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4156,1.62055,0.81422,0.81714,0.815084,0.814028
2,0.6534,1.309272,0.844037,0.845864,0.843353,0.843601
3,0.4043,1.253438,0.847477,0.848587,0.846942,0.847171


[I 2025-03-25 11:38:32,653] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 35, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5251,1.687511,0.78555,0.786657,0.786089,0.785503
2,1.5642,1.603481,0.800459,0.800392,0.800392,0.800392
3,1.3537,1.536008,0.805046,0.805585,0.804601,0.804749
4,1.2417,1.559417,0.808486,0.810603,0.807685,0.807829
5,1.1469,1.460106,0.811927,0.811891,0.81199,0.811902
6,1.0829,1.470439,0.815367,0.816368,0.815873,0.815338


[I 2025-03-25 11:39:39,236] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0013364030029384055, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2212,1.647865,0.817661,0.830162,0.81943,0.816444
2,0.4873,1.113647,0.854358,0.85532,0.853867,0.854095
3,0.2737,1.1184,0.861239,0.861761,0.860876,0.861063
4,0.1764,1.174151,0.858945,0.860532,0.858329,0.858601
5,0.1252,1.267921,0.850917,0.851349,0.850573,0.850741
6,0.0929,1.200048,0.850917,0.851477,0.85053,0.850716


[I 2025-03-25 11:40:48,380] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0010757479953298324, 'weight_decay': 0.004, 'warmup_steps': 43, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3582,1.540681,0.817661,0.823973,0.818925,0.817129
2,0.5381,1.147045,0.856651,0.856596,0.856624,0.856609
3,0.305,1.148278,0.858945,0.859756,0.858498,0.858717
4,0.198,1.184356,0.852064,0.853197,0.85153,0.851768
5,0.1417,1.306636,0.850917,0.853047,0.850194,0.850464
6,0.1062,1.206372,0.855505,0.857182,0.854867,0.855136


[I 2025-03-25 11:41:55,583] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0017422533204379319, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.16,1.353464,0.84289,0.846483,0.843826,0.842691
2,0.4498,1.052114,0.856651,0.856891,0.856919,0.856651
3,0.2464,1.016775,0.877294,0.877356,0.877147,0.877222
4,0.1579,1.144505,0.858945,0.862164,0.858077,0.858382
5,0.1105,1.243058,0.855505,0.855577,0.85533,0.855413
6,0.0816,1.181362,0.850917,0.851349,0.850573,0.850741


[I 2025-03-25 11:43:02,385] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0003645065474598689, 'weight_decay': 0.004, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6711,1.511991,0.816514,0.819449,0.817378,0.816324
2,0.8397,1.36183,0.833716,0.83558,0.833007,0.833231
3,0.5627,1.407822,0.847477,0.851557,0.846478,0.846727
4,0.4178,1.47774,0.844037,0.849806,0.842848,0.843025
5,0.3202,1.461871,0.84289,0.843142,0.842605,0.842739
6,0.2575,1.360345,0.847477,0.847573,0.847278,0.847371


[I 2025-03-25 11:44:07,253] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0026422815626647267, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1413,1.246364,0.857798,0.858974,0.85834,0.857771
2,0.4015,1.164029,0.84633,0.848682,0.845563,0.845823
3,0.2135,1.201759,0.84633,0.848682,0.845563,0.845823


[I 2025-03-25 11:44:40,114] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0008908749370891274, 'weight_decay': 0.007, 'warmup_steps': 40, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3707,1.402128,0.827982,0.834245,0.829229,0.827502
2,0.5672,1.159101,0.850917,0.851137,0.850657,0.850785
3,0.3322,1.148435,0.855505,0.855732,0.855245,0.855376
4,0.2167,1.186867,0.860092,0.860991,0.859624,0.859853
5,0.1552,1.307787,0.850917,0.851952,0.850404,0.850634
6,0.1177,1.253121,0.848624,0.849648,0.84811,0.848336


[I 2025-03-25 11:45:46,182] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0008555235299408995, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3361,1.470742,0.818807,0.826248,0.820178,0.818161
2,0.5865,1.229672,0.845183,0.84714,0.844479,0.844732
3,0.3506,1.17378,0.860092,0.860169,0.859918,0.860003
4,0.2308,1.215131,0.857798,0.858526,0.857371,0.857582
5,0.1656,1.302954,0.857798,0.858688,0.857329,0.857555
6,0.1273,1.21879,0.862385,0.86313,0.86196,0.862176
7,0.0997,1.227021,0.865826,0.865877,0.865675,0.865748
8,0.0813,1.201713,0.873853,0.874112,0.8736,0.873741
9,0.0683,1.230991,0.864679,0.864874,0.864928,0.864678
10,0.0599,1.237549,0.869266,0.869352,0.869096,0.869183


[I 2025-03-25 11:47:40,194] Trial 103 finished with value: 0.8691827872088432 and parameters: {'learning_rate': 0.0008555235299408995, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 104 with params: {'learning_rate': 0.0009375090513233341, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3253,1.695038,0.818807,0.829867,0.820472,0.817763
2,0.5601,1.149216,0.84289,0.843244,0.842563,0.842716
3,0.3314,1.118051,0.854358,0.854352,0.854246,0.854289
4,0.2147,1.232401,0.850917,0.853593,0.850109,0.850385
5,0.1539,1.302982,0.84633,0.848682,0.845563,0.845823
6,0.1173,1.244521,0.860092,0.862042,0.859413,0.859701
7,0.0912,1.23952,0.864679,0.864703,0.864549,0.864608
8,0.075,1.211212,0.864679,0.865146,0.864339,0.864519
9,0.0629,1.223983,0.858945,0.858968,0.859087,0.858936
10,0.0551,1.238587,0.864679,0.864922,0.864423,0.864558


[I 2025-03-25 11:49:36,697] Trial 104 finished with value: 0.8645584882612793 and parameters: {'learning_rate': 0.0009375090513233341, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 42, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.287,1.39493,0.83945,0.843878,0.84049,0.839175
2,0.4871,1.094143,0.875,0.875007,0.874895,0.874941
3,0.2695,1.083586,0.865826,0.865943,0.865633,0.865732
4,0.1721,1.139102,0.862385,0.864613,0.861665,0.861967
5,0.1241,1.20653,0.853211,0.853435,0.852951,0.85308
6,0.09,1.142392,0.866972,0.866998,0.866843,0.866902
7,0.0704,1.202435,0.864679,0.86466,0.864591,0.864621
8,0.0567,1.203152,0.865826,0.866024,0.865591,0.865715
9,0.0479,1.171447,0.860092,0.860071,0.860003,0.860032
10,0.0421,1.19885,0.860092,0.860327,0.859834,0.859967


[I 2025-03-25 11:51:32,097] Trial 105 finished with value: 0.8599672505752209 and parameters: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 42, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 106 with params: {'learning_rate': 0.00040124304914810287, 'weight_decay': 0.005, 'warmup_steps': 40, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6116,1.541174,0.81422,0.819474,0.815378,0.813788
2,0.8065,1.289235,0.836009,0.837662,0.835344,0.835571
3,0.5284,1.394588,0.844037,0.850237,0.842805,0.842966
4,0.3848,1.419593,0.840596,0.844233,0.839638,0.839863
5,0.2942,1.40578,0.852064,0.852025,0.851993,0.852008
6,0.2336,1.332168,0.858945,0.859461,0.858582,0.858766
7,0.1923,1.345786,0.861239,0.861203,0.861171,0.861186
8,0.1617,1.348999,0.861239,0.861219,0.861339,0.861224
9,0.14,1.389097,0.862385,0.86258,0.862634,0.862385
10,0.1248,1.385873,0.861239,0.861184,0.861213,0.861197


[I 2025-03-25 11:53:26,575] Trial 106 finished with value: 0.8611974600050779 and parameters: {'learning_rate': 0.00040124304914810287, 'weight_decay': 0.005, 'warmup_steps': 40, 'lambda_param': 0.2, 'temperature': 3.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 107 with params: {'learning_rate': 0.0006500659503822838, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4102,1.719771,0.809633,0.818819,0.811169,0.808727
2,0.6544,1.293179,0.834862,0.837372,0.834049,0.834273
3,0.4024,1.183193,0.858945,0.859601,0.85854,0.858743
4,0.2756,1.244201,0.848624,0.850254,0.847983,0.848237
5,0.2022,1.326931,0.838303,0.839554,0.837722,0.837944
6,0.1566,1.302309,0.860092,0.861801,0.859455,0.859735
7,0.1251,1.241916,0.864679,0.865431,0.864254,0.864473
8,0.1033,1.258464,0.87156,0.871545,0.871474,0.871505
9,0.0868,1.269292,0.863532,0.863685,0.86376,0.86353
10,0.0767,1.286932,0.865826,0.865792,0.865759,0.865775


[I 2025-03-25 11:55:30,198] Trial 107 finished with value: 0.8657746729027292 and parameters: {'learning_rate': 0.0006500659503822838, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 108 with params: {'learning_rate': 0.0011642062860860766, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2719,1.654154,0.822248,0.827414,0.823388,0.821854
2,0.5166,1.075911,0.855505,0.856753,0.854951,0.8552
3,0.2915,1.090202,0.858945,0.85889,0.858918,0.858903
4,0.1876,1.160279,0.861239,0.865162,0.860287,0.8606
5,0.136,1.275463,0.855505,0.855832,0.855203,0.855355
6,0.1012,1.217203,0.857798,0.859492,0.857161,0.857435


[I 2025-03-25 11:56:38,739] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0002900848385820744, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6567,1.487583,0.811927,0.814555,0.812747,0.811759
2,0.9306,1.346412,0.829128,0.830956,0.828418,0.828631
3,0.6568,1.419915,0.826835,0.83245,0.825619,0.825679
4,0.5036,1.409842,0.837156,0.837445,0.836849,0.836988
5,0.3955,1.444917,0.838303,0.838382,0.83848,0.838297
6,0.3274,1.463175,0.83945,0.839975,0.839059,0.839233


[I 2025-03-25 11:57:48,663] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.00047547300185750007, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5072,1.601727,0.818807,0.820958,0.819546,0.818692
2,0.7596,1.301737,0.834862,0.83662,0.834175,0.834402
3,0.4871,1.446585,0.84289,0.850696,0.841511,0.84159
4,0.3454,1.456235,0.83945,0.845989,0.838175,0.838285
5,0.2628,1.429779,0.83945,0.841017,0.838806,0.83904
6,0.2074,1.331092,0.862385,0.862846,0.862044,0.862222
7,0.1691,1.375399,0.860092,0.860169,0.859918,0.860003
8,0.1421,1.350372,0.863532,0.863727,0.863297,0.86342
9,0.1216,1.39336,0.849771,0.84985,0.849952,0.849766
10,0.1085,1.394144,0.856651,0.856647,0.85654,0.856583


[I 2025-03-25 11:59:45,245] Trial 110 finished with value: 0.8565832876110329 and parameters: {'learning_rate': 0.00047547300185750007, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 111 with params: {'learning_rate': 0.0007703588624188188, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3721,1.914033,0.808486,0.822533,0.810379,0.806981
2,0.6115,1.235458,0.836009,0.837247,0.835428,0.835646
3,0.368,1.225653,0.852064,0.852242,0.851825,0.851943
4,0.2447,1.257277,0.858945,0.860315,0.858371,0.858632
5,0.1775,1.313947,0.850917,0.851621,0.850488,0.85069
6,0.1365,1.250367,0.858945,0.859337,0.858624,0.858789
7,0.108,1.303086,0.862385,0.862366,0.862297,0.862327
8,0.0895,1.268628,0.856651,0.856758,0.856456,0.856552
9,0.0747,1.281052,0.864679,0.864679,0.864802,0.864668
10,0.0658,1.293509,0.854358,0.854739,0.854035,0.854197


[I 2025-03-25 12:01:45,084] Trial 111 finished with value: 0.8541965366016144 and parameters: {'learning_rate': 0.0007703588624188188, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 112 with params: {'learning_rate': 6.786706512825958e-05, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3414,1.641649,0.790138,0.79204,0.790846,0.790016
2,1.4716,1.533175,0.798165,0.798274,0.79835,0.798161
3,1.2891,1.47037,0.811927,0.811927,0.812032,0.811911
4,1.1636,1.599617,0.808486,0.814922,0.807138,0.806981
5,1.0589,1.437328,0.821101,0.822163,0.820535,0.820724
6,0.9852,1.456005,0.825688,0.826984,0.826261,0.825643


[I 2025-03-25 12:03:00,702] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0017406021954941922, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1635,1.401798,0.836009,0.841352,0.837154,0.835646
2,0.451,1.065774,0.857798,0.858688,0.857329,0.857555
3,0.2425,1.114862,0.863532,0.864203,0.863128,0.863336
4,0.1532,1.128062,0.864679,0.865977,0.864128,0.864394
5,0.1069,1.266751,0.860092,0.861365,0.859539,0.859797
6,0.0791,1.216731,0.861239,0.861761,0.860876,0.861063
7,0.0618,1.217255,0.860092,0.860112,0.85996,0.860018
8,0.0502,1.228114,0.853211,0.854873,0.852572,0.852836
9,0.0423,1.216758,0.856651,0.856695,0.856498,0.856568
10,0.0374,1.229108,0.855505,0.855948,0.855161,0.855333


[I 2025-03-25 12:04:59,292] Trial 113 finished with value: 0.8553333579114241 and parameters: {'learning_rate': 0.0017406021954941922, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 114 with params: {'learning_rate': 0.00025385455926951023, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7199,1.591173,0.794725,0.803297,0.796234,0.793781
2,0.9788,1.338588,0.836009,0.836348,0.83568,0.835828
3,0.7024,1.443064,0.827982,0.833828,0.826745,0.826801
4,0.5541,1.46973,0.825688,0.826171,0.825293,0.825453
5,0.4444,1.491906,0.836009,0.835948,0.835975,0.835961
6,0.3703,1.527107,0.840596,0.840582,0.840479,0.840521


[I 2025-03-25 12:06:06,711] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00031426870593619235, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6586,1.513435,0.811927,0.814293,0.812705,0.811784
2,0.8876,1.341784,0.830275,0.831998,0.829587,0.829802
3,0.619,1.392473,0.838303,0.843748,0.837133,0.837284


[I 2025-03-25 12:06:39,466] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0007789928527424453, 'weight_decay': 0.006, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4167,1.811439,0.815367,0.824958,0.816926,0.814459
2,0.5987,1.194724,0.845183,0.846282,0.844647,0.844873
3,0.3579,1.155845,0.856651,0.856674,0.856793,0.856642
4,0.2407,1.256279,0.853211,0.855109,0.85253,0.852801
5,0.174,1.296052,0.845183,0.846681,0.844563,0.844806
6,0.1326,1.210987,0.857798,0.859492,0.857161,0.857435
7,0.1045,1.254561,0.870413,0.870536,0.870222,0.870323
8,0.0863,1.202364,0.861239,0.86143,0.861002,0.861124
9,0.0727,1.241454,0.864679,0.864679,0.864802,0.864668
10,0.0641,1.250556,0.860092,0.86024,0.859876,0.859986


[I 2025-03-25 12:08:34,777] Trial 116 finished with value: 0.859985680592992 and parameters: {'learning_rate': 0.0007789928527424453, 'weight_decay': 0.006, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 117 with params: {'learning_rate': 0.00012050092247739796, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0869,1.551825,0.792431,0.792462,0.792214,0.792287
2,1.2579,1.455604,0.817661,0.818062,0.817283,0.81743
3,1.0409,1.425367,0.824541,0.824962,0.824166,0.824319


[I 2025-03-25 12:09:09,445] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0009097230516574109, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3214,1.583027,0.818807,0.826719,0.82022,0.81811
2,0.5742,1.148431,0.849771,0.850261,0.849404,0.849581
3,0.3374,1.15385,0.854358,0.854399,0.854204,0.854273
4,0.2224,1.170356,0.865826,0.866664,0.865381,0.865609
5,0.1593,1.273927,0.854358,0.855503,0.853825,0.854066
6,0.1205,1.217751,0.855505,0.856224,0.855077,0.855285


[I 2025-03-25 12:10:15,916] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0002745543053787802, 'weight_decay': 0.007, 'warmup_steps': 33, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7201,1.541723,0.800459,0.808926,0.801949,0.799572
2,0.9471,1.335698,0.836009,0.836462,0.835638,0.835802
3,0.6726,1.426147,0.826835,0.83245,0.825619,0.825679
4,0.5181,1.436376,0.829128,0.830956,0.828418,0.828631
5,0.417,1.483571,0.831422,0.831375,0.83147,0.831395
6,0.343,1.513499,0.849771,0.850712,0.849278,0.8495


[I 2025-03-25 12:11:22,633] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0036101090092247124, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0747,1.24283,0.84633,0.846846,0.8467,0.846327
2,0.3764,1.194297,0.844037,0.84487,0.843563,0.84377
3,0.1999,1.230654,0.848624,0.852204,0.847689,0.847951


[I 2025-03-25 12:11:58,028] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0019387413021258243, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1891,1.46602,0.838303,0.844949,0.839574,0.837832
2,0.438,1.097934,0.864679,0.865779,0.86417,0.864421
3,0.237,1.09634,0.862385,0.862464,0.862213,0.862298
4,0.1508,1.220061,0.852064,0.85244,0.851741,0.8519
5,0.104,1.228778,0.853211,0.853534,0.852909,0.85306
6,0.0772,1.187082,0.852064,0.852165,0.851867,0.851961


[I 2025-03-25 12:13:06,530] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0004693439330486905, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5175,1.652749,0.813073,0.820633,0.814463,0.81238
2,0.748,1.376186,0.824541,0.830108,0.823324,0.82337
3,0.4784,1.418077,0.848624,0.852544,0.847647,0.847904


[I 2025-03-25 12:13:40,297] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0005672772410553919, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4554,1.506993,0.816514,0.820706,0.817547,0.8162
2,0.7109,1.445814,0.826835,0.834681,0.825408,0.825328
3,0.4472,1.295949,0.848624,0.848564,0.848615,0.848585
4,0.3097,1.293208,0.860092,0.862042,0.859413,0.859701
5,0.2324,1.387828,0.850917,0.851236,0.850615,0.850764
6,0.1786,1.335687,0.855505,0.85742,0.854824,0.855101
7,0.1458,1.309689,0.863532,0.864203,0.863128,0.863336
8,0.1202,1.347177,0.862385,0.862537,0.862171,0.862281
9,0.1013,1.35367,0.860092,0.860092,0.860213,0.86008
10,0.0905,1.374734,0.862385,0.862728,0.862086,0.862243


[I 2025-03-25 12:15:35,733] Trial 123 finished with value: 0.8622432859399685 and parameters: {'learning_rate': 0.0005672772410553919, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 124 with params: {'learning_rate': 0.0009167350119707469, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3303,1.663432,0.819954,0.828626,0.82143,0.819182
2,0.5593,1.16867,0.853211,0.85335,0.852993,0.8531
3,0.3305,1.155592,0.861239,0.861525,0.86096,0.861105
4,0.2171,1.142119,0.863532,0.864361,0.863086,0.863312
5,0.1555,1.296585,0.864679,0.865598,0.864212,0.864448
6,0.1193,1.259987,0.853211,0.856522,0.85232,0.852603


[I 2025-03-25 12:16:52,346] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.003116284255717824, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0833,1.334693,0.84633,0.848883,0.84712,0.846214
2,0.3896,1.120293,0.862385,0.862407,0.862255,0.862313
3,0.2081,1.106471,0.852064,0.852057,0.851951,0.851994
4,0.1312,1.244949,0.845183,0.845164,0.845279,0.845167
5,0.0873,1.231947,0.857798,0.857744,0.857834,0.857771
6,0.0638,1.199477,0.855505,0.855481,0.855414,0.855443
7,0.0501,1.231409,0.854358,0.854352,0.854246,0.854289
8,0.0406,1.237684,0.856651,0.856605,0.856708,0.856629
9,0.0347,1.22569,0.855505,0.85545,0.85554,0.855477
10,0.0311,1.230953,0.854358,0.854352,0.854246,0.854289


[I 2025-03-25 12:19:00,769] Trial 125 finished with value: 0.8542886202128093 and parameters: {'learning_rate': 0.003116284255717824, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 126 with params: {'learning_rate': 0.000669927915241936, 'weight_decay': 0.003, 'warmup_steps': 43, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4702,1.890483,0.807339,0.825865,0.809506,0.805265
2,0.6384,1.317123,0.837156,0.837961,0.83668,0.836878
3,0.3891,1.210791,0.849771,0.849945,0.849531,0.849647


[I 2025-03-25 12:19:40,080] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.000631886581076646, 'weight_decay': 0.007, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4797,1.994179,0.798165,0.820077,0.800539,0.795475
2,0.6614,1.26737,0.84633,0.848958,0.845521,0.845782
3,0.4018,1.266365,0.852064,0.854077,0.851362,0.851633
4,0.2733,1.175148,0.858945,0.859227,0.858666,0.85881
5,0.2016,1.347899,0.849771,0.850547,0.84932,0.849528
6,0.154,1.293363,0.862385,0.86411,0.86175,0.862034
7,0.1246,1.286249,0.863532,0.864061,0.86317,0.863359
8,0.1024,1.277761,0.866972,0.86722,0.866717,0.866854
9,0.0862,1.285345,0.863532,0.863685,0.86376,0.86353
10,0.0762,1.300447,0.861239,0.861525,0.86096,0.861105


[I 2025-03-25 12:21:40,972] Trial 127 finished with value: 0.8611053702009465 and parameters: {'learning_rate': 0.000631886581076646, 'weight_decay': 0.007, 'warmup_steps': 42, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 128 with params: {'learning_rate': 0.0015712205338941642, 'weight_decay': 0.01, 'warmup_steps': 41, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2613,1.531515,0.827982,0.837625,0.829523,0.827163
2,0.4671,1.074913,0.870413,0.872074,0.869801,0.870097
3,0.2534,1.117413,0.854358,0.855152,0.853909,0.854123
4,0.1625,1.215417,0.849771,0.85258,0.848941,0.849214
5,0.1128,1.24791,0.858945,0.859054,0.85875,0.858847
6,0.0848,1.243978,0.850917,0.852345,0.85032,0.850571


[I 2025-03-25 12:22:50,627] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0010094596681741567, 'weight_decay': 0.003, 'warmup_steps': 43, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3598,1.528839,0.821101,0.830573,0.82264,0.82025
2,0.5451,1.128236,0.852064,0.85244,0.851741,0.8519
3,0.3146,1.155776,0.854358,0.854539,0.854119,0.854238
4,0.2006,1.236885,0.856651,0.858008,0.856077,0.856334
5,0.1447,1.313438,0.849771,0.850547,0.84932,0.849528
6,0.1105,1.267156,0.855505,0.856385,0.855035,0.855258


[I 2025-03-25 12:24:03,997] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0012451621036718194, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2524,1.582926,0.825688,0.834237,0.827145,0.824966
2,0.5103,1.12372,0.854358,0.854739,0.854035,0.854197
3,0.2873,1.129167,0.854358,0.855915,0.85374,0.854003
4,0.1846,1.275622,0.845183,0.85119,0.843974,0.84415
5,0.1329,1.303588,0.856651,0.857038,0.856329,0.856493
6,0.0991,1.222071,0.847477,0.848587,0.846942,0.847171


[I 2025-03-25 12:25:17,495] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0009128036524545295, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3376,1.822919,0.809633,0.82405,0.811548,0.808098
2,0.5665,1.147998,0.850917,0.850985,0.850741,0.850822
3,0.3319,1.108748,0.864679,0.865431,0.864254,0.864473
4,0.2173,1.189296,0.858945,0.860113,0.858413,0.858662
5,0.1568,1.268826,0.862385,0.863883,0.861792,0.862065
6,0.1191,1.242397,0.854358,0.856649,0.853614,0.853896
7,0.0929,1.238847,0.861239,0.861351,0.861044,0.861142
8,0.0755,1.225603,0.862385,0.862537,0.862171,0.862281
9,0.064,1.238392,0.858945,0.858887,0.858961,0.858914
10,0.0563,1.258771,0.855505,0.855521,0.855372,0.855429


[I 2025-03-25 12:27:10,516] Trial 131 finished with value: 0.8554285353375861 and parameters: {'learning_rate': 0.0009128036524545295, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 132 with params: {'learning_rate': 0.000950721772090261, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2952,1.611994,0.815367,0.822071,0.816673,0.814782
2,0.5553,1.240124,0.854358,0.854352,0.854246,0.854289
3,0.3243,1.19163,0.852064,0.853016,0.851572,0.851797


[I 2025-03-25 12:27:44,073] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0012977394780824265, 'weight_decay': 0.0, 'warmup_steps': 39, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3,1.503184,0.823394,0.828014,0.824472,0.823058
2,0.5028,1.091854,0.864679,0.86619,0.864086,0.864364
3,0.2768,1.067344,0.864679,0.864922,0.864423,0.864558
4,0.1776,1.210047,0.854358,0.858912,0.853319,0.853594
5,0.1274,1.236811,0.862385,0.862339,0.862339,0.862339
6,0.0931,1.191001,0.848624,0.849476,0.848152,0.848365


[I 2025-03-25 12:28:49,374] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0005461188644789779, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4891,1.620223,0.821101,0.826844,0.822304,0.820644
2,0.6975,1.275425,0.841743,0.843795,0.841016,0.841262
3,0.4406,1.247615,0.849771,0.849869,0.849573,0.849666


[I 2025-03-25 12:29:23,095] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.000924064335483476, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3594,1.683425,0.817661,0.830782,0.819473,0.816374
2,0.5656,1.200075,0.847477,0.847738,0.847194,0.847331
3,0.3285,1.165143,0.853211,0.855109,0.85253,0.852801
4,0.2164,1.145718,0.864679,0.865598,0.864212,0.864448
5,0.1554,1.301942,0.861239,0.86284,0.860623,0.8609
6,0.1189,1.234598,0.855505,0.858844,0.854614,0.854906
7,0.0936,1.219856,0.860092,0.860038,0.860129,0.860065
8,0.0762,1.224331,0.860092,0.86024,0.859876,0.859986
9,0.0653,1.239462,0.857798,0.857912,0.858003,0.857795
10,0.0569,1.272724,0.854358,0.854462,0.854161,0.854256


[I 2025-03-25 12:31:18,776] Trial 135 finished with value: 0.8542564041823769 and parameters: {'learning_rate': 0.000924064335483476, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 136 with params: {'learning_rate': 0.001156179201901999, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2487,1.463589,0.826835,0.83206,0.827976,0.826451
2,0.5256,1.149367,0.861239,0.861261,0.861381,0.86123
3,0.3036,1.106485,0.860092,0.860429,0.859792,0.859947
4,0.196,1.15228,0.855505,0.857673,0.854782,0.855065
5,0.1397,1.280135,0.853211,0.854652,0.852614,0.85287
6,0.1027,1.139472,0.856651,0.857038,0.856329,0.856493


[I 2025-03-25 12:32:24,805] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0005800710580054836, 'weight_decay': 0.003, 'warmup_steps': 39, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4873,1.752616,0.81078,0.820253,0.812337,0.809849
2,0.6887,1.30629,0.84289,0.846559,0.841932,0.842167
3,0.4236,1.288964,0.854358,0.857847,0.853446,0.853733
4,0.2914,1.338306,0.848624,0.852901,0.847605,0.847855
5,0.2143,1.339478,0.855505,0.855456,0.855456,0.855456
6,0.1673,1.272352,0.868119,0.868661,0.867759,0.867952
7,0.1362,1.313884,0.861239,0.862231,0.86075,0.860988
8,0.1125,1.322793,0.863532,0.863475,0.863549,0.863502
9,0.0951,1.346829,0.861239,0.8617,0.861592,0.861237
10,0.0845,1.341524,0.861239,0.861351,0.861044,0.861142


[I 2025-03-25 12:34:19,365] Trial 137 finished with value: 0.8611419283942331 and parameters: {'learning_rate': 0.0005800710580054836, 'weight_decay': 0.003, 'warmup_steps': 39, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 138 with params: {'learning_rate': 0.000486736344672825, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5137,1.612577,0.822248,0.825369,0.823135,0.822051
2,0.7378,1.338375,0.834862,0.837948,0.833965,0.834178
3,0.4754,1.397508,0.841743,0.846657,0.840637,0.840831
4,0.3344,1.349074,0.853211,0.85536,0.852488,0.852765
5,0.252,1.384473,0.847477,0.847648,0.847236,0.847352
6,0.1966,1.293232,0.866972,0.867445,0.866633,0.866815
7,0.1591,1.368438,0.860092,0.860492,0.860424,0.860091
8,0.1332,1.370159,0.861239,0.861286,0.861087,0.861158
9,0.1142,1.359078,0.863532,0.863555,0.863676,0.863523
10,0.1015,1.370769,0.865826,0.865827,0.865717,0.865762


[I 2025-03-25 12:36:20,039] Trial 138 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.000486736344672825, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 139 with params: {'learning_rate': 0.0003957331698592127, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6024,1.518686,0.819954,0.823059,0.820841,0.819755
2,0.8145,1.326593,0.833716,0.835139,0.833091,0.83331
3,0.5314,1.350635,0.84633,0.84988,0.845394,0.845647


[I 2025-03-25 12:36:55,691] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0002349441777966634, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7819,1.516539,0.800459,0.803858,0.801402,0.80019
2,0.9868,1.387403,0.826835,0.828421,0.826166,0.826372
3,0.7251,1.462871,0.822248,0.828598,0.820946,0.820923
4,0.5698,1.42503,0.834862,0.834913,0.83468,0.834757
5,0.4619,1.491744,0.833716,0.833696,0.833807,0.833698
6,0.3887,1.495804,0.841743,0.84169,0.84169,0.84169


[I 2025-03-25 12:38:00,852] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0009727345719405654, 'weight_decay': 0.007, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3608,1.445709,0.829128,0.836532,0.830481,0.828542
2,0.5499,1.207103,0.857798,0.857912,0.858003,0.857795
3,0.3198,1.196362,0.857798,0.860847,0.85695,0.857251
4,0.2097,1.179717,0.858945,0.861013,0.858245,0.858534
5,0.1486,1.27171,0.850917,0.852345,0.85032,0.850571
6,0.1141,1.18937,0.860092,0.861575,0.859497,0.859766
7,0.0886,1.168529,0.864679,0.86466,0.864591,0.864621
8,0.072,1.174011,0.860092,0.860169,0.859918,0.860003
9,0.0606,1.199736,0.862385,0.862385,0.862507,0.862374
10,0.0534,1.211771,0.860092,0.86024,0.859876,0.859986


[I 2025-03-25 12:39:56,303] Trial 141 finished with value: 0.859985680592992 and parameters: {'learning_rate': 0.0009727345719405654, 'weight_decay': 0.007, 'warmup_steps': 40, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 142 with params: {'learning_rate': 0.000443641389576366, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5519,1.725542,0.806193,0.817156,0.807874,0.805042
2,0.7622,1.403229,0.832569,0.836949,0.831502,0.831662
3,0.4918,1.486832,0.844037,0.853716,0.842511,0.842505
4,0.3563,1.388412,0.841743,0.845564,0.840764,0.84099
5,0.2691,1.402293,0.84633,0.846393,0.846152,0.846232
6,0.2147,1.342216,0.860092,0.860034,0.860087,0.860056
7,0.1741,1.403255,0.856651,0.856929,0.856372,0.856514
8,0.1461,1.393963,0.855505,0.85545,0.85554,0.855477
9,0.1256,1.42,0.854358,0.85451,0.854582,0.854356
10,0.1121,1.417657,0.857798,0.85775,0.85775,0.85775


[I 2025-03-25 12:41:58,340] Trial 142 finished with value: 0.8577502736381242 and parameters: {'learning_rate': 0.000443641389576366, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 143 with params: {'learning_rate': 0.0011931656792572123, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.248,1.465316,0.823394,0.828014,0.824472,0.823058
2,0.5114,1.149057,0.856651,0.856605,0.856708,0.856629
3,0.2932,1.258544,0.84633,0.850941,0.845268,0.845498


[I 2025-03-25 12:42:35,267] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.000749729276503193, 'weight_decay': 0.0, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4193,1.749447,0.809633,0.819871,0.811253,0.808602
2,0.6169,1.229981,0.84289,0.843361,0.842521,0.842691
3,0.3719,1.240693,0.854358,0.854399,0.854204,0.854273
4,0.247,1.237317,0.855505,0.856385,0.855035,0.855258
5,0.178,1.310507,0.848624,0.849648,0.84811,0.848336
6,0.1359,1.250285,0.850917,0.851779,0.850446,0.850663


[I 2025-03-25 12:43:44,873] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0004121462872357458, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5649,1.506859,0.816514,0.82037,0.817504,0.816234
2,0.794,1.314338,0.840596,0.84186,0.840016,0.840243
3,0.5218,1.45039,0.845183,0.852562,0.843847,0.843967


[I 2025-03-25 12:44:21,375] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0005166911514170459, 'weight_decay': 0.0, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4897,1.826901,0.798165,0.812118,0.800076,0.796537
2,0.7142,1.295785,0.84289,0.846559,0.841932,0.842167
3,0.4536,1.373012,0.838303,0.840712,0.837512,0.837748
4,0.3176,1.307136,0.848624,0.851569,0.847773,0.848041
5,0.24,1.340892,0.849771,0.850141,0.849446,0.849604
6,0.1867,1.302178,0.868119,0.868239,0.867928,0.868027
7,0.1512,1.303187,0.866972,0.86713,0.866759,0.866872
8,0.1266,1.329789,0.856651,0.856758,0.856456,0.856552
9,0.1073,1.354954,0.854358,0.854381,0.854498,0.854348
10,0.0955,1.360498,0.857798,0.857944,0.857582,0.85769


[I 2025-03-25 12:46:20,068] Trial 146 finished with value: 0.8576903638814016 and parameters: {'learning_rate': 0.0005166911514170459, 'weight_decay': 0.0, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 103 with value: 0.8691827872088432.


Trial 147 with params: {'learning_rate': 0.0009743930042906128, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3319,1.423773,0.818807,0.823375,0.819883,0.818463
2,0.5562,1.132292,0.861239,0.862231,0.86075,0.860988
3,0.3248,1.072387,0.857798,0.858379,0.857413,0.857606
4,0.2081,1.153406,0.863532,0.865149,0.862918,0.863199
5,0.1493,1.241366,0.858945,0.859337,0.858624,0.858789
6,0.1125,1.198993,0.861239,0.862231,0.86075,0.860988
7,0.0887,1.219844,0.868119,0.868067,0.868096,0.86808
8,0.0712,1.188349,0.869266,0.869352,0.869096,0.869183
9,0.06,1.192717,0.869266,0.869232,0.869348,0.869249
10,0.0529,1.221606,0.862385,0.862728,0.862086,0.862243


[I 2025-03-25 12:48:17,053] Trial 147 finished with value: 0.8622432859399685 and parameters: {'learning_rate': 0.0009743930042906128, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 103 with value: 0.8691827872088432.


Trial 148 with params: {'learning_rate': 0.003199645143713299, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0695,1.224997,0.832569,0.833692,0.833102,0.832537
2,0.3781,1.180843,0.853211,0.853649,0.852867,0.853037
3,0.2003,1.176175,0.854358,0.854999,0.853951,0.854149
4,0.1252,1.229715,0.845183,0.845794,0.844774,0.844961
5,0.0831,1.333551,0.83945,0.839394,0.83948,0.839419
6,0.0625,1.251139,0.848624,0.848573,0.848573,0.848573


[I 2025-03-25 12:49:23,366] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0002803349651617039, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7056,1.489692,0.808486,0.81047,0.809201,0.808375
2,0.9372,1.332799,0.837156,0.837552,0.836806,0.836963
3,0.6685,1.43342,0.829128,0.833631,0.828039,0.828174
4,0.5112,1.465675,0.824541,0.827356,0.823661,0.82384
5,0.4034,1.478586,0.832569,0.832618,0.832723,0.832561
6,0.3334,1.564063,0.838303,0.839554,0.837722,0.837944


[I 2025-03-25 12:50:30,210] Trial 149 pruned. 


In [32]:
print(best_trial_distill)

BestRun(run_id='103', objective=0.8691827872088432, hyperparameters={'learning_rate': 0.0008555235299408995, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 5.5}, run_summary=None)
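
In the log above, the value recorded for each finished trial equals the F1 of its last epoch, not of its best epoch. The sketch below makes this concrete with the per-epoch F1 values of trial 116, copied from the log; the variable names are illustrative only.

```python
# Per-epoch validation F1 of trial 116, copied from the log above.
f1_per_epoch = [
    0.814459, 0.844873, 0.856642, 0.852801, 0.844806,
    0.857435, 0.870323, 0.861124, 0.864668, 0.859986,
]

last_f1 = f1_per_epoch[-1]   # the value the log reports for the trial (rounded)
best_f1 = max(f1_per_epoch)  # highest F1 seen at any epoch of the trial
best_epoch = f1_per_epoch.index(best_f1) + 1

print(f"last epoch F1: {last_f1:.6f}")
print(f"best epoch F1: {best_f1:.6f} (epoch {best_epoch})")
```

The ranking of trials therefore depends on where each run happens to end, which is worth keeping in mind when comparing the objective values.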


Recompute the step counts to account for the changed dataset size.

In [40]:
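data_length = len(all_train_data)
# Hyperband resources are given in optimizer steps: 3 epochs minimum, full run maximum.
min_r = math.ceil(data_length / batch_size) * 3
max_r = math.ceil(data_length / batch_size) * num_epochs
# Upper bound of the warmup_steps search range: one tenth of an epoch.
warm_up = math.ceil(data_length / batch_size / 10)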
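
As a worked example of the arithmetic above, the following sketch uses hypothetical values for the dataset size, `batch_size` and `num_epochs` (the real values are defined earlier in the notebook and are not repeated here). It shows how the three quantities come out in optimizer steps.

```python
import math

# Hypothetical values, chosen only for illustration.
data_length = 100_000
batch_size = 128
num_epochs = 10

steps_per_epoch = math.ceil(data_length / batch_size)
min_r = steps_per_epoch * 3            # steps completed after 3 epochs
max_r = steps_per_epoch * num_epochs   # steps of a full training run
warm_up = math.ceil(data_length / batch_size / 10)  # a tenth of an epoch

print(steps_per_epoch, min_r, max_r, warm_up)
```

Because all three values scale with the dataset length, they have to be recomputed whenever the training set changes, as it does here with the augmented data.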

In [41]:
base.reset_seed()

## Search with standard training on the augmented dataset
Configuration of the individual training runs.

In [42]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-base-embedd-aug_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd-aug_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [43]:
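def hp_space(trial):
    params = {
        # Learning rate sampled on a logarithmic scale.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay on a grid from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Warm-up length between 0 and a tenth of an epoch (in steps).
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params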
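
To illustrate what the ranges above mean, the sketch below draws values using only the standard library. It assumes that a log-scaled range is sampled uniformly in log space and that a stepped range is a grid of `low + k * step`. It is not Optuna's TPE sampler, which additionally adapts to earlier trials, and the helper names are hypothetical.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw a value from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(0)
learning_rates = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(10_000)]
weight_decays = [sample_stepped(rng, 0.0, 1e-2, 1e-3) for _ in range(10_000)]

# In log space the midpoint of the range is the geometric mean (5e-4),
# so roughly half of the draws fall below it.
geo_mean = math.sqrt(5e-5 * 5e-3)
share_below = sum(lr < geo_mean for lr in learning_rates) / len(learning_rates)
print(f"share of learning rates below {geo_mean:.0e}: {share_below:.2f}")
print("distinct weight decays:", len(set(weight_decays)))
```

The logarithmic scale gives small and large learning rates equal weight per order of magnitude. Values such as `0.009000000000000001` in the log are presumably floating-point artifacts of this kind of grid arithmetic.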

Optuna configuration.

In [44]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
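
The following sketch shows how pruner settings like those above translate into brackets and pruning checkpoints. It relies on two assumptions taken from Optuna's documentation rather than from this notebook: the number of brackets is `floor(log_eta(max_resource / min_resource)) + 1`, and bracket `b` checks trials at `min_resource * eta ** (b + rung)`. Resources are expressed in epochs for readability, with 3 matching the factor used for `min_r` and 10 matching the ten epochs seen in the log; the function name is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the resource levels at which pruning may happen."""
    n_brackets = (
        math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    )
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            level = min_resource * reduction_factor ** (bracket_id + rung)
            if level > max_resource:
                # Levels beyond the training budget are never reached.
                break
            rungs.append(level)
            rung += 1
        brackets.append(rungs)
    return brackets


print(hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2))
```

Under these assumptions there are two brackets, one checking after epochs 3 and 6 and one only after epoch 6. That is consistent with the log, where trials are pruned after epoch 3 or 6 or run all ten epochs.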



Trainer configuration for the individual training runs.

In [45]:
trainer = Trainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [46]:
best_trial_normal_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-aug-embedd",
    n_trials=150
)

[I 2025-03-25 12:50:30,502] A new study created in memory with name: Base-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 169}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2426,0.49718,0.832569,0.837702,0.831418,0.831544
2,0.1109,0.513967,0.847477,0.847512,0.84732,0.847389
3,0.0752,0.66117,0.848624,0.848564,0.848615,0.848585


[I 2025-03-25 12:51:56,792] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1813,0.459556,0.845183,0.846903,0.844521,0.84477
2,0.0719,0.643256,0.822248,0.827372,0.821072,0.821128
3,0.0434,0.822563,0.841743,0.842151,0.841395,0.841556
4,0.0273,1.019603,0.827982,0.827946,0.82805,0.827959
5,0.0178,1.384387,0.829128,0.831194,0.828376,0.828587
6,0.0113,1.49835,0.824541,0.826332,0.82383,0.82403


[I 2025-03-25 12:54:47,168] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 138}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3403,0.42189,0.821101,0.821092,0.820956,0.821007
2,0.213,0.451162,0.819954,0.820362,0.819578,0.819726
3,0.1665,0.489447,0.829128,0.829207,0.829302,0.829123


[I 2025-03-25 12:56:14,401] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 224}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1797,0.458199,0.844037,0.845422,0.843437,0.843674
2,0.0644,0.626143,0.824541,0.827649,0.823619,0.823788
3,0.0371,0.979736,0.837156,0.837156,0.83727,0.837142
4,0.0221,1.282313,0.830275,0.830212,0.83026,0.830231
5,0.0133,1.33121,0.821101,0.821839,0.82062,0.820795
6,0.0085,1.462022,0.817661,0.818062,0.817283,0.81743


[I 2025-03-25 12:59:11,465] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 42}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1517,0.485518,0.837156,0.83781,0.836722,0.836908
2,0.0555,0.690007,0.830275,0.831576,0.829671,0.829881
3,0.0308,1.060892,0.834862,0.835821,0.834344,0.834548


[I 2025-03-25 13:00:39,191] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 121}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2971,0.427925,0.831422,0.831862,0.83105,0.831209
2,0.1654,0.485036,0.833716,0.833795,0.833891,0.83371
3,0.1229,0.599963,0.827982,0.828365,0.828303,0.827981
4,0.1004,0.633675,0.822248,0.825369,0.823135,0.822051
5,0.0846,0.640846,0.845183,0.84544,0.8449,0.845035
6,0.0727,0.715613,0.837156,0.837352,0.836891,0.837011
7,0.0637,0.758864,0.84289,0.843361,0.842521,0.842691
8,0.0565,0.8645,0.836009,0.836889,0.835512,0.835713
9,0.051,0.87582,0.837156,0.837352,0.836891,0.837011
10,0.0466,0.925005,0.836009,0.836091,0.835807,0.835895


[I 2025-03-25 13:05:39,418] Trial 5 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 121}. Best is trial 5 with value: 0.8358950062840937.


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 141}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2251,0.462404,0.840596,0.84186,0.840016,0.840243
2,0.0982,0.549467,0.838303,0.839366,0.837764,0.837979
3,0.0647,0.698539,0.84289,0.842831,0.842858,0.842843
4,0.0458,0.801987,0.822248,0.822326,0.822419,0.822242
5,0.0329,0.969331,0.830275,0.830905,0.829839,0.830017
6,0.0238,1.146299,0.829128,0.830326,0.828545,0.82875


[I 2025-03-25 13:08:40,241] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 84}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3095,0.423324,0.830275,0.831051,0.829797,0.829985
2,0.1814,0.485571,0.825688,0.825649,0.825587,0.825614
3,0.1376,0.555279,0.830275,0.830275,0.830386,0.830261
4,0.1128,0.612951,0.821101,0.82534,0.822135,0.820795
5,0.0968,0.617716,0.83945,0.839395,0.839395,0.839395
6,0.0849,0.656728,0.840596,0.841061,0.840227,0.840395
7,0.0756,0.706853,0.841743,0.841801,0.841564,0.841642
8,0.0687,0.796896,0.837156,0.837445,0.836849,0.836988
9,0.063,0.807972,0.833716,0.833865,0.83347,0.833579
10,0.0586,0.845708,0.836009,0.836091,0.835807,0.835895


[I 2025-03-25 13:13:41,663] Trial 7 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 84}. Best is trial 5 with value: 0.8358950062840937.


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 46}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2118,0.461989,0.84633,0.846639,0.846026,0.846172
2,0.0926,0.570135,0.84289,0.845079,0.842142,0.842392
3,0.0605,0.659599,0.84633,0.84634,0.846194,0.846249
4,0.0421,0.803398,0.830275,0.830778,0.830639,0.830272
5,0.0299,0.945208,0.827982,0.829081,0.827419,0.827619
6,0.0212,1.08784,0.827982,0.830695,0.827124,0.827319


[I 2025-03-25 13:16:40,584] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1956,0.449197,0.850917,0.851236,0.850615,0.850764
2,0.0827,0.597834,0.827982,0.833828,0.826745,0.826801
3,0.0525,0.721678,0.845183,0.845277,0.844984,0.845076
4,0.0351,0.885271,0.818807,0.818807,0.818915,0.818792
5,0.0241,1.134416,0.824541,0.826566,0.823788,0.823985
6,0.0163,1.328876,0.821101,0.824018,0.820199,0.82036


[I 2025-03-25 13:19:39,119] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 5.765419213017514e-05, 'weight_decay': 0.0, 'warmup_steps': 195}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3533,0.417714,0.817661,0.817631,0.817536,0.817574
2,0.2247,0.436148,0.824541,0.82461,0.824335,0.824419
3,0.178,0.474189,0.823394,0.823433,0.823209,0.823282


[I 2025-03-25 13:21:06,449] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 8.864358030226235e-05, 'weight_decay': 0.003, 'warmup_steps': 34}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3117,0.417755,0.829128,0.829106,0.829008,0.829047
2,0.1869,0.473548,0.831422,0.83136,0.831386,0.831372
3,0.143,0.54378,0.831422,0.831375,0.83147,0.831395
4,0.1179,0.619416,0.822248,0.825993,0.823219,0.821993
5,0.1014,0.629382,0.834862,0.834912,0.835017,0.834855
6,0.0896,0.649938,0.840596,0.840945,0.840269,0.84042
7,0.0802,0.701175,0.836009,0.835992,0.835891,0.835931
8,0.0733,0.786367,0.841743,0.842042,0.841437,0.84158
9,0.0676,0.789338,0.838303,0.838388,0.838101,0.83819
10,0.0635,0.825015,0.836009,0.835992,0.835891,0.835931


[I 2025-03-25 13:26:03,568] Trial 11 finished with value: 0.8359312810270215 and parameters: {'learning_rate': 8.864358030226235e-05, 'weight_decay': 0.003, 'warmup_steps': 34}. Best is trial 11 with value: 0.8359312810270215.


Trial 12 with params: {'learning_rate': 0.00014093878268222688, 'weight_decay': 0.005, 'warmup_steps': 61}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2792,0.446655,0.826835,0.830281,0.825871,0.826038
2,0.1512,0.490792,0.832569,0.832618,0.832723,0.832561
3,0.1107,0.652501,0.837156,0.838289,0.83769,0.837125


[I 2025-03-25 13:27:30,744] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 5.372291923575569e-05, 'weight_decay': 0.001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3477,0.421099,0.815367,0.815312,0.815284,0.815297
2,0.231,0.43908,0.822248,0.822918,0.821788,0.821961
3,0.1842,0.483735,0.818807,0.81969,0.818283,0.818463
4,0.1557,0.536172,0.815367,0.821229,0.816589,0.814874
5,0.1372,0.56001,0.821101,0.821871,0.821546,0.821086
6,0.1236,0.580911,0.825688,0.825737,0.82584,0.82568


[I 2025-03-25 13:30:26,229] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 6.784665453172506e-05, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3315,0.421537,0.823394,0.823354,0.823293,0.823319
2,0.2099,0.450042,0.818807,0.8189,0.818578,0.81867
3,0.1636,0.494016,0.832569,0.832758,0.832807,0.832568


[I 2025-03-25 13:32:02,396] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.00010953168876306178, 'weight_decay': 0.006, 'warmup_steps': 198}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.306,0.42298,0.830275,0.830654,0.829923,0.830074
2,0.1696,0.481657,0.827982,0.828031,0.828134,0.827974
3,0.1271,0.589791,0.832569,0.832849,0.832849,0.832569
4,0.1041,0.619075,0.827982,0.831298,0.828892,0.827778
5,0.0884,0.639578,0.840596,0.840844,0.840311,0.840443
6,0.0765,0.711194,0.837156,0.837352,0.836891,0.837011
7,0.0674,0.750459,0.844037,0.84434,0.843732,0.843876
8,0.0603,0.855544,0.836009,0.837247,0.835428,0.835646
9,0.0548,0.85218,0.838303,0.83833,0.838143,0.838209
10,0.0503,0.903717,0.836009,0.835992,0.835891,0.835931


[I 2025-03-25 13:37:02,366] Trial 15 finished with value: 0.8359312810270215 and parameters: {'learning_rate': 0.00010953168876306178, 'weight_decay': 0.006, 'warmup_steps': 198}. Best is trial 11 with value: 0.8359312810270215.


Trial 16 with params: {'learning_rate': 8.255130227805408e-05, 'weight_decay': 0.005, 'warmup_steps': 199}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3262,0.417713,0.825688,0.825623,0.825671,0.825643
2,0.1922,0.465076,0.823394,0.823354,0.823293,0.823319
3,0.1478,0.523818,0.822248,0.822326,0.822419,0.822242


[I 2025-03-25 13:38:31,600] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 186}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.165,0.494048,0.83945,0.841242,0.838764,0.839002
2,0.0589,0.668865,0.822248,0.823229,0.821704,0.821891
3,0.0327,1.130525,0.825688,0.825865,0.825419,0.825533
4,0.0196,1.31986,0.822248,0.822554,0.821914,0.822051
5,0.0119,1.403173,0.829128,0.829825,0.828671,0.828853
6,0.0074,1.898036,0.821101,0.822542,0.820451,0.820644


[I 2025-03-25 13:41:26,969] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 6.846725874252589e-05, 'weight_decay': 0.008, 'warmup_steps': 221}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3415,0.41802,0.822248,0.822221,0.822125,0.822163
2,0.2093,0.449854,0.822248,0.822662,0.821872,0.822023
3,0.163,0.489571,0.824541,0.824689,0.824756,0.824539
4,0.1363,0.576159,0.827982,0.830996,0.82885,0.827804
5,0.1185,0.596793,0.823394,0.823775,0.823714,0.823394
6,0.1056,0.625153,0.837156,0.837121,0.837227,0.837135
7,0.096,0.657292,0.831422,0.831443,0.83126,0.831324
8,0.0888,0.71997,0.831422,0.832128,0.830965,0.83115
9,0.0833,0.721766,0.826835,0.827152,0.826503,0.826643
10,0.0792,0.744776,0.827982,0.827978,0.82784,0.827891


[I 2025-03-25 13:46:24,032] Trial 18 finished with value: 0.8278911134971263 and parameters: {'learning_rate': 6.846725874252589e-05, 'weight_decay': 0.008, 'warmup_steps': 221}. Best is trial 11 with value: 0.8359312810270215.


Trial 19 with params: {'learning_rate': 0.00031715506418016835, 'weight_decay': 0.006, 'warmup_steps': 218}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2386,0.490034,0.826835,0.82888,0.826082,0.826286
2,0.1056,0.53057,0.844037,0.84434,0.843732,0.843876
3,0.0708,0.684588,0.841743,0.84169,0.84169,0.84169
4,0.0512,0.755618,0.841743,0.842027,0.842027,0.841743
5,0.0375,0.915831,0.831422,0.832282,0.830923,0.831118
6,0.0279,0.978635,0.834862,0.836191,0.834259,0.834478
7,0.021,1.035008,0.837156,0.8371,0.837185,0.837125
8,0.0152,1.318128,0.830275,0.831576,0.829671,0.829881
9,0.0111,1.546993,0.836009,0.836462,0.835638,0.835802
10,0.0081,1.690648,0.834862,0.835147,0.834554,0.834692


[I 2025-03-25 13:51:24,549] Trial 19 finished with value: 0.834691943127962 and parameters: {'learning_rate': 0.00031715506418016835, 'weight_decay': 0.006, 'warmup_steps': 218}. Best is trial 11 with value: 0.8359312810270215.


Trial 20 with params: {'learning_rate': 0.0002457814542721864, 'weight_decay': 0.002, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.24,0.510945,0.829128,0.836559,0.827745,0.827714
2,0.117,0.498811,0.850917,0.850892,0.850825,0.850854
3,0.0812,0.645954,0.847477,0.847436,0.847405,0.847419
4,0.0609,0.721556,0.834862,0.836181,0.835438,0.83482
5,0.0468,0.843096,0.838303,0.839756,0.83768,0.837909
6,0.0361,0.883544,0.831422,0.832633,0.830839,0.831049


[I 2025-03-25 13:54:22,496] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0035048263769005107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 95}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1506,0.527434,0.834862,0.835253,0.834512,0.834667
2,0.0549,0.753827,0.830275,0.832738,0.82946,0.82967
3,0.0322,1.226807,0.836009,0.836032,0.836143,0.835999


[I 2025-03-25 13:55:58,664] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.00016201045089681351, 'weight_decay': 0.004, 'warmup_steps': 153}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2771,0.434867,0.830275,0.83055,0.829965,0.8301
2,0.1429,0.503905,0.831422,0.831991,0.831807,0.831416
3,0.1033,0.665922,0.837156,0.838111,0.837648,0.837135
4,0.0818,0.682193,0.837156,0.839939,0.837985,0.837011
5,0.0665,0.734196,0.834862,0.836191,0.834259,0.834478
6,0.0546,0.77024,0.837156,0.837445,0.836849,0.836988
7,0.0463,0.833706,0.84289,0.843244,0.842563,0.842716
8,0.039,0.982693,0.827982,0.830162,0.827208,0.827414
9,0.0335,1.027883,0.833716,0.834289,0.833302,0.833477
10,0.0292,1.093886,0.833716,0.833739,0.833554,0.833619


[I 2025-03-25 14:01:22,696] Trial 22 finished with value: 0.833619100379897 and parameters: {'learning_rate': 0.00016201045089681351, 'weight_decay': 0.004, 'warmup_steps': 153}. Best is trial 11 with value: 0.8359312810270215.


Trial 23 with params: {'learning_rate': 5.530620761752875e-05, 'weight_decay': 0.004, 'warmup_steps': 133}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3514,0.418619,0.816514,0.816546,0.816326,0.816397
2,0.2284,0.435258,0.822248,0.82238,0.821998,0.822101
3,0.1815,0.477361,0.816514,0.816858,0.816157,0.816296


[I 2025-03-25 14:02:58,790] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00016175884703099126, 'weight_decay': 0.0, 'warmup_steps': 141}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.275,0.449224,0.825688,0.829278,0.824703,0.824859
2,0.1421,0.492365,0.836009,0.836342,0.836312,0.836009
3,0.1025,0.663141,0.841743,0.842885,0.842279,0.841713
4,0.0809,0.665593,0.849771,0.851232,0.850373,0.849726
5,0.0657,0.734463,0.84289,0.843977,0.842353,0.842575
6,0.0539,0.76884,0.844037,0.844451,0.843689,0.843852
7,0.0456,0.821226,0.845183,0.845277,0.844984,0.845076
8,0.0384,0.977373,0.829128,0.831194,0.828376,0.828587
9,0.0328,1.029823,0.838303,0.839035,0.837848,0.838042
10,0.0286,1.096628,0.837156,0.837352,0.836891,0.837011


[I 2025-03-25 14:08:10,983] Trial 24 finished with value: 0.8370110621449294 and parameters: {'learning_rate': 0.00016175884703099126, 'weight_decay': 0.0, 'warmup_steps': 141}. Best is trial 24 with value: 0.8370110621449294.


Trial 25 with params: {'learning_rate': 0.0003359562987022477, 'weight_decay': 0.0, 'warmup_steps': 137}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2295,0.472771,0.83945,0.841242,0.838764,0.839002
2,0.1021,0.541096,0.84289,0.844167,0.842311,0.842542
3,0.0679,0.703547,0.844037,0.844007,0.843942,0.84397
4,0.0487,0.782294,0.838303,0.83875,0.838648,0.838301
5,0.0355,0.934076,0.830275,0.830905,0.829839,0.830017
6,0.0262,1.034201,0.831422,0.83283,0.830797,0.831011
7,0.0194,1.094918,0.826835,0.826858,0.826966,0.826824
8,0.014,1.386565,0.824541,0.825535,0.823998,0.82419
9,0.0101,1.611506,0.836009,0.836034,0.835849,0.835914
10,0.0072,1.784733,0.834862,0.834864,0.834723,0.834775


[I 2025-03-25 14:13:07,764] Trial 25 finished with value: 0.8347754689572412 and parameters: {'learning_rate': 0.0003359562987022477, 'weight_decay': 0.0, 'warmup_steps': 137}. Best is trial 24 with value: 0.8370110621449294.


Trial 26 with params: {'learning_rate': 0.00010236049069759917, 'weight_decay': 0.0, 'warmup_steps': 72}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3029,0.421674,0.833716,0.834289,0.833302,0.833477
2,0.1756,0.482414,0.826835,0.826773,0.82684,0.826796
3,0.1327,0.575388,0.833716,0.833738,0.833849,0.833705
4,0.1086,0.634437,0.821101,0.825694,0.822177,0.820761
5,0.0927,0.633127,0.841743,0.84169,0.84169,0.84169
6,0.0809,0.67892,0.834862,0.835055,0.834596,0.834715


[I 2025-03-25 14:16:02,938] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 224}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2645,0.488408,0.81422,0.818293,0.813147,0.813214
2,0.127,0.505209,0.83945,0.839562,0.839648,0.839446
3,0.089,0.659029,0.853211,0.853404,0.853456,0.85321
4,0.0681,0.706222,0.844037,0.845003,0.844531,0.844016
5,0.0534,0.803567,0.836009,0.837247,0.835428,0.835646
6,0.0422,0.828297,0.833716,0.83494,0.833133,0.833347
7,0.0341,0.881199,0.848624,0.848757,0.848404,0.848509
8,0.0274,1.069544,0.831422,0.833508,0.830671,0.830888
9,0.0223,1.172304,0.834862,0.835253,0.834512,0.834667
10,0.0183,1.255784,0.83945,0.839454,0.839311,0.839365


[I 2025-03-25 14:21:05,488] Trial 27 finished with value: 0.8393650392639844 and parameters: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 224}. Best is trial 27 with value: 0.8393650392639844.
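
The output above interleaves per-epoch metric tables with Optuna status lines, which makes it hard to see at a glance which trials finished and which were pruned. The following is a minimal sketch, assuming the cell output has been saved as plain text and that the status lines keep exactly the format seen in this log. The names `summarize`, `FINISHED`, and `PRUNED` are illustrative and are not part of the notebook or of Optuna.

```python
import ast
import re

# Patterns mirror the status lines in this log; other Optuna versions
# may format these messages differently (assumption).
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([-+0-9.eE]+) and parameters: (\{.*?\})\."
)
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize(log_text):
    """Return (finished, pruned).

    finished maps trial number -> (objective value, params dict);
    pruned is a list of trial numbers in order of appearance.
    """
    finished, pruned = {}, []
    for line in log_text.splitlines():
        match = FINISHED.search(line)
        if match:
            number = int(match.group(1))
            value = float(match.group(2))
            params = ast.literal_eval(match.group(3))  # dict literal from the log
            finished[number] = (value, params)
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    return finished, pruned


# Two status lines copied from the log above.
sample = """\
[I 2025-03-25 14:16:02,938] Trial 26 pruned.
[I 2025-03-25 14:21:05,488] Trial 27 finished with value: 0.8393650392639844 and parameters: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 224}. Best is trial 27 with value: 0.8393650392639844.
"""

finished, pruned = summarize(sample)
best = max(finished, key=lambda trial: finished[trial][0])
print("best trial:", best, finished[best])
print("pruned trials:", pruned)
```

In the trials shown here, the value reported for a finished trial matches the F1 of the last epoch (epoch 10) rather than the best epoch, so a summary built this way ranks trials by their final-epoch F1.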


Trial 28 with params: {'learning_rate': 0.00017728590975156735, 'weight_decay': 0.002, 'warmup_steps': 195}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2731,0.449937,0.821101,0.822978,0.820367,0.820557
2,0.1361,0.500609,0.838303,0.838636,0.838606,0.838303
3,0.0976,0.672316,0.844037,0.845003,0.844531,0.844016
4,0.0761,0.6885,0.844037,0.845589,0.844658,0.843984
5,0.0611,0.759184,0.831422,0.833268,0.830713,0.830931
6,0.0495,0.790601,0.837156,0.838126,0.836638,0.836846
7,0.0412,0.837393,0.844037,0.84434,0.843732,0.843876
8,0.0342,1.014884,0.831422,0.833764,0.830629,0.830843
9,0.0287,1.069992,0.833716,0.834289,0.833302,0.833477
10,0.0246,1.140008,0.837156,0.837352,0.836891,0.837011


[I 2025-03-25 14:26:04,062] Trial 28 finished with value: 0.8370110621449294 and parameters: {'learning_rate': 0.00017728590975156735, 'weight_decay': 0.002, 'warmup_steps': 195}. Best is trial 27 with value: 0.8393650392639844.


Trial 29 with params: {'learning_rate': 0.0002697548467014429, 'weight_decay': 0.001, 'warmup_steps': 213}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2484,0.492385,0.826835,0.830944,0.825787,0.825926
2,0.1135,0.518233,0.84633,0.846393,0.846152,0.846232
3,0.0772,0.677503,0.84633,0.84627,0.846321,0.846291
4,0.0573,0.72668,0.84289,0.843612,0.843321,0.84288
5,0.0432,0.879121,0.829128,0.830145,0.828587,0.828786
6,0.0329,0.870945,0.837156,0.837674,0.836764,0.836936


[I 2025-03-25 14:29:03,785] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 9.633933577350104e-05, 'weight_decay': 0.002, 'warmup_steps': 220}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3167,0.423431,0.826835,0.82784,0.826292,0.826488
2,0.1794,0.486696,0.821101,0.821035,0.821083,0.821055
3,0.1358,0.555693,0.829128,0.829151,0.82926,0.829117
4,0.1115,0.602545,0.825688,0.829631,0.826682,0.825423
5,0.0955,0.624358,0.83945,0.83945,0.839564,0.839436
6,0.0837,0.671147,0.841743,0.841867,0.841521,0.841623
7,0.0744,0.713779,0.840596,0.840626,0.840437,0.840504
8,0.0674,0.806514,0.833716,0.834289,0.833302,0.833477
9,0.0619,0.801339,0.834862,0.834828,0.834765,0.834792
10,0.0575,0.843161,0.836009,0.835963,0.835933,0.835947


[I 2025-03-25 14:34:01,260] Trial 30 finished with value: 0.8359468224366691 and parameters: {'learning_rate': 9.633933577350104e-05, 'weight_decay': 0.002, 'warmup_steps': 220}. Best is trial 27 with value: 0.8393650392639844.


Trial 31 with params: {'learning_rate': 8.205970172946259e-05, 'weight_decay': 0.002, 'warmup_steps': 224}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3282,0.419015,0.826835,0.826772,0.826798,0.826784
2,0.1934,0.467073,0.829128,0.829106,0.829008,0.829047
3,0.1487,0.531121,0.829128,0.829109,0.829218,0.82911


[I 2025-03-25 14:35:30,454] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00010525836606385927, 'weight_decay': 0.0, 'warmup_steps': 142}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3045,0.423959,0.836009,0.836462,0.835638,0.835802
2,0.1725,0.479533,0.829128,0.829151,0.82926,0.829117
3,0.1294,0.581007,0.829128,0.829151,0.82926,0.829117
4,0.106,0.618264,0.825688,0.828686,0.826556,0.825508
5,0.0901,0.628137,0.841743,0.841947,0.841479,0.841602
6,0.0784,0.689876,0.83945,0.83965,0.839185,0.839307
7,0.0693,0.737572,0.83945,0.83957,0.839227,0.839328
8,0.0622,0.837985,0.836009,0.836889,0.835512,0.835713
9,0.0567,0.840401,0.834862,0.835147,0.834554,0.834692
10,0.0523,0.886909,0.831422,0.831443,0.83126,0.831324


[I 2025-03-25 14:40:29,022] Trial 32 finished with value: 0.8313241914196197 and parameters: {'learning_rate': 0.00010525836606385927, 'weight_decay': 0.0, 'warmup_steps': 142}. Best is trial 27 with value: 0.8393650392639844.


Trial 33 with params: {'learning_rate': 0.00030107036825366743, 'weight_decay': 0.003, 'warmup_steps': 216}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2419,0.493875,0.824541,0.827649,0.823619,0.823788
2,0.1081,0.535173,0.844037,0.844715,0.843605,0.843799
3,0.0727,0.689981,0.84289,0.842831,0.842858,0.842843
4,0.0529,0.754482,0.841743,0.842391,0.842153,0.841736
5,0.0391,0.922783,0.831422,0.832282,0.830923,0.831118
6,0.0293,0.946472,0.83945,0.83965,0.839185,0.839307
7,0.0222,1.01713,0.837156,0.837094,0.837143,0.837114
8,0.0164,1.301532,0.823394,0.825291,0.822661,0.822858
9,0.0121,1.502365,0.836009,0.836248,0.835722,0.835852
10,0.009,1.664345,0.829128,0.829271,0.828881,0.828988


[I 2025-03-25 14:45:38,648] Trial 33 finished with value: 0.8289878764187064 and parameters: {'learning_rate': 0.00030107036825366743, 'weight_decay': 0.003, 'warmup_steps': 216}. Best is trial 27 with value: 0.8393650392639844.


Trial 34 with params: {'learning_rate': 0.00016703948209411328, 'weight_decay': 0.001, 'warmup_steps': 222}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2785,0.481647,0.818807,0.827653,0.817273,0.817028
2,0.1407,0.498616,0.838303,0.838636,0.838606,0.838303
3,0.1016,0.661365,0.84633,0.847134,0.846784,0.846317
4,0.08,0.670113,0.844037,0.845814,0.8447,0.84397
5,0.0647,0.743406,0.836009,0.837447,0.835386,0.835609
6,0.053,0.773257,0.836009,0.83659,0.835596,0.835774


[I 2025-03-25 14:48:46,072] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.00020067480388694813, 'weight_decay': 0.002, 'warmup_steps': 196}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2659,0.476448,0.818807,0.82262,0.817778,0.817887
2,0.1291,0.504281,0.83945,0.839499,0.839606,0.839442
3,0.091,0.659405,0.849771,0.850109,0.850078,0.84977
4,0.0699,0.701221,0.841743,0.842885,0.842279,0.841713
5,0.0551,0.800411,0.831422,0.833042,0.830755,0.830972
6,0.0439,0.807258,0.832569,0.833073,0.832176,0.832343


[I 2025-03-25 14:52:00,361] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.002248224121235652, 'weight_decay': 0.004, 'warmup_steps': 115}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1583,0.47877,0.84633,0.846542,0.846068,0.846194
2,0.0564,0.728606,0.811927,0.815017,0.810979,0.811091
3,0.0316,1.107855,0.825688,0.825865,0.825419,0.825533
4,0.0185,1.364826,0.821101,0.821092,0.820956,0.821007
5,0.0112,1.449437,0.824541,0.82522,0.824082,0.824258
6,0.0072,2.166911,0.800459,0.803985,0.799423,0.799445


[I 2025-03-25 14:55:13,892] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00022009144856751574, 'weight_decay': 0.0, 'warmup_steps': 168}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2575,0.497459,0.826835,0.830944,0.825787,0.825926
2,0.124,0.505734,0.83945,0.839562,0.839648,0.839446
3,0.0864,0.662412,0.848624,0.848624,0.848741,0.848611
4,0.0657,0.711433,0.847477,0.848728,0.848036,0.847443
5,0.051,0.822276,0.834862,0.836191,0.834259,0.834478
6,0.0399,0.835233,0.834862,0.835821,0.834344,0.834548
7,0.0321,0.893979,0.84289,0.842847,0.842816,0.84283
8,0.0255,1.098934,0.824541,0.826566,0.823788,0.823985
9,0.0205,1.215912,0.834862,0.834977,0.834638,0.834737
10,0.0167,1.301243,0.836009,0.835992,0.835891,0.835931


[I 2025-03-25 15:00:36,607] Trial 37 finished with value: 0.8359312810270215 and parameters: {'learning_rate': 0.00022009144856751574, 'weight_decay': 0.0, 'warmup_steps': 168}. Best is trial 27 with value: 0.8393650392639844.


Trial 38 with params: {'learning_rate': 5.047945320315184e-05, 'weight_decay': 0.003, 'warmup_steps': 226}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3641,0.421626,0.811927,0.81188,0.811821,0.811846
2,0.2367,0.430897,0.824541,0.824677,0.824293,0.824397
3,0.1903,0.477765,0.817661,0.819395,0.816947,0.817129


[I 2025-03-25 15:02:12,239] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0001855717283286608, 'weight_decay': 0.002, 'warmup_steps': 177}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2695,0.456081,0.823394,0.825291,0.822661,0.822858
2,0.1341,0.497743,0.834862,0.835503,0.83527,0.834855
3,0.0954,0.666097,0.84289,0.843612,0.843321,0.84288
4,0.0741,0.688273,0.84289,0.84377,0.843363,0.842873
5,0.059,0.76667,0.836009,0.837247,0.835428,0.835646
6,0.0474,0.790853,0.837156,0.837961,0.83668,0.836878


[I 2025-03-25 15:05:23,647] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0013895077245751437, 'weight_decay': 0.002, 'warmup_steps': 178}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1732,0.485357,0.833716,0.835139,0.833091,0.83331
2,0.0624,0.710722,0.824541,0.825371,0.82404,0.824225
3,0.0354,0.963556,0.821101,0.822753,0.820409,0.820602


[I 2025-03-25 15:06:58,297] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.00018140820189608455, 'weight_decay': 0.001, 'warmup_steps': 229}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2738,0.458126,0.81422,0.81582,0.813526,0.813702
2,0.1354,0.505423,0.83945,0.839732,0.839732,0.83945
3,0.0971,0.675186,0.84289,0.843612,0.843321,0.84288
4,0.0756,0.696897,0.84289,0.844549,0.843532,0.84283
5,0.0605,0.760868,0.837156,0.838499,0.836554,0.836777
6,0.0489,0.782574,0.833716,0.834756,0.833175,0.833382


[I 2025-03-25 15:10:08,429] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.002755662375642045, 'weight_decay': 0.005, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1483,0.459415,0.832569,0.833207,0.832134,0.832314
2,0.054,0.815202,0.808486,0.809244,0.80798,0.808141
3,0.0301,1.102288,0.832569,0.832848,0.83226,0.832396


[I 2025-03-25 15:11:42,797] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00014423688412524424, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2746,0.426057,0.829128,0.829686,0.828713,0.828883
2,0.1496,0.495172,0.834862,0.834827,0.834933,0.834841
3,0.1098,0.635441,0.836009,0.836455,0.836354,0.836007
4,0.0882,0.660141,0.840596,0.842723,0.841321,0.840504
5,0.0726,0.695281,0.84289,0.843493,0.842479,0.842665
6,0.0607,0.76447,0.841743,0.841947,0.841479,0.841602
7,0.0521,0.818963,0.840596,0.840757,0.840353,0.840465
8,0.0451,0.937685,0.831422,0.83283,0.830797,0.831011
9,0.0394,0.97048,0.836009,0.836348,0.83568,0.835828
10,0.0351,1.024901,0.831422,0.831568,0.831176,0.831283


[I 2025-03-25 15:16:58,551] Trial 43 finished with value: 0.8312833411647641 and parameters: {'learning_rate': 0.00014423688412524424, 'weight_decay': 0.004, 'warmup_steps': 10}. Best is trial 27 with value: 0.8393650392639844.


Trial 44 with params: {'learning_rate': 7.038328060679867e-05, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3278,0.421971,0.819954,0.819901,0.819872,0.819886
2,0.2071,0.454269,0.817661,0.817721,0.817452,0.817534
3,0.1615,0.501549,0.829128,0.829151,0.82926,0.829117


[I 2025-03-25 15:18:36,384] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00018800285951347054, 'weight_decay': 0.001, 'warmup_steps': 100}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2629,0.457797,0.824541,0.828613,0.823493,0.823621
2,0.1337,0.499046,0.838303,0.838537,0.838564,0.838303
3,0.0951,0.661771,0.84633,0.846983,0.846742,0.846323
4,0.0737,0.705551,0.844037,0.845184,0.844574,0.844007
5,0.0588,0.775413,0.834862,0.836398,0.834217,0.834441
6,0.0473,0.797586,0.838303,0.839366,0.837764,0.837979
7,0.0391,0.859164,0.84633,0.846639,0.846026,0.846172
8,0.0321,1.024371,0.825688,0.827161,0.82504,0.825243
9,0.0268,1.105777,0.837156,0.837445,0.836849,0.836988
10,0.0228,1.179516,0.83945,0.83957,0.839227,0.839328


[I 2025-03-25 15:23:58,427] Trial 45 finished with value: 0.8393278301886793 and parameters: {'learning_rate': 0.00018800285951347054, 'weight_decay': 0.001, 'warmup_steps': 100}. Best is trial 27 with value: 0.8393650392639844.


Trial 46 with params: {'learning_rate': 0.00022665362310086547, 'weight_decay': 0.0, 'warmup_steps': 87}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.249,0.500151,0.829128,0.835211,0.827871,0.827922
2,0.1216,0.494255,0.852064,0.852045,0.852162,0.852048
3,0.0847,0.65219,0.850917,0.850863,0.850951,0.850889
4,0.0642,0.704998,0.84633,0.846723,0.846657,0.846329
5,0.0498,0.805149,0.834862,0.836191,0.834259,0.834478
6,0.039,0.834064,0.834862,0.835821,0.834344,0.834548
7,0.0312,0.914957,0.84289,0.842921,0.842732,0.842799
8,0.0246,1.103639,0.827982,0.82947,0.827334,0.827543
9,0.0197,1.224704,0.837156,0.837352,0.836891,0.837011
10,0.0159,1.308486,0.836009,0.836034,0.835849,0.835914


[I 2025-03-25 15:29:20,350] Trial 46 finished with value: 0.8359140093401742 and parameters: {'learning_rate': 0.00022665362310086547, 'weight_decay': 0.0, 'warmup_steps': 87}. Best is trial 27 with value: 0.8393650392639844.


Trial 47 with params: {'learning_rate': 0.00010807519825983098, 'weight_decay': 0.0, 'warmup_steps': 131}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3018,0.427231,0.830275,0.830654,0.829923,0.830074
2,0.1703,0.484125,0.827982,0.827982,0.828092,0.827967
3,0.1273,0.589378,0.830275,0.830464,0.830513,0.830274
4,0.1042,0.619245,0.824541,0.82768,0.825429,0.824347
5,0.0885,0.636186,0.83945,0.83957,0.839227,0.839328
6,0.0768,0.694985,0.840596,0.840757,0.840353,0.840465
7,0.0676,0.743026,0.83945,0.83965,0.839185,0.839307
8,0.0606,0.848456,0.834862,0.836191,0.834259,0.834478
9,0.0551,0.846942,0.832569,0.832757,0.832302,0.83242
10,0.0506,0.897433,0.832569,0.83268,0.832344,0.832442


[I 2025-03-25 15:34:41,352] Trial 47 finished with value: 0.8324418800539084 and parameters: {'learning_rate': 0.00010807519825983098, 'weight_decay': 0.0, 'warmup_steps': 131}. Best is trial 27 with value: 0.8393650392639844.


Trial 48 with params: {'learning_rate': 0.0002899890888864183, 'weight_decay': 0.003, 'warmup_steps': 126}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2373,0.497379,0.832569,0.838101,0.831376,0.831483
2,0.1094,0.514619,0.847477,0.847842,0.847152,0.847308
3,0.0739,0.684278,0.848624,0.848569,0.848657,0.848595
4,0.0543,0.729857,0.847477,0.848364,0.847952,0.847461
5,0.0405,0.86411,0.836009,0.836732,0.835554,0.835745
6,0.0306,0.924989,0.837156,0.837961,0.83668,0.836878


[I 2025-03-25 15:37:51,658] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0003090313654777241, 'weight_decay': 0.002, 'warmup_steps': 81}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2291,0.47226,0.827982,0.829686,0.827292,0.827502
2,0.1046,0.513064,0.847477,0.847573,0.847278,0.847371
3,0.0703,0.674427,0.847477,0.847418,0.847489,0.847443
4,0.0509,0.749694,0.840596,0.842248,0.841237,0.840536
5,0.0376,0.909488,0.841743,0.842917,0.841185,0.841409
6,0.0282,0.950344,0.837156,0.837674,0.836764,0.836936
7,0.0214,1.003413,0.841743,0.841743,0.841858,0.84173
8,0.0157,1.262673,0.830275,0.831212,0.829755,0.829952
9,0.0116,1.462459,0.838303,0.83833,0.838143,0.838209
10,0.0085,1.628276,0.836009,0.835992,0.835891,0.835931


[I 2025-03-25 15:43:11,614] Trial 49 finished with value: 0.8359312810270215 and parameters: {'learning_rate': 0.0003090313654777241, 'weight_decay': 0.002, 'warmup_steps': 81}. Best is trial 27 with value: 0.8393650392639844.


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 88}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1519,0.507474,0.838303,0.839554,0.837722,0.837944
2,0.0555,0.776252,0.827982,0.828355,0.827629,0.827778
3,0.0312,1.090855,0.831422,0.83136,0.831428,0.831385


[I 2025-03-25 15:44:49,848] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 5.508469461371033e-05, 'weight_decay': 0.002, 'warmup_steps': 85}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3497,0.418054,0.815367,0.815312,0.815284,0.815297
2,0.2293,0.438676,0.822248,0.822554,0.821914,0.822051
3,0.1824,0.475604,0.813073,0.813269,0.812779,0.812894
4,0.1542,0.518826,0.819954,0.822496,0.820757,0.819806
5,0.1355,0.562801,0.815367,0.816205,0.815831,0.815347
6,0.1219,0.581208,0.825688,0.8258,0.825882,0.825684


[I 2025-03-25 15:47:59,386] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.004803130612126116, 'weight_decay': 0.0, 'warmup_steps': 170}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1548,0.538403,0.831422,0.831403,0.831513,0.831404
2,0.059,0.765388,0.836009,0.835963,0.835933,0.835947
3,0.0383,0.971144,0.825688,0.826301,0.82525,0.825423


[I 2025-03-25 15:49:38,969] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 6.741957555513373e-05, 'weight_decay': 0.004, 'warmup_steps': 51}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3325,0.421213,0.821101,0.821092,0.820956,0.821007
2,0.2099,0.450522,0.819954,0.819965,0.819788,0.81985
3,0.1635,0.494352,0.826835,0.826858,0.826966,0.826824
4,0.1368,0.585233,0.822248,0.825079,0.823093,0.822077
5,0.1194,0.597828,0.824541,0.824869,0.82484,0.824541
6,0.1066,0.616746,0.833716,0.833696,0.833807,0.833698


[I 2025-03-25 15:52:45,891] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.00012898887947982582, 'weight_decay': 0.001, 'warmup_steps': 173}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.292,0.428443,0.823394,0.824142,0.822914,0.823093
2,0.1569,0.47602,0.830275,0.83066,0.830597,0.830274
3,0.1157,0.625465,0.836009,0.836582,0.836396,0.836004
4,0.0938,0.651844,0.83945,0.842542,0.840322,0.839284
5,0.0784,0.672092,0.841743,0.842567,0.841269,0.841473
6,0.0663,0.744655,0.837156,0.837209,0.836975,0.837052
7,0.0576,0.79348,0.841743,0.842275,0.841353,0.84153
8,0.0504,0.907472,0.832569,0.833693,0.832007,0.832216
9,0.0448,0.918598,0.837156,0.837352,0.836891,0.837011
10,0.0404,0.973921,0.837156,0.837159,0.837017,0.83707


[I 2025-03-25 15:58:01,012] Trial 54 finished with value: 0.8370702541106129 and parameters: {'learning_rate': 0.00012898887947982582, 'weight_decay': 0.001, 'warmup_steps': 173}. Best is trial 27 with value: 0.8393650392639844.


Trial 55 with params: {'learning_rate': 0.00012741518042868103, 'weight_decay': 0.001, 'warmup_steps': 174}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.293,0.433572,0.826835,0.828213,0.826208,0.826413
2,0.1584,0.47557,0.833716,0.834286,0.834102,0.83371
3,0.1172,0.623323,0.833716,0.834286,0.834102,0.83371
4,0.0951,0.648027,0.838303,0.841238,0.839153,0.838148
5,0.0799,0.667756,0.84289,0.843493,0.842479,0.842665
6,0.0678,0.742447,0.837156,0.837352,0.836891,0.837011
7,0.059,0.792928,0.84289,0.843493,0.842479,0.842665
8,0.0518,0.899022,0.834862,0.835999,0.834302,0.834514
9,0.0462,0.909591,0.836009,0.836163,0.835765,0.835874
10,0.0418,0.96761,0.830275,0.830383,0.83005,0.830147


[I 2025-03-25 16:03:29,598] Trial 55 finished with value: 0.8301465633423181 and parameters: {'learning_rate': 0.00012741518042868103, 'weight_decay': 0.001, 'warmup_steps': 174}. Best is trial 27 with value: 0.8393650392639844.


Trial 56 with params: {'learning_rate': 0.00013715659159959468, 'weight_decay': 0.002, 'warmup_steps': 162}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2873,0.43868,0.818807,0.820665,0.818073,0.818257
2,0.1526,0.4811,0.836009,0.836243,0.83627,0.836009
3,0.112,0.634267,0.838303,0.83902,0.838732,0.838292
4,0.0904,0.658006,0.83945,0.842854,0.840364,0.839259
5,0.0749,0.691542,0.84289,0.844373,0.842269,0.842507
6,0.0628,0.743532,0.840596,0.840844,0.840311,0.840443
7,0.0541,0.797477,0.84289,0.843493,0.842479,0.842665
8,0.047,0.915597,0.830275,0.83223,0.829544,0.829759
9,0.0413,0.938682,0.834862,0.835055,0.834596,0.834715
10,0.0369,0.996121,0.829128,0.829148,0.828966,0.829029


[I 2025-03-25 16:08:50,371] Trial 56 finished with value: 0.8290292824593424 and parameters: {'learning_rate': 0.00013715659159959468, 'weight_decay': 0.002, 'warmup_steps': 162}. Best is trial 27 with value: 0.8393650392639844.


Trial 57 with params: {'learning_rate': 0.00019971481842194432, 'weight_decay': 0.003, 'warmup_steps': 229}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2681,0.479717,0.817661,0.821964,0.816568,0.816642
2,0.1304,0.506982,0.837156,0.837346,0.837396,0.837155
3,0.092,0.675118,0.844037,0.844428,0.844363,0.844036
4,0.0708,0.718948,0.841743,0.842885,0.842279,0.841713
5,0.056,0.79801,0.836009,0.837892,0.835301,0.835531
6,0.0446,0.831219,0.833716,0.83494,0.833133,0.833347


[I 2025-03-25 16:12:04,779] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00021771047684957567, 'weight_decay': 0.01, 'warmup_steps': 162}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2573,0.494731,0.826835,0.83245,0.825619,0.825679
2,0.124,0.502013,0.841743,0.841792,0.8419,0.841736
3,0.0864,0.663365,0.848624,0.848673,0.848783,0.848617
4,0.0657,0.710179,0.845183,0.84685,0.845826,0.845125
5,0.051,0.812618,0.834862,0.836398,0.834217,0.834441
6,0.04,0.833111,0.834862,0.835999,0.834302,0.834514


[I 2025-03-25 16:15:19,811] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0048602160405686, 'weight_decay': 0.01, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1465,0.539346,0.825688,0.82579,0.825461,0.825556
2,0.0578,0.822154,0.797018,0.798798,0.796255,0.796375
3,0.0368,1.132453,0.808486,0.809398,0.807938,0.808102


[I 2025-03-25 16:16:54,201] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 6.55668131365908e-05, 'weight_decay': 0.0, 'warmup_steps': 209}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3444,0.419357,0.818807,0.818796,0.818662,0.818712
2,0.2129,0.44616,0.823394,0.823756,0.82304,0.823185
3,0.1668,0.484997,0.831422,0.831571,0.831639,0.83142
4,0.1398,0.558512,0.829128,0.831204,0.82985,0.829029
5,0.1217,0.595576,0.817661,0.818094,0.817999,0.817658
6,0.1086,0.615552,0.832569,0.832534,0.832639,0.832547


[I 2025-03-25 16:20:09,033] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.000101170910824671, 'weight_decay': 0.002, 'warmup_steps': 33}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3019,0.42167,0.834862,0.835374,0.83447,0.83464
2,0.1764,0.476878,0.825688,0.825631,0.825714,0.825655
3,0.1334,0.570093,0.834862,0.834806,0.834891,0.834831


[I 2025-03-25 16:21:43,863] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0005031088822101422, 'weight_decay': 0.002, 'warmup_steps': 175}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2119,0.460633,0.847477,0.847467,0.847363,0.847405
2,0.0872,0.577534,0.838303,0.840988,0.837469,0.837703
3,0.0559,0.762293,0.837156,0.837156,0.83727,0.837142
4,0.038,0.843744,0.817661,0.818666,0.818167,0.817632
5,0.0261,0.999761,0.830275,0.831212,0.829755,0.829952
6,0.0181,1.223813,0.829128,0.830522,0.828503,0.828712


[I 2025-03-25 16:24:59,923] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.00046341948408935407, 'weight_decay': 0.0, 'warmup_steps': 197}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2177,0.462521,0.845183,0.845172,0.845068,0.84511
2,0.0898,0.57501,0.841743,0.844611,0.84089,0.841134
3,0.0577,0.757429,0.841743,0.841682,0.841732,0.841702
4,0.0395,0.835377,0.825688,0.826188,0.82605,0.825684
5,0.0274,0.998609,0.836009,0.837247,0.835428,0.835646
6,0.0192,1.195701,0.824541,0.828277,0.823535,0.823678


[I 2025-03-25 16:28:10,923] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00021612616183931864, 'weight_decay': 0.001, 'warmup_steps': 143}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2565,0.491704,0.827982,0.833828,0.826745,0.826801
2,0.1247,0.503331,0.840596,0.840619,0.840732,0.840586
3,0.0871,0.661466,0.849771,0.849751,0.849867,0.849755
4,0.0664,0.715712,0.844037,0.845184,0.844574,0.844007
5,0.0517,0.823353,0.830275,0.831998,0.829587,0.829802
6,0.0406,0.830664,0.833716,0.83494,0.833133,0.833347


[I 2025-03-25 16:31:20,558] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0003205851744196624, 'weight_decay': 0.007, 'warmup_steps': 94}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2285,0.465515,0.832569,0.833354,0.832091,0.832283
2,0.1032,0.527743,0.845183,0.845543,0.844858,0.845012
3,0.069,0.688774,0.852064,0.852018,0.85212,0.852041
4,0.0499,0.749547,0.844037,0.845184,0.844574,0.844007
5,0.0366,0.896761,0.841743,0.842567,0.841269,0.841473
6,0.0273,0.960241,0.838303,0.838761,0.837933,0.838098
7,0.0205,1.031461,0.83945,0.839499,0.839606,0.839442
8,0.0148,1.273924,0.834862,0.835374,0.83447,0.83464
9,0.0108,1.529698,0.845183,0.845141,0.84511,0.845125
10,0.0078,1.677505,0.840596,0.840582,0.840479,0.840521


[I 2025-03-25 16:36:48,743] Trial 65 finished with value: 0.8405206158234686 and parameters: {'learning_rate': 0.0003205851744196624, 'weight_decay': 0.007, 'warmup_steps': 94}. Best is trial 65 with value: 0.8405206158234686.


Trial 66 with params: {'learning_rate': 0.00027519384220209134, 'weight_decay': 0.006, 'warmup_steps': 105}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2383,0.497162,0.834862,0.840039,0.833712,0.833852
2,0.1111,0.512048,0.84633,0.846542,0.846068,0.846194
3,0.0756,0.666812,0.848624,0.848569,0.848657,0.848595
4,0.0559,0.752917,0.84289,0.843943,0.843405,0.842865
5,0.0421,0.876837,0.832569,0.833517,0.832049,0.83225
6,0.032,0.913994,0.830275,0.830654,0.829923,0.830074


[I 2025-03-25 16:39:59,612] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0002281456454006721, 'weight_decay': 0.009000000000000001, 'warmup_steps': 88}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2486,0.50026,0.833716,0.839902,0.83246,0.832542
2,0.1212,0.492761,0.84633,0.846275,0.846363,0.846301
3,0.0845,0.652812,0.853211,0.853176,0.853288,0.853192
4,0.064,0.712377,0.847477,0.848364,0.847952,0.847461
5,0.0496,0.806408,0.832569,0.834089,0.831923,0.832141
6,0.0388,0.834628,0.836009,0.837061,0.83547,0.83568
7,0.031,0.918017,0.84633,0.846393,0.846152,0.846232
8,0.0245,1.107535,0.829128,0.830326,0.828545,0.82875
9,0.0195,1.245917,0.838303,0.838388,0.838101,0.83819
10,0.0157,1.323753,0.838303,0.838258,0.838227,0.838241


[I 2025-03-25 16:45:17,808] Trial 67 finished with value: 0.8382412724725199 and parameters: {'learning_rate': 0.0002281456454006721, 'weight_decay': 0.009000000000000001, 'warmup_steps': 88}. Best is trial 65 with value: 0.8405206158234686.


Trial 68 with params: {'learning_rate': 9.189810555280755e-05, 'weight_decay': 0.008, 'warmup_steps': 101}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3128,0.425695,0.827982,0.828748,0.827503,0.827688
2,0.1841,0.479193,0.825688,0.825682,0.825545,0.825596
3,0.1398,0.548135,0.831422,0.83136,0.831428,0.831385


[I 2025-03-25 16:46:53,199] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0005771766827376632, 'weight_decay': 0.008, 'warmup_steps': 92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2012,0.444334,0.855505,0.855647,0.855288,0.855395
2,0.0812,0.595211,0.831422,0.835605,0.830376,0.830537
3,0.051,0.782952,0.84289,0.842913,0.843026,0.84288
4,0.0335,0.928385,0.821101,0.821288,0.821335,0.8211
5,0.0228,1.101153,0.831422,0.832633,0.830839,0.831049
6,0.0152,1.205386,0.825688,0.827847,0.824914,0.825113


[I 2025-03-25 16:50:08,955] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00025939622300321627, 'weight_decay': 0.009000000000000001, 'warmup_steps': 82}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2402,0.486193,0.827982,0.832648,0.826871,0.826991
2,0.1136,0.502678,0.847477,0.847436,0.847405,0.847419
3,0.0779,0.669525,0.852064,0.852018,0.85212,0.852041
4,0.058,0.730114,0.840596,0.841831,0.841153,0.840561
5,0.044,0.824766,0.83945,0.84043,0.838932,0.839144
6,0.0337,0.866171,0.838303,0.838761,0.837933,0.838098
7,0.0264,0.959507,0.838303,0.838256,0.838354,0.838277
8,0.0202,1.165735,0.831422,0.83245,0.830881,0.831084
9,0.0155,1.349736,0.838303,0.83833,0.838143,0.838209
10,0.0121,1.465557,0.837156,0.837209,0.836975,0.837052


[I 2025-03-25 16:55:17,330] Trial 70 finished with value: 0.8370522437162784 and parameters: {'learning_rate': 0.00025939622300321627, 'weight_decay': 0.009000000000000001, 'warmup_steps': 82}. Best is trial 65 with value: 0.8405206158234686.


Trial 71 with params: {'learning_rate': 0.00024820534178570373, 'weight_decay': 0.01, 'warmup_steps': 108}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2458,0.509845,0.827982,0.835148,0.826619,0.826594
2,0.1172,0.502347,0.845183,0.845125,0.845152,0.845138
3,0.0807,0.670695,0.848624,0.848569,0.848657,0.848595
4,0.0607,0.731953,0.84633,0.847678,0.84691,0.846291
5,0.0465,0.851682,0.832569,0.833693,0.832007,0.832216
6,0.0359,0.859285,0.837156,0.838305,0.836596,0.836813
7,0.0284,0.945911,0.840596,0.840537,0.840564,0.840549
8,0.022,1.166526,0.831422,0.832128,0.830965,0.83115
9,0.0172,1.331187,0.834862,0.834977,0.834638,0.834737
10,0.0136,1.412801,0.833716,0.833697,0.833596,0.833637


[I 2025-03-25 17:00:12,979] Trial 71 finished with value: 0.833636613628798 and parameters: {'learning_rate': 0.00024820534178570373, 'weight_decay': 0.01, 'warmup_steps': 108}. Best is trial 65 with value: 0.8405206158234686.


Trial 72 with params: {'learning_rate': 0.0002626312595148772, 'weight_decay': 0.009000000000000001, 'warmup_steps': 75}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2389,0.489177,0.829128,0.832933,0.828124,0.828288
2,0.1128,0.501641,0.849771,0.84973,0.849699,0.849714
3,0.0772,0.671478,0.850917,0.850883,0.850994,0.850898
4,0.0575,0.735527,0.848624,0.849141,0.848994,0.848621
5,0.0434,0.841523,0.83945,0.840807,0.838848,0.839076
6,0.0332,0.887381,0.840596,0.841192,0.840185,0.840368
7,0.026,0.960676,0.840596,0.840619,0.840732,0.840586
8,0.0199,1.192983,0.841743,0.842414,0.841311,0.841502
9,0.0152,1.359635,0.838303,0.838388,0.838101,0.83819
10,0.0118,1.469679,0.840596,0.840582,0.840479,0.840521


[I 2025-03-25 17:05:09,444] Trial 72 finished with value: 0.8405206158234686 and parameters: {'learning_rate': 0.0002626312595148772, 'weight_decay': 0.009000000000000001, 'warmup_steps': 75}. Best is trial 65 with value: 0.8405206158234686.


Trial 73 with params: {'learning_rate': 0.00017049928990776612, 'weight_decay': 0.01, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2646,0.449749,0.831422,0.834318,0.830544,0.830749
2,0.1388,0.497448,0.838303,0.838452,0.838522,0.838301
3,0.0997,0.651238,0.841743,0.842391,0.842153,0.841736
4,0.0785,0.687908,0.84289,0.844332,0.84349,0.842843
5,0.0634,0.724015,0.837156,0.838305,0.836596,0.836813
6,0.0516,0.765357,0.83945,0.83957,0.839227,0.839328
7,0.0434,0.84098,0.84633,0.846302,0.846236,0.846265
8,0.0364,0.98052,0.833716,0.835352,0.833049,0.833272
9,0.0308,1.059096,0.840596,0.840844,0.840311,0.840443
10,0.0267,1.12253,0.841743,0.841712,0.841648,0.841676


[I 2025-03-25 17:10:04,521] Trial 73 finished with value: 0.8416756571849591 and parameters: {'learning_rate': 0.00017049928990776612, 'weight_decay': 0.01, 'warmup_steps': 30}. Best is trial 73 with value: 0.8416756571849591.


Trial 74 with params: {'learning_rate': 0.0003104830198867352, 'weight_decay': 0.01, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2255,0.489845,0.834862,0.837107,0.834091,0.834318
2,0.1045,0.529841,0.84289,0.843361,0.842521,0.842691
3,0.0706,0.645236,0.850917,0.850863,0.850951,0.850889
4,0.0513,0.767783,0.841743,0.84375,0.842448,0.84166
5,0.038,0.905372,0.824541,0.825713,0.823956,0.824153
6,0.0284,0.945066,0.830275,0.831051,0.829797,0.829985


[I 2025-03-25 17:13:01,583] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00011108390776807968, 'weight_decay': 0.01, 'warmup_steps': 66}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2963,0.421235,0.831422,0.831652,0.831134,0.83126
2,0.1691,0.478298,0.829128,0.829207,0.829302,0.829123
3,0.1266,0.595069,0.833716,0.833738,0.833849,0.833705


[I 2025-03-25 17:14:34,666] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00020369675719306433, 'weight_decay': 0.007, 'warmup_steps': 37}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2528,0.475248,0.825688,0.829953,0.824619,0.824744
2,0.1274,0.488765,0.84289,0.842969,0.843069,0.842885
3,0.0899,0.647187,0.84633,0.846444,0.846531,0.846327
4,0.0689,0.706244,0.840596,0.841645,0.841111,0.840571
5,0.0544,0.768388,0.834862,0.836191,0.834259,0.834478
6,0.0431,0.822966,0.834862,0.835821,0.834344,0.834548


[I 2025-03-25 17:17:38,547] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00028212422883931357, 'weight_decay': 0.009000000000000001, 'warmup_steps': 85}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2352,0.488374,0.827982,0.830695,0.827124,0.827319
2,0.1094,0.510679,0.847477,0.847573,0.847278,0.847371
3,0.0742,0.677365,0.852064,0.852005,0.852078,0.852031
4,0.0546,0.753704,0.840596,0.841831,0.841153,0.840561
5,0.0408,0.863416,0.836009,0.837447,0.835386,0.835609
6,0.0309,0.910782,0.837156,0.837273,0.836933,0.837033
7,0.0239,0.998585,0.83945,0.839499,0.839606,0.839442
8,0.0179,1.208861,0.833716,0.834289,0.833302,0.833477
9,0.0135,1.40205,0.838303,0.838287,0.838185,0.838226
10,0.0102,1.52848,0.838303,0.838258,0.838227,0.838241


[I 2025-03-25 17:22:55,191] Trial 77 finished with value: 0.8382412724725199 and parameters: {'learning_rate': 0.00028212422883931357, 'weight_decay': 0.009000000000000001, 'warmup_steps': 85}. Best is trial 73 with value: 0.8416756571849591.


Trial 78 with params: {'learning_rate': 0.00020775777089008043, 'weight_decay': 0.008, 'warmup_steps': 78}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2539,0.480695,0.832569,0.836949,0.831502,0.831662
2,0.1261,0.48877,0.847477,0.847628,0.847699,0.847475
3,0.0887,0.654616,0.848624,0.848589,0.848699,0.848604
4,0.0681,0.699195,0.84633,0.847134,0.846784,0.846317
5,0.0535,0.787089,0.837156,0.838499,0.836554,0.836777
6,0.0424,0.811301,0.837156,0.837961,0.83668,0.836878
7,0.0345,0.883669,0.844037,0.844164,0.843816,0.843918
8,0.0277,1.062186,0.831422,0.833042,0.830755,0.830972
9,0.0227,1.14961,0.837156,0.837445,0.836849,0.836988
10,0.0187,1.233099,0.838303,0.838287,0.838185,0.838226


[I 2025-03-25 17:27:51,459] Trial 78 finished with value: 0.838225948425245 and parameters: {'learning_rate': 0.00020775777089008043, 'weight_decay': 0.008, 'warmup_steps': 78}. Best is trial 73 with value: 0.8416756571849591.


Trial 79 with params: {'learning_rate': 0.0001597895253013094, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2668,0.443981,0.831422,0.833508,0.830671,0.830888
2,0.143,0.501246,0.840596,0.840676,0.840774,0.840591
3,0.1039,0.647707,0.838303,0.839347,0.838817,0.838277
4,0.0826,0.665169,0.841743,0.844003,0.84249,0.841642
5,0.0673,0.717037,0.837156,0.838499,0.836554,0.836777
6,0.0555,0.781835,0.841743,0.842042,0.841437,0.84158
7,0.0473,0.834393,0.853211,0.85335,0.852993,0.8531
8,0.0401,0.976744,0.833716,0.83608,0.832923,0.833145
9,0.0344,0.99645,0.840596,0.841061,0.840227,0.840395
10,0.0302,1.062721,0.841743,0.841867,0.841521,0.841623


[I 2025-03-25 17:32:47,807] Trial 79 finished with value: 0.8416231469002695 and parameters: {'learning_rate': 0.0001597895253013094, 'weight_decay': 0.01, 'warmup_steps': 1}. Best is trial 73 with value: 0.8416756571849591.


Trial 80 with params: {'learning_rate': 0.00022589215617591915, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2441,0.503462,0.831422,0.838915,0.830039,0.830027
2,0.1221,0.501351,0.838303,0.838242,0.838311,0.838267
3,0.0858,0.658613,0.847477,0.8475,0.847615,0.847467
4,0.0656,0.718013,0.841743,0.84308,0.842321,0.841702
5,0.0511,0.822481,0.831422,0.833042,0.830755,0.830972
6,0.0402,0.855122,0.833716,0.834756,0.833175,0.833382


[I 2025-03-25 17:35:45,450] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 8.141892331285636e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3168,0.422728,0.825688,0.825682,0.825545,0.825596
2,0.1936,0.470337,0.821101,0.821092,0.820956,0.821007
3,0.1493,0.531551,0.830275,0.830212,0.83026,0.830231
4,0.1234,0.610091,0.818807,0.822685,0.819799,0.818531
5,0.1064,0.62186,0.833716,0.833865,0.833933,0.833714
6,0.0945,0.64457,0.841743,0.841749,0.841606,0.84166
7,0.085,0.693447,0.838303,0.838242,0.838311,0.838267
8,0.078,0.780967,0.834862,0.836398,0.834217,0.834441
9,0.0723,0.774276,0.834862,0.835055,0.834596,0.834715
10,0.0681,0.807054,0.834862,0.834864,0.834723,0.834775


[I 2025-03-25 17:40:40,810] Trial 81 finished with value: 0.8347754689572412 and parameters: {'learning_rate': 8.141892331285636e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17}. Best is trial 73 with value: 0.8416756571849591.


Trial 82 with params: {'learning_rate': 0.00041625129855283363, 'weight_decay': 0.007, 'warmup_steps': 70}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2155,0.450914,0.844037,0.844045,0.8439,0.843955
2,0.0924,0.552183,0.836009,0.837061,0.83547,0.83568
3,0.06,0.721736,0.838303,0.838258,0.838227,0.838241


[I 2025-03-25 17:42:11,688] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0002932493172622025, 'weight_decay': 0.007, 'warmup_steps': 137}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.238,0.483772,0.827982,0.831937,0.826955,0.827108
2,0.1083,0.524746,0.84633,0.84646,0.84611,0.846214
3,0.0731,0.692002,0.848624,0.848569,0.848657,0.848595
4,0.0535,0.768183,0.84289,0.844549,0.843532,0.84283
5,0.0399,0.920052,0.830275,0.831576,0.829671,0.829881
6,0.03,0.934955,0.83945,0.83957,0.839227,0.839328
7,0.023,1.029957,0.838303,0.838242,0.838311,0.838267
8,0.0171,1.277796,0.831422,0.83245,0.830881,0.831084
9,0.0126,1.502405,0.836009,0.836163,0.835765,0.835874
10,0.0095,1.633441,0.833716,0.833795,0.833512,0.8336


[I 2025-03-25 17:47:20,918] Trial 83 finished with value: 0.8335998315468083 and parameters: {'learning_rate': 0.0002932493172622025, 'weight_decay': 0.007, 'warmup_steps': 137}. Best is trial 73 with value: 0.8416756571849591.


Trial 84 with params: {'learning_rate': 0.00020609822148187924, 'weight_decay': 0.009000000000000001, 'warmup_steps': 62}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2534,0.476411,0.824541,0.827649,0.823619,0.823788
2,0.1265,0.494425,0.84633,0.846444,0.846531,0.846327
3,0.0889,0.657458,0.850917,0.850967,0.851078,0.85091
4,0.0682,0.706186,0.84633,0.846983,0.846742,0.846323
5,0.0538,0.785178,0.834862,0.836398,0.834217,0.834441
6,0.0425,0.80921,0.838303,0.839035,0.837848,0.838042
7,0.0347,0.885722,0.84633,0.846393,0.846152,0.846232
8,0.028,1.062015,0.833716,0.835352,0.833049,0.833272
9,0.0229,1.154871,0.836009,0.836248,0.835722,0.835852
10,0.019,1.234079,0.836009,0.836091,0.835807,0.835895


[I 2025-03-25 17:52:29,217] Trial 84 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 0.00020609822148187924, 'weight_decay': 0.009000000000000001, 'warmup_steps': 62}. Best is trial 73 with value: 0.8416756571849591.


Trial 85 with params: {'learning_rate': 0.00020010733460521876, 'weight_decay': 0.007, 'warmup_steps': 95}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2576,0.478205,0.829128,0.833631,0.828039,0.828174
2,0.1283,0.498025,0.840596,0.841045,0.840943,0.840594
3,0.0904,0.662515,0.848624,0.848737,0.848825,0.848621
4,0.0695,0.705595,0.84633,0.847482,0.846868,0.846301
5,0.0549,0.794499,0.833716,0.83558,0.833007,0.833231
6,0.0437,0.808596,0.837156,0.838305,0.836596,0.836813


[I 2025-03-25 17:55:24,160] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0004049223281760269, 'weight_decay': 0.009000000000000001, 'warmup_steps': 83}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2157,0.453416,0.845183,0.845351,0.844942,0.845056
2,0.0934,0.535381,0.840596,0.84186,0.840016,0.840243
3,0.061,0.7069,0.840596,0.840552,0.840522,0.840536
4,0.0426,0.81329,0.823394,0.823582,0.82363,0.823394
5,0.0304,0.931433,0.832569,0.833884,0.831965,0.832179
6,0.0218,1.073573,0.827982,0.828748,0.827503,0.827688


[I 2025-03-25 17:58:18,351] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00010365750553938263, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2993,0.416811,0.830275,0.830218,0.830218,0.830218
2,0.1749,0.47132,0.822248,0.8222,0.822293,0.822219
3,0.1325,0.585595,0.829128,0.829081,0.829176,0.829101
4,0.1081,0.629258,0.825688,0.829631,0.826682,0.825423
5,0.0921,0.640416,0.84633,0.846393,0.846152,0.846232
6,0.0802,0.691229,0.837156,0.837352,0.836891,0.837011
7,0.0709,0.741246,0.836009,0.835948,0.836017,0.835973
8,0.0639,0.842078,0.833716,0.834289,0.833302,0.833477
9,0.0583,0.840194,0.834862,0.834864,0.834723,0.834775
10,0.0539,0.884443,0.833716,0.833739,0.833554,0.833619


[I 2025-03-25 18:03:12,007] Trial 87 finished with value: 0.833619100379897 and parameters: {'learning_rate': 0.00010365750553938263, 'weight_decay': 0.01, 'warmup_steps': 16}. Best is trial 73 with value: 0.8416756571849591.


Trial 88 with params: {'learning_rate': 0.00013805441957034123, 'weight_decay': 0.008, 'warmup_steps': 67}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2811,0.429076,0.826835,0.827262,0.826461,0.826616
2,0.1525,0.483698,0.834862,0.835144,0.835144,0.834862
3,0.1118,0.653071,0.838303,0.839347,0.838817,0.838277
4,0.0903,0.658309,0.836009,0.838644,0.836817,0.835874
5,0.0748,0.698939,0.84289,0.843361,0.842521,0.842691
6,0.0628,0.752285,0.841743,0.841867,0.841521,0.841623
7,0.0543,0.795101,0.838303,0.83833,0.838143,0.838209
8,0.0472,0.913542,0.833716,0.83494,0.833133,0.833347
9,0.0416,0.949772,0.834862,0.834913,0.83468,0.834757
10,0.0373,0.999,0.83945,0.839454,0.839311,0.839365


[I 2025-03-25 18:08:07,839] Trial 88 finished with value: 0.8393650392639844 and parameters: {'learning_rate': 0.00013805441957034123, 'weight_decay': 0.008, 'warmup_steps': 67}. Best is trial 73 with value: 0.8416756571849591.


Trial 89 with params: {'learning_rate': 0.00013374342782545702, 'weight_decay': 0.01, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2812,0.442863,0.826835,0.82888,0.826082,0.826286
2,0.1558,0.4907,0.837156,0.837269,0.837354,0.837153
3,0.1149,0.635108,0.837156,0.837438,0.837438,0.837156


[I 2025-03-25 18:09:41,350] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.00011942253372897792, 'weight_decay': 0.008, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2902,0.426974,0.829128,0.829562,0.828755,0.828912
2,0.1628,0.476833,0.834862,0.835052,0.835101,0.834862
3,0.1207,0.609477,0.833716,0.833865,0.833933,0.833714


[I 2025-03-25 18:11:11,065] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0001707857425790612, 'weight_decay': 0.008, 'warmup_steps': 81}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2681,0.453784,0.823394,0.827978,0.822283,0.822377
2,0.1391,0.502513,0.837156,0.837948,0.837606,0.837142
3,0.1,0.669395,0.841743,0.842885,0.842279,0.841713
4,0.0786,0.699709,0.844037,0.846308,0.844784,0.843937
5,0.0635,0.7526,0.836009,0.837447,0.835386,0.835609
6,0.0518,0.792014,0.834862,0.835374,0.83447,0.83464


[I 2025-03-25 18:14:08,357] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0003825671489822656, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.213,0.480575,0.834862,0.836398,0.834217,0.834441
2,0.0958,0.572372,0.829128,0.830956,0.828418,0.828631
3,0.0632,0.678042,0.844037,0.844007,0.843942,0.84397
4,0.0446,0.78921,0.831422,0.833043,0.83206,0.831358
5,0.032,0.926326,0.834862,0.837107,0.834091,0.834318
6,0.0232,1.0065,0.826835,0.831298,0.825745,0.825867


[I 2025-03-25 18:17:10,989] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00028441720409192237, 'weight_decay': 0.01, 'warmup_steps': 65}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2329,0.482345,0.831422,0.834932,0.83046,0.830647
2,0.1087,0.499464,0.84633,0.846302,0.846236,0.846265
3,0.0736,0.663177,0.852064,0.852018,0.85212,0.852041
4,0.0541,0.742004,0.834862,0.836606,0.835522,0.834792
5,0.0405,0.879113,0.83945,0.841242,0.838764,0.839002
6,0.0307,0.903446,0.83945,0.839744,0.839143,0.839284
7,0.0236,1.013202,0.841743,0.841856,0.841942,0.84174
8,0.0178,1.222347,0.831422,0.831987,0.831007,0.83118
9,0.0133,1.406025,0.837156,0.837159,0.837017,0.83707
10,0.0101,1.576742,0.837156,0.837123,0.837059,0.837087


[I 2025-03-25 18:22:10,386] Trial 93 finished with value: 0.8370865457990159 and parameters: {'learning_rate': 0.00028441720409192237, 'weight_decay': 0.01, 'warmup_steps': 65}. Best is trial 73 with value: 0.8416756571849591.


Trial 94 with params: {'learning_rate': 0.00018375741144648357, 'weight_decay': 0.008, 'warmup_steps': 120}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2651,0.453253,0.825688,0.827847,0.824914,0.825113
2,0.1337,0.498699,0.840596,0.840746,0.840816,0.840594
3,0.0953,0.666219,0.847477,0.848205,0.84791,0.847467
4,0.0741,0.706861,0.841743,0.843288,0.842363,0.84169
5,0.0594,0.77336,0.831422,0.833268,0.830713,0.830931
6,0.0478,0.791294,0.838303,0.839035,0.837848,0.838042
7,0.0396,0.835337,0.850917,0.850985,0.850741,0.850822
8,0.0327,1.000481,0.829128,0.830732,0.82846,0.828672
9,0.0273,1.086287,0.837156,0.837445,0.836849,0.836988
10,0.0232,1.157382,0.836009,0.836091,0.835807,0.835895


[I 2025-03-25 18:27:08,762] Trial 94 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 0.00018375741144648357, 'weight_decay': 0.008, 'warmup_steps': 120}. Best is trial 73 with value: 0.8416756571849591.


Trial 95 with params: {'learning_rate': 0.00019531297081194378, 'weight_decay': 0.008, 'warmup_steps': 72}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2573,0.471779,0.829128,0.832606,0.828166,0.828343
2,0.1299,0.490951,0.84289,0.843126,0.843153,0.84289
3,0.0922,0.655911,0.850917,0.850917,0.851036,0.850905
4,0.0713,0.693728,0.845183,0.845764,0.845573,0.845178
5,0.0565,0.774605,0.837156,0.838708,0.836512,0.83674
6,0.0452,0.801803,0.838303,0.838891,0.83789,0.838071
7,0.0371,0.865665,0.84633,0.84634,0.846194,0.846249
8,0.0303,1.047344,0.832569,0.833884,0.831965,0.832179
9,0.025,1.132015,0.837156,0.837273,0.836933,0.837033
10,0.021,1.207313,0.833716,0.833739,0.833554,0.833619


[I 2025-03-25 18:32:07,115] Trial 95 finished with value: 0.833619100379897 and parameters: {'learning_rate': 0.00019531297081194378, 'weight_decay': 0.008, 'warmup_steps': 72}. Best is trial 73 with value: 0.8416756571849591.


Trial 96 with params: {'learning_rate': 0.0003616447716095151, 'weight_decay': 0.009000000000000001, 'warmup_steps': 102}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2227,0.462443,0.833716,0.834289,0.833302,0.833477
2,0.0982,0.533774,0.834862,0.835658,0.834386,0.83458
3,0.0647,0.694629,0.841743,0.841712,0.841648,0.841676
4,0.046,0.779825,0.829128,0.829361,0.829387,0.829128
5,0.0331,0.959866,0.838303,0.839756,0.83768,0.837909
6,0.0242,0.996303,0.841743,0.843114,0.841143,0.841375
7,0.0178,1.092193,0.838303,0.838326,0.838438,0.838292
8,0.0125,1.387299,0.840596,0.841192,0.840185,0.840368
9,0.0088,1.643694,0.83945,0.839505,0.839269,0.839347
10,0.0062,1.823567,0.83945,0.839454,0.839311,0.839365


[I 2025-03-25 18:37:08,423] Trial 96 finished with value: 0.8393650392639844 and parameters: {'learning_rate': 0.0003616447716095151, 'weight_decay': 0.009000000000000001, 'warmup_steps': 102}. Best is trial 73 with value: 0.8416756571849591.


Trial 97 with params: {'learning_rate': 0.0011830811668981324, 'weight_decay': 0.01, 'warmup_steps': 90}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1715,0.475896,0.840596,0.842283,0.839932,0.840171
2,0.0643,0.662423,0.819954,0.823952,0.818904,0.819009
3,0.0374,0.980183,0.825688,0.82579,0.825461,0.825556
4,0.0226,1.170871,0.830275,0.83024,0.830344,0.830253
5,0.0139,1.498385,0.823394,0.826056,0.822535,0.822715
6,0.0089,1.52351,0.823394,0.824,0.822956,0.823126


[I 2025-03-25 18:40:17,207] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0009886034014177738, 'weight_decay': 0.004, 'warmup_steps': 66}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1751,0.469787,0.838303,0.840451,0.837554,0.837791
2,0.0673,0.658664,0.818807,0.822298,0.81782,0.817945
3,0.0395,0.927864,0.829128,0.829202,0.828924,0.829009
4,0.0243,1.063647,0.819954,0.819935,0.820041,0.819935
5,0.0152,1.36189,0.819954,0.820362,0.819578,0.819726
6,0.0097,1.556088,0.824541,0.825371,0.82404,0.824225


[I 2025-03-25 18:43:20,320] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0005814280333154242, 'weight_decay': 0.01, 'warmup_steps': 147}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2023,0.445223,0.850917,0.851477,0.85053,0.850716
2,0.0809,0.596398,0.832569,0.836258,0.831586,0.831772
3,0.051,0.785787,0.841743,0.841801,0.841564,0.841642
4,0.0337,0.935056,0.81422,0.814981,0.814663,0.814205
5,0.0227,1.200904,0.829128,0.829562,0.828755,0.828912
6,0.0153,1.27135,0.821101,0.824624,0.820115,0.82025


[I 2025-03-25 18:46:23,406] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00040812113879166767, 'weight_decay': 0.009000000000000001, 'warmup_steps': 117}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2174,0.455193,0.840596,0.840626,0.840437,0.840504
2,0.093,0.539577,0.83945,0.841482,0.838722,0.838962
3,0.0606,0.698026,0.840596,0.840626,0.840437,0.840504
4,0.0423,0.803014,0.824541,0.825394,0.825008,0.824523
5,0.0301,0.964409,0.841743,0.843326,0.8411,0.841339
6,0.0215,1.065686,0.837156,0.838499,0.836554,0.836777


[I 2025-03-25 18:49:24,670] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00048050797430649505, 'weight_decay': 0.008, 'warmup_steps': 99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2082,0.445453,0.854358,0.854462,0.854161,0.854256
2,0.0872,0.556298,0.838303,0.84128,0.837427,0.837657
3,0.0561,0.731895,0.841743,0.841947,0.841479,0.841602


[I 2025-03-25 18:50:55,948] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0005053699818990836, 'weight_decay': 0.008, 'warmup_steps': 60}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.202,0.454356,0.853211,0.853435,0.852951,0.85308
2,0.0848,0.561957,0.833716,0.837257,0.832754,0.832951
3,0.0541,0.722706,0.834862,0.835055,0.834596,0.834715
4,0.0365,0.873503,0.818807,0.818856,0.818957,0.818799
5,0.0252,1.058819,0.825688,0.828375,0.82483,0.825017
6,0.0172,1.167294,0.817661,0.819181,0.816989,0.817174


[I 2025-03-25 18:53:59,188] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.00013636468875448824, 'weight_decay': 0.01, 'warmup_steps': 100}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2835,0.429944,0.829128,0.829562,0.828755,0.828912
2,0.1527,0.494456,0.831422,0.831445,0.831555,0.831411
3,0.112,0.643662,0.827982,0.828761,0.828429,0.827967


[I 2025-03-25 18:55:32,267] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 9.335260939298593e-05, 'weight_decay': 0.007, 'warmup_steps': 65}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3094,0.417947,0.831422,0.83175,0.831092,0.831235
2,0.1827,0.481057,0.823394,0.823354,0.823293,0.823319
3,0.1391,0.553868,0.832569,0.832506,0.832555,0.832526
4,0.114,0.620702,0.823394,0.827316,0.824387,0.823126
5,0.098,0.626195,0.83945,0.839394,0.83948,0.839419
6,0.0861,0.657914,0.840596,0.840684,0.840395,0.840485
7,0.0768,0.707901,0.836009,0.836091,0.835807,0.835895
8,0.07,0.794565,0.834862,0.835374,0.83447,0.83464
9,0.0643,0.804256,0.834862,0.835055,0.834596,0.834715
10,0.06,0.839604,0.830275,0.830273,0.830134,0.830186


[I 2025-03-25 19:00:43,430] Trial 104 finished with value: 0.830185898650498 and parameters: {'learning_rate': 9.335260939298593e-05, 'weight_decay': 0.007, 'warmup_steps': 65}. Best is trial 73 with value: 0.8416756571849591.


Trial 105 with params: {'learning_rate': 0.0008265634520344934, 'weight_decay': 0.006, 'warmup_steps': 130}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1879,0.463402,0.834862,0.837107,0.834091,0.834318
2,0.0718,0.631106,0.819954,0.825033,0.818778,0.81882
3,0.0431,0.823451,0.830275,0.83066,0.830597,0.830274


[I 2025-03-25 19:02:16,062] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00021261909483259904, 'weight_decay': 0.008, 'warmup_steps': 89}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2531,0.487137,0.830275,0.833611,0.829334,0.829521
2,0.1249,0.50051,0.845183,0.845164,0.845279,0.845167
3,0.0875,0.659245,0.849771,0.84985,0.849952,0.849766
4,0.0668,0.70764,0.847477,0.848539,0.847994,0.847453
5,0.0523,0.788658,0.837156,0.838708,0.836512,0.83674
6,0.0412,0.806582,0.837156,0.837961,0.83668,0.836878
7,0.0333,0.875391,0.847477,0.847467,0.847363,0.847405
8,0.0268,1.052512,0.830275,0.83223,0.829544,0.829759
9,0.0217,1.155758,0.837156,0.837273,0.836933,0.837033
10,0.0179,1.225056,0.836009,0.835992,0.835891,0.835931


[I 2025-03-25 19:07:30,805] Trial 106 finished with value: 0.8359312810270215 and parameters: {'learning_rate': 0.00021261909483259904, 'weight_decay': 0.008, 'warmup_steps': 89}. Best is trial 73 with value: 0.8416756571849591.


Trial 107 with params: {'learning_rate': 0.002493660865891546, 'weight_decay': 0.005, 'warmup_steps': 200}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1625,0.470158,0.847477,0.848095,0.847068,0.847258
2,0.0571,0.69672,0.831422,0.832282,0.830923,0.831118
3,0.032,1.143307,0.827982,0.827946,0.82805,0.827959


[I 2025-03-25 19:09:07,571] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.000209614387932689, 'weight_decay': 0.009000000000000001, 'warmup_steps': 53}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2518,0.490807,0.825688,0.831484,0.824451,0.824492
2,0.1258,0.490633,0.84633,0.846444,0.846531,0.846327
3,0.0886,0.652787,0.847477,0.847458,0.847573,0.847461
4,0.068,0.710933,0.841743,0.842706,0.842237,0.841722
5,0.0535,0.783884,0.837156,0.838931,0.83647,0.836702
6,0.0423,0.814953,0.838303,0.839193,0.837806,0.838011
7,0.0343,0.884168,0.847477,0.847436,0.847405,0.847419
8,0.0276,1.069346,0.825688,0.827375,0.824998,0.825202
9,0.0225,1.179754,0.832569,0.832617,0.832386,0.832462
10,0.0186,1.266941,0.831422,0.831374,0.831344,0.831358


[I 2025-03-25 19:14:24,467] Trial 108 finished with value: 0.8313579223649675 and parameters: {'learning_rate': 0.000209614387932689, 'weight_decay': 0.009000000000000001, 'warmup_steps': 53}. Best is trial 73 with value: 0.8416756571849591.


Trial 109 with params: {'learning_rate': 0.00012876983536873018, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.283,0.445363,0.821101,0.824314,0.820157,0.820306
2,0.158,0.481542,0.836009,0.836243,0.83627,0.836009
3,0.1168,0.619669,0.83945,0.83964,0.83969,0.839449
4,0.0946,0.643501,0.834862,0.837631,0.835691,0.834715
5,0.079,0.672475,0.844037,0.844164,0.843816,0.843918
6,0.0669,0.735737,0.834862,0.834913,0.83468,0.834757


[I 2025-03-25 19:17:36,160] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0002159566685998926, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2474,0.483732,0.827982,0.831937,0.826955,0.827108
2,0.1244,0.503627,0.840596,0.840577,0.84069,0.840579
3,0.0875,0.651638,0.84633,0.84638,0.846489,0.846323
4,0.0672,0.717425,0.840596,0.841645,0.841111,0.840571
5,0.0526,0.796554,0.838303,0.839554,0.837722,0.837944
6,0.0415,0.844046,0.831422,0.832633,0.830839,0.831049


[I 2025-03-25 19:20:34,302] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0003694171747391661, 'weight_decay': 0.01, 'warmup_steps': 68}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2204,0.462358,0.836009,0.837447,0.835386,0.835609
2,0.0972,0.53622,0.841743,0.842735,0.841227,0.841442
3,0.064,0.696375,0.844037,0.844045,0.8439,0.843955
4,0.0454,0.794582,0.825688,0.826464,0.826135,0.825673
5,0.0328,0.929586,0.838303,0.839193,0.837806,0.838011
6,0.0239,1.072247,0.826835,0.828213,0.826208,0.826413


[I 2025-03-25 19:23:32,828] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00018309946782089958, 'weight_decay': 0.01, 'warmup_steps': 68}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2617,0.448747,0.825688,0.828104,0.824872,0.825066
2,0.1346,0.492716,0.84289,0.842969,0.843069,0.842885
3,0.096,0.657137,0.848624,0.849141,0.848994,0.848621
4,0.0748,0.692625,0.844037,0.845184,0.844574,0.844007
5,0.0599,0.761043,0.836009,0.837447,0.835386,0.835609
6,0.0483,0.783681,0.840596,0.841192,0.840185,0.840368
7,0.0401,0.855111,0.848624,0.848689,0.848447,0.848527
8,0.0331,1.01743,0.831422,0.833042,0.830755,0.830972
9,0.0277,1.090738,0.837156,0.837209,0.836975,0.837052
10,0.0237,1.164353,0.838303,0.838287,0.838185,0.838226


[I 2025-03-25 19:28:33,520] Trial 112 finished with value: 0.838225948425245 and parameters: {'learning_rate': 0.00018309946782089958, 'weight_decay': 0.01, 'warmup_steps': 68}. Best is trial 73 with value: 0.8416756571849591.


Trial 113 with params: {'learning_rate': 0.00027329151440782574, 'weight_decay': 0.009000000000000001, 'warmup_steps': 103}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2385,0.496925,0.832569,0.837702,0.831418,0.831544
2,0.1112,0.508384,0.849771,0.849762,0.849657,0.849699
3,0.0758,0.676074,0.847477,0.847418,0.847489,0.847443
4,0.056,0.749129,0.84289,0.84413,0.843447,0.842855
5,0.0421,0.860947,0.831422,0.83245,0.830881,0.831084
6,0.032,0.9015,0.833716,0.83395,0.833428,0.833556


[I 2025-03-25 19:31:30,268] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00012485820472536813, 'weight_decay': 0.01, 'warmup_steps': 73}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2883,0.452853,0.823394,0.828737,0.822198,0.822249
2,0.1601,0.475333,0.833716,0.833949,0.833975,0.833715
3,0.1183,0.613432,0.829128,0.82957,0.829471,0.829126
4,0.0962,0.642328,0.838303,0.841541,0.839196,0.838124
5,0.0806,0.660285,0.844037,0.84434,0.843732,0.843876
6,0.0687,0.726838,0.840596,0.840684,0.840395,0.840485
7,0.0598,0.763465,0.84289,0.843244,0.842563,0.842716
8,0.0527,0.871697,0.831422,0.832128,0.830965,0.83115
9,0.0472,0.893083,0.837156,0.837273,0.836933,0.837033
10,0.0428,0.945604,0.836009,0.836091,0.835807,0.835895


[I 2025-03-25 19:36:43,527] Trial 114 finished with value: 0.8358950062840937 and parameters: {'learning_rate': 0.00012485820472536813, 'weight_decay': 0.01, 'warmup_steps': 73}. Best is trial 73 with value: 0.8416756571849591.


Trial 115 with params: {'learning_rate': 0.00028845248114584373, 'weight_decay': 0.008, 'warmup_steps': 62}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2321,0.474307,0.832569,0.834543,0.831839,0.83206
2,0.1082,0.502223,0.844037,0.84434,0.843732,0.843876
3,0.0731,0.656919,0.855505,0.85545,0.85554,0.855477
4,0.0536,0.741041,0.83945,0.841447,0.840153,0.839365
5,0.0399,0.845816,0.837156,0.838305,0.836596,0.836813
6,0.0302,0.908432,0.837156,0.837674,0.836764,0.836936
7,0.0232,0.98978,0.836009,0.836088,0.836185,0.836004
8,0.0173,1.219841,0.832569,0.833884,0.831965,0.832179
9,0.0129,1.426769,0.833716,0.83395,0.833428,0.833556
10,0.0097,1.577392,0.832569,0.832568,0.832428,0.832481


[I 2025-03-25 19:41:57,268] Trial 115 finished with value: 0.8324806838038695 and parameters: {'learning_rate': 0.00028845248114584373, 'weight_decay': 0.008, 'warmup_steps': 62}. Best is trial 73 with value: 0.8416756571849591.


Trial 116 with params: {'learning_rate': 0.0002138465301094106, 'weight_decay': 0.002, 'warmup_steps': 86}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2525,0.489513,0.831422,0.83673,0.83025,0.83036
2,0.1244,0.496386,0.849771,0.84985,0.849952,0.849766
3,0.0872,0.663756,0.848624,0.848673,0.848783,0.848617
4,0.0665,0.716593,0.845183,0.846429,0.845742,0.845149
5,0.0521,0.806333,0.837156,0.838305,0.836596,0.836813
6,0.0411,0.831999,0.834862,0.835374,0.83447,0.83464


[I 2025-03-25 19:44:56,800] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0001278177465963901, 'weight_decay': 0.009000000000000001, 'warmup_steps': 68}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2867,0.429392,0.826835,0.82784,0.826292,0.826488
2,0.1586,0.479595,0.837156,0.837438,0.837438,0.837156
3,0.1169,0.624616,0.829128,0.829361,0.829387,0.829128


[I 2025-03-25 19:46:26,469] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00028371807380581874, 'weight_decay': 0.01, 'warmup_steps': 73}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2334,0.483363,0.830275,0.834617,0.829208,0.829356
2,0.1084,0.506946,0.849771,0.849762,0.849657,0.849699
3,0.0735,0.675553,0.849771,0.849751,0.849867,0.849755
4,0.0541,0.759219,0.83945,0.84078,0.840027,0.839408
5,0.0404,0.897581,0.836009,0.837061,0.83547,0.83568
6,0.0307,0.950855,0.831422,0.831862,0.83105,0.831209


[I 2025-03-25 19:49:21,840] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0003049428292440835, 'weight_decay': 0.0, 'warmup_steps': 228}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2422,0.489235,0.827982,0.830162,0.827208,0.827414
2,0.1078,0.524708,0.847477,0.847512,0.84732,0.847389
3,0.0725,0.68645,0.845183,0.845125,0.845152,0.845138
4,0.0529,0.725346,0.840596,0.841831,0.841153,0.840561
5,0.039,0.914096,0.833716,0.83443,0.83326,0.833447
6,0.0292,0.956032,0.836009,0.836462,0.835638,0.835802


[I 2025-03-25 19:52:18,741] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00021365774678008364, 'weight_decay': 0.009000000000000001, 'warmup_steps': 38}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2495,0.481046,0.825688,0.830313,0.824577,0.824684
2,0.1248,0.489056,0.845183,0.845206,0.845321,0.845174
3,0.0876,0.6402,0.848624,0.848624,0.848741,0.848611
4,0.0667,0.711241,0.841743,0.842706,0.842237,0.841722
5,0.0523,0.79122,0.831422,0.83283,0.830797,0.831011
6,0.0412,0.827817,0.836009,0.836889,0.835512,0.835713
7,0.0333,0.903495,0.845183,0.845172,0.845068,0.84511
8,0.0267,1.077462,0.838303,0.839973,0.837638,0.837871
9,0.0216,1.192246,0.836009,0.836091,0.835807,0.835895
10,0.0177,1.275914,0.831422,0.83136,0.831386,0.831372


[I 2025-03-25 19:57:16,991] Trial 120 finished with value: 0.8313721208326152 and parameters: {'learning_rate': 0.00021365774678008364, 'weight_decay': 0.009000000000000001, 'warmup_steps': 38}. Best is trial 73 with value: 0.8416756571849591.


Trial 121 with params: {'learning_rate': 0.0002770279002598169, 'weight_decay': 0.01, 'warmup_steps': 61}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2345,0.473396,0.832569,0.835055,0.831755,0.831971
2,0.1104,0.502259,0.84289,0.842877,0.842774,0.842815
3,0.0752,0.659052,0.853211,0.853176,0.853288,0.853192
4,0.0555,0.751628,0.834862,0.83709,0.835607,0.834757
5,0.0416,0.868254,0.836009,0.837447,0.835386,0.835609
6,0.0317,0.893835,0.836009,0.836348,0.83568,0.835828


[I 2025-03-25 20:00:14,417] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00024416573974811906, 'weight_decay': 0.007, 'warmup_steps': 85}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2443,0.505018,0.830275,0.837502,0.828913,0.828906
2,0.1169,0.49493,0.848624,0.848564,0.848615,0.848585
3,0.0809,0.661396,0.853211,0.853157,0.853246,0.853183
4,0.0608,0.725655,0.844037,0.844687,0.844447,0.844029
5,0.0467,0.82436,0.834862,0.835999,0.834302,0.834514
6,0.036,0.851544,0.833716,0.834586,0.833218,0.833416


[I 2025-03-25 20:03:13,489] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.00030150670717981053, 'weight_decay': 0.01, 'warmup_steps': 100}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2325,0.479517,0.836009,0.837892,0.835301,0.835531
2,0.1064,0.523115,0.841743,0.842151,0.841395,0.841556
3,0.0717,0.668037,0.847477,0.847419,0.847447,0.847432
4,0.0522,0.749131,0.841743,0.842391,0.842153,0.841736
5,0.0387,0.879459,0.830275,0.831576,0.829671,0.829881
6,0.029,0.95406,0.838303,0.838546,0.838017,0.838148
7,0.0221,1.010948,0.834862,0.834862,0.834975,0.834848
8,0.0162,1.266261,0.832569,0.833517,0.832049,0.83225
9,0.0121,1.458344,0.840596,0.840552,0.840522,0.840536
10,0.009,1.601571,0.837156,0.837209,0.836975,0.837052


[I 2025-03-25 20:08:29,420] Trial 123 finished with value: 0.8370522437162784 and parameters: {'learning_rate': 0.00030150670717981053, 'weight_decay': 0.01, 'warmup_steps': 100}. Best is trial 73 with value: 0.8416756571849591.


Trial 124 with params: {'learning_rate': 0.00023875028505938967, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2408,0.483443,0.831422,0.834932,0.83046,0.830647
2,0.1191,0.503261,0.84289,0.842831,0.842858,0.842843
3,0.0832,0.672854,0.850917,0.850917,0.851036,0.850905
4,0.063,0.725323,0.844037,0.844838,0.844489,0.844024
5,0.0486,0.843449,0.829128,0.831194,0.828376,0.828587
6,0.038,0.869799,0.834862,0.835821,0.834344,0.834548


[I 2025-03-25 20:11:41,109] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0008823473143211748, 'weight_decay': 0.009000000000000001, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1768,0.459141,0.849771,0.851088,0.849194,0.849438
2,0.069,0.62125,0.830275,0.833014,0.829418,0.829622
3,0.041,0.867743,0.826835,0.827523,0.826377,0.826555
4,0.0255,1.020466,0.824541,0.824478,0.824503,0.824489
5,0.0164,1.435392,0.832569,0.833207,0.832134,0.832314
6,0.0104,1.552661,0.823394,0.824653,0.822788,0.822984


[I 2025-03-25 20:14:50,939] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.00017878851654551408, 'weight_decay': 0.01, 'warmup_steps': 61}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2628,0.451718,0.826835,0.830281,0.825871,0.826038
2,0.1355,0.492267,0.840596,0.840746,0.840816,0.840594
3,0.0969,0.657797,0.845183,0.845635,0.845531,0.845182
4,0.0757,0.692023,0.840596,0.841831,0.841153,0.840561
5,0.0609,0.752535,0.833716,0.835352,0.833049,0.833272
6,0.0493,0.784089,0.837156,0.83781,0.836722,0.836908
7,0.041,0.850894,0.844037,0.844245,0.843774,0.843898
8,0.034,1.019304,0.829128,0.830956,0.828418,0.828631
9,0.0287,1.080343,0.832569,0.832954,0.832218,0.83237
10,0.0245,1.151619,0.836009,0.836034,0.835849,0.835914


[I 2025-03-25 20:19:57,661] Trial 126 finished with value: 0.8359140093401742 and parameters: {'learning_rate': 0.00017878851654551408, 'weight_decay': 0.01, 'warmup_steps': 61}. Best is trial 73 with value: 0.8416756571849591.


Trial 127 with params: {'learning_rate': 0.00041071270855524923, 'weight_decay': 0.01, 'warmup_steps': 86}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2149,0.460306,0.83945,0.839975,0.839059,0.839233
2,0.0929,0.540621,0.83945,0.840611,0.83889,0.839111
3,0.0604,0.696482,0.83945,0.839395,0.839395,0.839395
4,0.0422,0.78708,0.822248,0.822685,0.822588,0.822246
5,0.0299,0.928231,0.833716,0.835139,0.833091,0.83331
6,0.0214,1.048465,0.831422,0.83283,0.830797,0.831011


[I 2025-03-25 20:22:56,569] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0004736361353158027, 'weight_decay': 0.005, 'warmup_steps': 87}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2095,0.450719,0.853211,0.853226,0.853077,0.853134
2,0.088,0.56193,0.836009,0.83867,0.835175,0.835401
3,0.0566,0.745166,0.849771,0.84973,0.849699,0.849714
4,0.0387,0.843962,0.829128,0.829458,0.829429,0.829128
5,0.0269,1.036137,0.833716,0.83443,0.83326,0.833447
6,0.0188,1.162489,0.830275,0.831998,0.829587,0.829802


[I 2025-03-25 20:25:56,354] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.00011477299332748166, 'weight_decay': 0.009000000000000001, 'warmup_steps': 138}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2984,0.427622,0.829128,0.829825,0.828671,0.828853
2,0.1661,0.482177,0.836009,0.836159,0.836228,0.836007
3,0.1235,0.600072,0.830275,0.830555,0.830555,0.830275


[I 2025-03-25 20:27:25,297] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 6.950823519596806e-05, 'weight_decay': 0.006, 'warmup_steps': 100}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3338,0.423159,0.823394,0.823387,0.823251,0.823302
2,0.2076,0.453493,0.817661,0.817864,0.817368,0.817486
3,0.1615,0.499716,0.822248,0.822395,0.822461,0.822246


[I 2025-03-25 20:28:53,897] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0001678667481374649, 'weight_decay': 0.001, 'warmup_steps': 197}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2768,0.456041,0.821101,0.824018,0.820199,0.82036
2,0.1402,0.501288,0.837156,0.837799,0.837564,0.837148
3,0.1011,0.665811,0.84289,0.843943,0.843405,0.842865
4,0.0796,0.673491,0.844037,0.845589,0.844658,0.843984
5,0.0643,0.737298,0.836009,0.837662,0.835344,0.835571
6,0.0526,0.772111,0.837156,0.837961,0.83668,0.836878
7,0.0442,0.825222,0.845183,0.846282,0.844647,0.844873
8,0.037,0.987262,0.826835,0.829677,0.825956,0.826143
9,0.0315,1.024507,0.834862,0.835508,0.834428,0.834611
10,0.0272,1.094781,0.837156,0.837352,0.836891,0.837011


[I 2025-03-25 20:34:02,064] Trial 131 finished with value: 0.8370110621449294 and parameters: {'learning_rate': 0.0001678667481374649, 'weight_decay': 0.001, 'warmup_steps': 197}. Best is trial 73 with value: 0.8416756571849591.


Trial 132 with params: {'learning_rate': 0.00011782030450007445, 'weight_decay': 0.001, 'warmup_steps': 104}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2946,0.433092,0.825688,0.826961,0.825082,0.825283
2,0.1647,0.478921,0.833716,0.833865,0.833933,0.833714
3,0.1223,0.6034,0.830275,0.830464,0.830513,0.830274


[I 2025-03-25 20:35:37,238] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0004586527334434759, 'weight_decay': 0.0, 'warmup_steps': 88}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2108,0.441027,0.856651,0.856647,0.85654,0.856583
2,0.0886,0.558119,0.834862,0.838259,0.833923,0.834129
3,0.0572,0.726317,0.845183,0.845125,0.845152,0.845138
4,0.0393,0.848028,0.830275,0.830555,0.830555,0.830275
5,0.0274,0.956924,0.836009,0.836889,0.835512,0.835713
6,0.0192,1.205428,0.825688,0.827847,0.824914,0.825113


[I 2025-03-25 20:38:43,261] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0003138111627266479, 'weight_decay': 0.008, 'warmup_steps': 117}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2316,0.477404,0.834862,0.837107,0.834091,0.834318
2,0.1048,0.529751,0.841743,0.842414,0.841311,0.841502
3,0.0703,0.692651,0.84633,0.84627,0.846321,0.846291
4,0.0509,0.759938,0.837156,0.837664,0.837522,0.837153
5,0.0375,0.928214,0.830275,0.831212,0.829755,0.829952
6,0.028,0.977512,0.83945,0.839975,0.839059,0.839233
7,0.0211,1.009488,0.836009,0.836088,0.836185,0.836004
8,0.0153,1.298873,0.830275,0.831387,0.829713,0.829917
9,0.0112,1.524373,0.83945,0.839505,0.839269,0.839347
10,0.0082,1.676863,0.83945,0.839505,0.839269,0.839347


[I 2025-03-25 20:43:48,246] Trial 134 finished with value: 0.8393472825371759 and parameters: {'learning_rate': 0.0003138111627266479, 'weight_decay': 0.008, 'warmup_steps': 117}. Best is trial 73 with value: 0.8416756571849591.


Trial 135 with params: {'learning_rate': 0.0004133567030275809, 'weight_decay': 0.007, 'warmup_steps': 120}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2173,0.459035,0.84289,0.842847,0.842816,0.84283
2,0.0926,0.544086,0.840596,0.842283,0.839932,0.840171
3,0.0601,0.698495,0.841743,0.841801,0.841564,0.841642
4,0.0421,0.798152,0.830275,0.831582,0.83085,0.830231
5,0.0298,0.940324,0.834862,0.835999,0.834302,0.834514
6,0.0213,1.066707,0.838303,0.839035,0.837848,0.838042
7,0.0154,1.18302,0.837156,0.837156,0.83727,0.837142
8,0.0105,1.50005,0.834862,0.835821,0.834344,0.834548
9,0.0071,1.79533,0.838303,0.838243,0.838269,0.838255
10,0.0049,1.984003,0.832569,0.832533,0.83247,0.832497


[I 2025-03-25 20:48:53,128] Trial 135 finished with value: 0.8324974344130727 and parameters: {'learning_rate': 0.0004133567030275809, 'weight_decay': 0.007, 'warmup_steps': 120}. Best is trial 73 with value: 0.8416756571849591.


Trial 136 with params: {'learning_rate': 0.00020082049892428878, 'weight_decay': 0.008, 'warmup_steps': 124}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2594,0.474208,0.832569,0.836258,0.831586,0.831772
2,0.1282,0.499337,0.83945,0.839732,0.839732,0.83945
3,0.0906,0.657209,0.849771,0.849922,0.849994,0.849769
4,0.0697,0.704232,0.838303,0.839347,0.838817,0.838277
5,0.0551,0.793616,0.832569,0.834089,0.831923,0.832141
6,0.0437,0.808442,0.838303,0.839035,0.837848,0.838042
7,0.0357,0.865594,0.849771,0.849762,0.849657,0.849699
8,0.029,1.039361,0.829128,0.830732,0.82846,0.828672
9,0.0238,1.13139,0.838303,0.83846,0.838059,0.83817
10,0.0198,1.208086,0.840596,0.840552,0.840522,0.840536


[I 2025-03-25 20:53:52,882] Trial 136 finished with value: 0.8405357225083707 and parameters: {'learning_rate': 0.00020082049892428878, 'weight_decay': 0.008, 'warmup_steps': 124}. Best is trial 73 with value: 0.8416756571849591.


Trial 137 with params: {'learning_rate': 0.00021382716663524698, 'weight_decay': 0.008, 'warmup_steps': 131}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2565,0.496068,0.825688,0.831078,0.824493,0.824558
2,0.1251,0.497677,0.84289,0.842969,0.843069,0.842885
3,0.0875,0.658858,0.848624,0.848624,0.848741,0.848611
4,0.0668,0.715815,0.84289,0.843943,0.843405,0.842865
5,0.0523,0.819127,0.832569,0.834309,0.831881,0.832102
6,0.0412,0.815409,0.834862,0.835658,0.834386,0.83458


[I 2025-03-25 20:56:54,466] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.00018842551238427172, 'weight_decay': 0.007, 'warmup_steps': 76}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.26,0.462852,0.829128,0.832294,0.828208,0.828395
2,0.132,0.496372,0.844037,0.844321,0.844321,0.844037
3,0.0938,0.668095,0.848624,0.849141,0.848994,0.848621
4,0.0729,0.699975,0.844037,0.845379,0.844616,0.843996
5,0.0582,0.76856,0.837156,0.838931,0.83647,0.836702
6,0.0466,0.790705,0.841743,0.842275,0.841353,0.84153
7,0.0386,0.85485,0.847477,0.847467,0.847363,0.847405
8,0.0316,1.031816,0.830275,0.831998,0.829587,0.829802
9,0.0263,1.107809,0.838303,0.838388,0.838101,0.83819
10,0.0223,1.182843,0.838303,0.838287,0.838185,0.838226


[I 2025-03-25 21:01:58,100] Trial 138 finished with value: 0.838225948425245 and parameters: {'learning_rate': 0.00018842551238427172, 'weight_decay': 0.007, 'warmup_steps': 76}. Best is trial 73 with value: 0.8416756571849591.


Trial 139 with params: {'learning_rate': 0.0005935610781908282, 'weight_decay': 0.008, 'warmup_steps': 139}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.201,0.438357,0.852064,0.85244,0.851741,0.8519
2,0.0806,0.575545,0.827982,0.831604,0.826998,0.827163
3,0.0508,0.778943,0.838303,0.838242,0.838311,0.838267


[I 2025-03-25 21:03:31,312] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.00025740372272641996, 'weight_decay': 0.007, 'warmup_steps': 126}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2451,0.507263,0.832569,0.838516,0.831334,0.831419
2,0.1155,0.513201,0.848624,0.848635,0.848489,0.848544
3,0.0792,0.673802,0.849771,0.849724,0.849825,0.849747
4,0.0592,0.740281,0.84289,0.84413,0.843447,0.842855
5,0.045,0.863644,0.831422,0.832128,0.830965,0.83115
6,0.0346,0.88915,0.837156,0.838305,0.836596,0.836813
7,0.0272,0.967157,0.83945,0.839394,0.83948,0.839419
8,0.0209,1.189213,0.831422,0.833042,0.830755,0.830972
9,0.0161,1.380009,0.833716,0.83395,0.833428,0.833556
10,0.0126,1.459771,0.832569,0.83268,0.832344,0.832442


[I 2025-03-25 21:08:43,424] Trial 140 finished with value: 0.8324418800539084 and parameters: {'learning_rate': 0.00025740372272641996, 'weight_decay': 0.007, 'warmup_steps': 126}. Best is trial 73 with value: 0.8416756571849591.


Trial 141 with params: {'learning_rate': 0.00017708465947607557, 'weight_decay': 0.007, 'warmup_steps': 96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2666,0.456779,0.827982,0.831937,0.826955,0.827108
2,0.1368,0.500219,0.83945,0.839732,0.839732,0.83945
3,0.0979,0.659535,0.844037,0.845003,0.844531,0.844016
4,0.0767,0.697297,0.845183,0.846429,0.845742,0.845149
5,0.0615,0.757357,0.833716,0.835139,0.833091,0.83331
6,0.05,0.781214,0.833716,0.83443,0.83326,0.833447


[I 2025-03-25 21:11:50,087] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0002586625619903835, 'weight_decay': 0.007, 'warmup_steps': 92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2413,0.510521,0.826835,0.834203,0.82545,0.825402
2,0.1145,0.499028,0.845183,0.845277,0.844984,0.845076
3,0.0784,0.663714,0.853211,0.853152,0.853204,0.853173
4,0.0584,0.732711,0.84633,0.846983,0.846742,0.846323
5,0.0444,0.853808,0.834862,0.835999,0.834302,0.834514
6,0.034,0.900083,0.834862,0.835508,0.834428,0.834611


[I 2025-03-25 21:14:52,335] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0001450761095908577, 'weight_decay': 0.007, 'warmup_steps': 75}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2781,0.433083,0.823394,0.824469,0.82283,0.823022
2,0.1493,0.486218,0.834862,0.835052,0.835101,0.834862
3,0.109,0.645368,0.834862,0.836181,0.835438,0.83482


[I 2025-03-25 21:16:20,677] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.00036421616848579004, 'weight_decay': 0.009000000000000001, 'warmup_steps': 107}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2228,0.461415,0.834862,0.835508,0.834428,0.834611
2,0.0981,0.530773,0.845183,0.845794,0.844774,0.844961
3,0.0647,0.695104,0.841743,0.84169,0.84169,0.84169
4,0.0459,0.773083,0.833716,0.834427,0.834144,0.833705
5,0.0331,0.909491,0.838303,0.839366,0.837764,0.837979
6,0.0242,0.990388,0.836009,0.837061,0.83547,0.83568


[I 2025-03-25 21:19:16,698] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 5.74092968199494e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3479,0.418324,0.815367,0.815374,0.8152,0.81526
2,0.2252,0.440416,0.823394,0.823655,0.823082,0.823212
3,0.1781,0.480734,0.819954,0.820362,0.819578,0.819726


[I 2025-03-25 21:20:46,437] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.00019704954810072006, 'weight_decay': 0.008, 'warmup_steps': 176}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2652,0.470203,0.821101,0.823469,0.820283,0.820462
2,0.1307,0.501694,0.83945,0.839499,0.839606,0.839442
3,0.0922,0.670095,0.844037,0.84455,0.844405,0.844033
4,0.071,0.703967,0.841743,0.842391,0.842153,0.841736
5,0.0561,0.792759,0.832569,0.834309,0.831881,0.832102
6,0.0447,0.811851,0.837156,0.837961,0.83668,0.836878
7,0.0366,0.856665,0.845183,0.845351,0.844942,0.845056
8,0.0297,1.040653,0.827982,0.830421,0.827166,0.827368
9,0.0245,1.147361,0.833716,0.83443,0.83326,0.833447
10,0.0205,1.212026,0.836009,0.836163,0.835765,0.835874


[I 2025-03-25 21:25:49,958] Trial 146 finished with value: 0.8358742706568794 and parameters: {'learning_rate': 0.00019704954810072006, 'weight_decay': 0.008, 'warmup_steps': 176}. Best is trial 73 with value: 0.8416756571849591.


Trial 147 with params: {'learning_rate': 0.0006266262623609948, 'weight_decay': 0.002, 'warmup_steps': 227}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2059,0.453257,0.847477,0.847738,0.847194,0.847331
2,0.08,0.586701,0.834862,0.837107,0.834091,0.834318
3,0.0498,0.811302,0.836009,0.836032,0.836143,0.835999


[I 2025-03-25 21:27:18,835] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00035951728826254763, 'weight_decay': 0.006, 'warmup_steps': 51}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2197,0.468791,0.837156,0.838126,0.836638,0.836846
2,0.0982,0.536981,0.837156,0.838126,0.836638,0.836846
3,0.065,0.672091,0.844037,0.843984,0.843984,0.843984
4,0.046,0.780488,0.833716,0.833949,0.833975,0.833715
5,0.0333,0.878917,0.833716,0.835139,0.833091,0.83331
6,0.0244,1.023829,0.827982,0.830983,0.827082,0.827269


[I 2025-03-25 21:30:13,840] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00020709448845232986, 'weight_decay': 0.009000000000000001, 'warmup_steps': 116}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2581,0.467387,0.825688,0.828104,0.824872,0.825066
2,0.1276,0.502365,0.841743,0.841856,0.841942,0.84174
3,0.0898,0.655639,0.850917,0.851204,0.851204,0.850917
4,0.0688,0.713962,0.84289,0.844332,0.84349,0.842843
5,0.0541,0.801374,0.834862,0.836856,0.834133,0.83436
6,0.043,0.814185,0.838303,0.839973,0.837638,0.837871
7,0.0351,0.883581,0.84289,0.843054,0.842648,0.842761
8,0.0283,1.057064,0.830275,0.831576,0.829671,0.829881
9,0.0232,1.172056,0.834862,0.835055,0.834596,0.834715
10,0.0193,1.249564,0.832569,0.832617,0.832386,0.832462


[I 2025-03-25 21:35:07,218] Trial 149 finished with value: 0.8324621660744835 and parameters: {'learning_rate': 0.00020709448845232986, 'weight_decay': 0.009000000000000001, 'warmup_steps': 116}. Best is trial 73 with value: 0.8416756571849591.


In [None]:
print(best_trial_normal_aug)

BestRun(run_id='73', objective=0.8416756571849591, hyperparameters={'learning_rate': 0.00017049928990776612, 'weight_decay': 0.01, 'warmup_steps': 30}, run_summary=None)


In [None]:
base.reset_seed()

## Hyperparameter search with distillation on the augmented dataset
Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd-aug_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd-aug_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [None]:
def hp_space(trial):
    # Optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
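
The notebook does not show how `base.DistilTrainer` combines `lambda_param` and `temperature`, so the following is only a minimal sketch of one common formulation: a hard-label cross-entropy term blended with a temperature-softened KL term against the teacher logits. The function names, the direction of the `lambda_param` weighting, and the `temperature ** 2` scaling are all assumptions made for illustration, not the confirmed implementation. Plain Python is used so the arithmetic is easy to follow; a real trainer would do the same with tensor operations.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Hypothetical blend of hard-label and soft-target losses for one example.

    ASSUMPTION: lambda_param weights the soft (teacher) term and
    (1 - lambda_param) weights the hard-label term; the real DistilTrainer
    in `base` may use the opposite convention or a different soft loss.
    """
    # Hard-label cross-entropy on the unscaled student distribution.
    hard = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # ASSUMPTION: the soft term is rescaled by temperature squared so that its
    # gradient magnitude stays comparable across temperatures.
    return (1 - lambda_param) * hard + lambda_param * temperature ** 2 * soft


# Illustrative values only: a student that disagrees with a confident teacher.
example = distillation_loss([0.5, -0.5], [3.0, -3.0], label=0,
                            lambda_param=0.6, temperature=2.5)
print(f"blended loss: {example:.4f}")
```

Under this convention, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's soft targets alone, which is why the search range above spans the whole interval from 0 to 1.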

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
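
As a rough illustration of how the pruner settings translate into pruning checkpoints, the sketch below computes the number of Hyperband brackets and the rung thresholds of each bracket. It follows the formulas as I understand them from Optuna's Hyperband and successive-halving pruners: the bracket count is `floor(log_eta(max_resource / min_resource)) + 1`, and the rungs of bracket `s` lie at `min_resource * eta ** (s + rung)`. The helper `hyperband_rungs` is hypothetical, the cut-off at `max_resource` is a simplification, and the example values are placeholders because `min_r` and `max_r` are defined elsewhere in the notebook.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [rung thresholds]} for the given pruner settings."""
    n_brackets = (
        math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    )
    rungs = {}
    for bracket in range(n_brackets):
        steps = []
        rung = 0
        while True:
            # A higher bracket id starts pruning later (larger first threshold).
            step = min_resource * reduction_factor ** (bracket + rung)
            if step > max_resource:  # simplification: ignore rungs past the budget
                break
            steps.append(step)
            rung += 1
        rungs[bracket] = steps
    return rungs


# Placeholder values, not the notebook's actual min_r / max_r.
print(hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2))
```

With these placeholder values the sketch yields two brackets: one that checks trials at resources 3 and 6, and one that checks only at 6. The resource unit is whatever step value the trainer reports to Optuna.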



Configuration of the distillation trainer for the individual training runs.

In [None]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [None]:
best_trial_distill_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-aug-embedd",
    n_trials=150
)

[I 2025-03-25 21:35:07,737] A new study created in memory with name: Distill-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 169, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8417,1.14558,0.850917,0.853593,0.850109,0.850385
2,0.2989,1.124657,0.868119,0.868532,0.867801,0.867973
3,0.2032,1.145954,0.869266,0.869293,0.869138,0.869197
4,0.1577,1.162383,0.866972,0.867804,0.867433,0.866961
5,0.128,1.151717,0.860092,0.860071,0.860003,0.860032
6,0.1065,1.056618,0.875,0.87497,0.874937,0.874952


[I 2025-03-25 21:38:08,945] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 200, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1883,1.263828,0.836009,0.83659,0.835596,0.835774
2,0.5436,1.215165,0.836009,0.836348,0.83568,0.835828
3,0.3856,1.21569,0.848624,0.848757,0.848404,0.848509


[I 2025-03-25 21:39:45,065] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 192, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4162,1.41506,0.81078,0.811548,0.810274,0.810439
2,0.7874,1.313361,0.832569,0.832568,0.832428,0.832481
3,0.5962,1.28915,0.829128,0.830956,0.828418,0.828631
4,0.4842,1.281413,0.83945,0.840408,0.839943,0.839428
5,0.4152,1.311314,0.840596,0.840549,0.840648,0.840571
6,0.3701,1.276915,0.83945,0.83945,0.839564,0.839436


[I 2025-03-25 21:42:50,299] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 121, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1108,1.254434,0.831422,0.832282,0.830923,0.831118
2,0.503,1.191562,0.83945,0.839415,0.839522,0.839428
3,0.3526,1.239886,0.848624,0.84884,0.848362,0.848489


[I 2025-03-25 21:44:25,949] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 67, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5718,1.165566,0.854358,0.856649,0.853614,0.853896
2,0.1813,1.099498,0.861239,0.86143,0.861002,0.861124
3,0.1205,1.060071,0.857798,0.857744,0.857834,0.857771
4,0.0893,1.092289,0.863532,0.863612,0.863718,0.863528
5,0.0711,1.068857,0.857798,0.857744,0.857834,0.857771
6,0.0584,1.076755,0.850917,0.851349,0.850573,0.850741


[I 2025-03-25 21:47:31,296] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 118, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4966,1.04706,0.865826,0.866361,0.865465,0.865656
2,0.1536,1.197169,0.83945,0.841242,0.838764,0.839002
3,0.1018,1.131966,0.853211,0.853162,0.853162,0.853162
4,0.0751,1.156106,0.84289,0.843801,0.842395,0.842607
5,0.0594,1.054677,0.853211,0.853226,0.853077,0.853134
6,0.049,1.061688,0.856651,0.857454,0.856203,0.85642


[I 2025-03-25 21:50:40,685] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5485,1.24478,0.849771,0.855445,0.848604,0.848824
2,0.182,1.155595,0.848624,0.849319,0.848194,0.848393
3,0.1202,1.123525,0.852064,0.852242,0.851825,0.851943
4,0.0896,1.067671,0.855505,0.85545,0.85554,0.855477
5,0.0707,1.072801,0.857798,0.857744,0.857834,0.857771
6,0.0583,1.058244,0.857798,0.858247,0.857456,0.85763
7,0.0494,1.057855,0.861239,0.861237,0.861129,0.861173
8,0.0434,1.054603,0.863532,0.863647,0.863339,0.863437
9,0.0387,1.02914,0.862385,0.862407,0.862255,0.862313
10,0.0355,1.015028,0.863532,0.863647,0.863339,0.863437


[I 2025-03-25 21:55:56,223] Trial 6 finished with value: 0.8634371031315184 and parameters: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 6 with value: 0.8634371031315184.


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4534,1.055161,0.872706,0.873408,0.872306,0.872524
2,0.1503,1.117262,0.863532,0.863612,0.863718,0.863528
3,0.0999,1.074965,0.861239,0.861203,0.861171,0.861186
4,0.0744,1.097987,0.854358,0.854299,0.854372,0.854325
5,0.0584,1.021831,0.854358,0.854631,0.854077,0.854218
6,0.0488,1.072372,0.860092,0.860112,0.85996,0.860018


[I 2025-03-25 21:59:00,095] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1913,1.316661,0.833716,0.83443,0.83326,0.833447
2,0.6049,1.22843,0.832569,0.833073,0.832176,0.832343
3,0.4333,1.278855,0.84289,0.843244,0.842563,0.842716


[I 2025-03-25 22:00:34,576] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 120, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5563,1.154897,0.848624,0.850997,0.847857,0.848125
2,0.1688,1.13702,0.852064,0.852005,0.852078,0.852031
3,0.1115,1.059148,0.868119,0.868273,0.868348,0.868118
4,0.0824,1.132528,0.849771,0.849711,0.849783,0.849737
5,0.0655,1.052718,0.860092,0.860045,0.860045,0.860045
6,0.0541,1.057988,0.854358,0.854631,0.854077,0.854218


[I 2025-03-25 22:03:40,052] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6792,1.160147,0.864679,0.867203,0.863918,0.864233
2,0.2387,1.083166,0.864679,0.865281,0.864297,0.864496
3,0.1595,1.105087,0.872706,0.872764,0.872558,0.872633
4,0.1207,1.144198,0.864679,0.865081,0.865012,0.864678
5,0.0964,1.100369,0.870413,0.870394,0.870517,0.870399
6,0.0794,1.01906,0.873853,0.873884,0.873727,0.873787
7,0.0677,1.092383,0.862385,0.862464,0.862213,0.862298
8,0.0595,1.116502,0.860092,0.860547,0.85975,0.859926
9,0.0536,1.074269,0.869266,0.86925,0.86918,0.86921
10,0.0494,1.073277,0.872706,0.872661,0.872769,0.872686


[I 2025-03-25 22:08:46,565] Trial 10 finished with value: 0.8726861625516433 and parameters: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 10 with value: 0.8726861625516433.


Trial 11 with params: {'learning_rate': 0.0014321301966915287, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4732,1.105362,0.860092,0.862861,0.859287,0.859592
2,0.1581,1.152758,0.847477,0.848244,0.847026,0.847231
3,0.1033,1.135473,0.858945,0.859227,0.858666,0.85881
4,0.0772,1.108797,0.847477,0.847512,0.84732,0.847389
5,0.0613,1.007845,0.862385,0.862339,0.862339,0.862339
6,0.05,1.045579,0.861239,0.861286,0.861087,0.861158
7,0.0424,1.046593,0.858945,0.85889,0.858918,0.858903
8,0.0369,1.011627,0.855505,0.855732,0.855245,0.855376
9,0.0327,1.027938,0.861239,0.861237,0.861129,0.861173
10,0.0297,1.008962,0.858945,0.859054,0.85875,0.858847


[I 2025-03-25 22:13:56,371] Trial 11 finished with value: 0.8588467536569477 and parameters: {'learning_rate': 0.0014321301966915287, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 10 with value: 0.8726861625516433.


Trial 12 with params: {'learning_rate': 9.686152689152715e-05, 'weight_decay': 0.002, 'warmup_steps': 47, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1597,1.274171,0.838303,0.83833,0.838143,0.838209
2,0.5623,1.228857,0.827982,0.827982,0.828092,0.827967
3,0.4029,1.24138,0.841743,0.841947,0.841479,0.841602
4,0.3199,1.377636,0.841743,0.844003,0.84249,0.841642
5,0.2709,1.236069,0.858945,0.858899,0.859003,0.858923
6,0.237,1.225732,0.854358,0.854739,0.854035,0.854197


[I 2025-03-25 22:16:59,066] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0004052254440503788, 'weight_decay': 0.003, 'warmup_steps': 121, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.727,1.102783,0.862385,0.863671,0.861834,0.862095
2,0.2464,1.051502,0.876147,0.87624,0.875979,0.876068
3,0.1649,1.124829,0.863532,0.863532,0.863423,0.863467
4,0.1243,1.121026,0.866972,0.866972,0.867096,0.866961
5,0.0991,1.13752,0.865826,0.865827,0.865717,0.865762
6,0.0817,1.055217,0.863532,0.863475,0.863549,0.863502
7,0.07,1.09627,0.865826,0.865792,0.865759,0.865775
8,0.0611,1.157966,0.863532,0.863532,0.863423,0.863467
9,0.055,1.127844,0.862385,0.862366,0.862297,0.862327
10,0.0507,1.118363,0.864679,0.864633,0.864633,0.864633


[I 2025-03-25 22:22:01,603] Trial 13 finished with value: 0.8646333249136988 and parameters: {'learning_rate': 0.0004052254440503788, 'weight_decay': 0.003, 'warmup_steps': 121, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 14 with params: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 88, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7908,1.148988,0.849771,0.852292,0.848983,0.849255
2,0.2875,1.083182,0.875,0.87506,0.874853,0.874927
3,0.1953,1.157277,0.865826,0.866121,0.865549,0.865697
4,0.1504,1.155548,0.865826,0.866067,0.866096,0.865826
5,0.1212,1.098622,0.865826,0.865792,0.865759,0.865775
6,0.1007,1.029135,0.870413,0.870356,0.870432,0.870384
7,0.0863,1.069146,0.865826,0.866505,0.865423,0.865633
8,0.0757,1.11179,0.87156,0.871589,0.871432,0.871492
9,0.0683,1.10474,0.869266,0.86921,0.869264,0.869232
10,0.0631,1.101885,0.868119,0.868067,0.868096,0.86808


[I 2025-03-25 22:27:18,250] Trial 14 finished with value: 0.8680802305833385 and parameters: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 88, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 81, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5612,1.171085,0.850917,0.853889,0.850067,0.850344
2,0.1741,1.120627,0.856651,0.857454,0.856203,0.85642
3,0.1158,1.080349,0.863532,0.863532,0.863423,0.863467
4,0.086,1.053796,0.854358,0.854352,0.854246,0.854289
5,0.0678,1.053104,0.852064,0.852104,0.851909,0.851978
6,0.0561,1.075548,0.854358,0.854739,0.854035,0.854197


[I 2025-03-25 22:30:25,831] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8594,1.169258,0.83945,0.840611,0.83889,0.839111
2,0.3418,1.13732,0.870413,0.870469,0.870264,0.870338
3,0.2324,1.208795,0.872706,0.872832,0.872516,0.872618
4,0.1821,1.252003,0.861239,0.861983,0.861676,0.86123
5,0.1502,1.208928,0.869266,0.869352,0.869096,0.869183
6,0.1276,1.128722,0.864679,0.864633,0.864633,0.864633
7,0.1109,1.205519,0.864679,0.865026,0.864381,0.864539
8,0.0984,1.237667,0.860092,0.860547,0.85975,0.859926
9,0.0889,1.232423,0.857798,0.858247,0.857456,0.85763
10,0.0824,1.216556,0.863532,0.863582,0.863381,0.863453


[I 2025-03-25 22:35:49,433] Trial 16 finished with value: 0.8634529168635017 and parameters: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 17 with params: {'learning_rate': 0.0006412609358779237, 'weight_decay': 0.004, 'warmup_steps': 97, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6267,1.130091,0.858945,0.861277,0.858203,0.858498
2,0.2,1.119425,0.863532,0.864361,0.863086,0.863312
3,0.1326,1.118884,0.863532,0.863486,0.863591,0.86351
4,0.0991,1.164684,0.84633,0.846393,0.846152,0.846232
5,0.0786,1.075469,0.858945,0.858887,0.858961,0.858914
6,0.0653,1.058254,0.850917,0.851349,0.850573,0.850741


[I 2025-03-25 22:38:57,355] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 5.957853392927128e-05, 'weight_decay': 0.004, 'warmup_steps': 137, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3712,1.404878,0.823394,0.823672,0.823672,0.823394
2,0.7534,1.308095,0.837156,0.837674,0.836764,0.836936
3,0.5632,1.337008,0.831422,0.833764,0.830629,0.830843


[I 2025-03-25 22:40:33,547] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 50, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6735,1.160106,0.861239,0.863075,0.860581,0.860868
2,0.233,1.068776,0.864679,0.865431,0.864254,0.864473
3,0.1557,1.108956,0.858945,0.859133,0.858708,0.858829
4,0.1173,1.127756,0.864679,0.864874,0.864928,0.864678
5,0.0933,1.07845,0.876147,0.876344,0.8764,0.876146
6,0.0772,1.029163,0.858945,0.859133,0.858708,0.858829
7,0.066,1.059042,0.864679,0.864633,0.864633,0.864633
8,0.0581,1.113327,0.863532,0.863647,0.863339,0.863437
9,0.0522,1.089317,0.866972,0.866916,0.86697,0.866938
10,0.0481,1.075076,0.868119,0.868062,0.868138,0.86809


[I 2025-03-25 22:45:51,763] Trial 19 finished with value: 0.8680899482383273 and parameters: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 50, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 111, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7152,1.090215,0.869266,0.869745,0.868927,0.869111
2,0.2399,1.094115,0.869266,0.869517,0.869012,0.86915
3,0.1604,1.17048,0.858945,0.859227,0.858666,0.85881
4,0.1206,1.155139,0.854358,0.854302,0.85433,0.854315
5,0.0962,1.153444,0.863532,0.863479,0.863507,0.863492
6,0.0793,1.086798,0.864679,0.86476,0.864507,0.864593
7,0.0678,1.117115,0.865826,0.865943,0.865633,0.865732
8,0.0595,1.178615,0.856651,0.8573,0.856245,0.856446
9,0.0536,1.145715,0.865826,0.865877,0.865675,0.865748
10,0.0494,1.129561,0.865826,0.865827,0.865717,0.865762


[I 2025-03-25 22:51:19,838] Trial 20 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 111, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 21 with params: {'learning_rate': 0.0008087763473950767, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5508,1.186937,0.860092,0.863166,0.859245,0.859553
2,0.1823,1.13532,0.858945,0.860113,0.858413,0.858662
3,0.1208,1.11473,0.861239,0.861181,0.861255,0.861208
4,0.0896,1.135559,0.854358,0.854596,0.854624,0.854358
5,0.071,1.069391,0.854358,0.85451,0.854582,0.854356
6,0.0587,1.044848,0.856651,0.856836,0.856414,0.856533


[I 2025-03-25 22:54:36,791] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0001181546687313462, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0809,1.27067,0.836009,0.837447,0.835386,0.835609
2,0.5038,1.205989,0.837156,0.837205,0.837312,0.837148
3,0.3543,1.232539,0.850917,0.850985,0.850741,0.850822


[I 2025-03-25 22:56:16,138] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.00013857419525046944, 'weight_decay': 0.0, 'warmup_steps': 60, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0299,1.225177,0.83945,0.841482,0.838722,0.838962
2,0.4521,1.158772,0.850917,0.850863,0.850951,0.850889
3,0.3121,1.234369,0.861239,0.861184,0.861213,0.861197
4,0.2463,1.373127,0.861239,0.86373,0.862013,0.861142
5,0.2068,1.185311,0.866972,0.867901,0.866507,0.866745
6,0.1785,1.169674,0.870413,0.870469,0.870264,0.870338
7,0.1582,1.262922,0.858945,0.859927,0.858455,0.858691
8,0.1437,1.248752,0.860092,0.860547,0.85975,0.859926
9,0.132,1.25238,0.862385,0.862537,0.862171,0.862281
10,0.124,1.247667,0.863532,0.863727,0.863297,0.86342


[I 2025-03-25 23:01:46,108] Trial 23 finished with value: 0.8634198476095709 and parameters: {'learning_rate': 0.00013857419525046944, 'weight_decay': 0.0, 'warmup_steps': 60, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 10 with value: 0.8726861625516433.


Trial 24 with params: {'learning_rate': 0.0004168092170879578, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6997,1.125775,0.862385,0.86411,0.86175,0.862034
2,0.2427,1.087067,0.866972,0.867445,0.866633,0.866815
3,0.1618,1.105173,0.863532,0.863532,0.863423,0.863467
4,0.1226,1.185293,0.862385,0.863211,0.862844,0.862374
5,0.0976,1.139759,0.869266,0.869213,0.869306,0.869241
6,0.0809,1.050778,0.864679,0.865281,0.864297,0.864496
7,0.069,1.09716,0.866972,0.866916,0.86697,0.866938
8,0.0606,1.120147,0.862385,0.862537,0.862171,0.862281
9,0.0544,1.08475,0.865826,0.865792,0.865759,0.865775
10,0.0502,1.088945,0.865826,0.865792,0.865759,0.865775


[I 2025-03-25 23:07:13,507] Trial 24 finished with value: 0.8657746729027292 and parameters: {'learning_rate': 0.0004168092170879578, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 10 with value: 0.8726861625516433.


Trial 25 with params: {'learning_rate': 0.0019629797618003542, 'weight_decay': 0.0, 'warmup_steps': 100, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4846,1.036276,0.861239,0.861525,0.86096,0.861105
2,0.1526,1.207143,0.845183,0.845794,0.844774,0.844961
3,0.1,1.210882,0.847477,0.847628,0.847699,0.847475
4,0.0744,1.163248,0.853211,0.853261,0.853372,0.853204
5,0.0587,1.053094,0.854358,0.85451,0.854582,0.854356
6,0.0485,1.102795,0.858945,0.859025,0.859129,0.85894
7,0.0409,1.118088,0.852064,0.852018,0.85212,0.852041
8,0.0352,1.051331,0.854358,0.854462,0.854161,0.854256
9,0.0311,1.049595,0.852064,0.852057,0.851951,0.851994
10,0.0282,1.046169,0.850917,0.850931,0.850783,0.850839


[I 2025-03-25 23:12:41,323] Trial 25 finished with value: 0.8508389650308428 and parameters: {'learning_rate': 0.0019629797618003542, 'weight_decay': 0.0, 'warmup_steps': 100, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 26 with params: {'learning_rate': 8.68945458483681e-05, 'weight_decay': 0.007, 'warmup_steps': 102, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.207,1.295193,0.833716,0.833668,0.833639,0.833652
2,0.5959,1.245605,0.830275,0.830219,0.830302,0.830243
3,0.4285,1.266407,0.840596,0.840945,0.840269,0.84042


[I 2025-03-25 23:14:18,059] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0029678454905841976, 'weight_decay': 0.009000000000000001, 'warmup_steps': 70, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4515,1.044221,0.857798,0.85774,0.857792,0.857762
2,0.149,1.175877,0.852064,0.853394,0.851488,0.851736
3,0.0998,1.196768,0.84289,0.843361,0.842521,0.842691
4,0.075,1.097837,0.852064,0.852087,0.852204,0.852055
5,0.0603,1.052524,0.854358,0.854302,0.85433,0.854315
6,0.0497,1.078539,0.864679,0.865026,0.864381,0.864539
7,0.0417,1.076119,0.857798,0.857744,0.857834,0.857771
8,0.036,1.064107,0.855505,0.855481,0.855414,0.855443
9,0.0316,1.026435,0.863532,0.863823,0.863255,0.863401
10,0.0285,1.031097,0.860092,0.86024,0.859876,0.859986


[I 2025-03-25 23:19:44,563] Trial 27 finished with value: 0.859985680592992 and parameters: {'learning_rate': 0.0029678454905841976, 'weight_decay': 0.009000000000000001, 'warmup_steps': 70, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 28 with params: {'learning_rate': 0.0014412600221747798, 'weight_decay': 0.006, 'warmup_steps': 171, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5317,1.116911,0.858945,0.861013,0.858245,0.858534
2,0.1598,1.238679,0.844037,0.847235,0.843142,0.843391
3,0.1053,1.104292,0.858945,0.859054,0.85875,0.858847
4,0.078,1.180929,0.841743,0.842042,0.841437,0.84158
5,0.0615,1.088238,0.856651,0.856647,0.85654,0.856583
6,0.0507,1.137332,0.847477,0.848244,0.847026,0.847231


[I 2025-03-25 23:22:56,001] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0003207146635033186, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7573,1.115723,0.855505,0.858528,0.854656,0.854948
2,0.2781,1.093935,0.862385,0.86313,0.86196,0.862176
3,0.1897,1.144252,0.861239,0.861761,0.860876,0.861063
4,0.1459,1.173968,0.863532,0.864129,0.863928,0.863528
5,0.1168,1.137289,0.866972,0.866919,0.867012,0.866947
6,0.0969,1.0363,0.87156,0.871516,0.871516,0.871516
7,0.0828,1.107893,0.870413,0.870417,0.870306,0.870351
8,0.0728,1.13543,0.866972,0.866998,0.866843,0.866902
9,0.0653,1.101382,0.869266,0.869293,0.869138,0.869197
10,0.0603,1.102484,0.870413,0.870381,0.870348,0.870364


[I 2025-03-25 23:28:21,388] Trial 29 finished with value: 0.8703635729744308 and parameters: {'learning_rate': 0.0003207146635033186, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 10 with value: 0.8726861625516433.


Trial 30 with params: {'learning_rate': 0.0005406534960180262, 'weight_decay': 0.007, 'warmup_steps': 81, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.649,1.091609,0.861239,0.862621,0.860666,0.860931
2,0.2165,1.16676,0.855505,0.856753,0.854951,0.8552
3,0.1439,1.091681,0.866972,0.866938,0.867054,0.866955
4,0.1077,1.123324,0.860092,0.860206,0.860297,0.860089
5,0.0852,1.078883,0.861239,0.861479,0.861508,0.861238
6,0.0704,1.060737,0.853211,0.853435,0.852951,0.85308


[I 2025-03-25 23:31:34,896] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0002255603737182001, 'weight_decay': 0.005, 'warmup_steps': 63, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8684,1.123382,0.845183,0.845942,0.844731,0.844934
2,0.3329,1.092047,0.866972,0.866928,0.866928,0.866928
3,0.228,1.177358,0.875,0.87497,0.874937,0.874952
4,0.1791,1.223771,0.863532,0.864129,0.863928,0.863528
5,0.1467,1.163548,0.870413,0.870417,0.870306,0.870351
6,0.1238,1.123529,0.870413,0.870394,0.870517,0.870399
7,0.1072,1.149131,0.873853,0.874343,0.873516,0.873704
8,0.0949,1.185712,0.869266,0.869222,0.869222,0.869222
9,0.0857,1.189379,0.87156,0.871648,0.87139,0.871478
10,0.0794,1.168074,0.875,0.87497,0.874937,0.874952


[I 2025-03-25 23:36:57,015] Trial 31 finished with value: 0.8749524730461324 and parameters: {'learning_rate': 0.0002255603737182001, 'weight_decay': 0.005, 'warmup_steps': 63, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 32 with params: {'learning_rate': 0.00018937118158110557, 'weight_decay': 0.003, 'warmup_steps': 51, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.918,1.162264,0.844037,0.845422,0.843437,0.843674
2,0.3706,1.120437,0.870413,0.870367,0.870474,0.870392
3,0.2533,1.210554,0.87156,0.871516,0.871516,0.871516
4,0.2001,1.272372,0.856651,0.858356,0.857298,0.856597
5,0.1668,1.162475,0.875,0.875561,0.874642,0.874842
6,0.1419,1.132282,0.869266,0.86921,0.869264,0.869232
7,0.1245,1.189749,0.868119,0.868419,0.867843,0.867993
8,0.1113,1.238056,0.858945,0.859054,0.85875,0.858847
9,0.101,1.241526,0.865826,0.865827,0.865717,0.865762
10,0.0941,1.237171,0.866972,0.866955,0.866886,0.866916


[I 2025-03-25 23:42:23,585] Trial 32 finished with value: 0.8669157698076467 and parameters: {'learning_rate': 0.00018937118158110557, 'weight_decay': 0.003, 'warmup_steps': 51, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 33 with params: {'learning_rate': 0.0005898966486488114, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6048,1.099933,0.864679,0.865977,0.864128,0.864394
2,0.2068,1.126339,0.856651,0.857161,0.856287,0.85647
3,0.1376,1.154756,0.862385,0.862464,0.862213,0.862298
4,0.1026,1.124889,0.862385,0.862407,0.862255,0.862313
5,0.0812,1.090715,0.860092,0.860141,0.860255,0.860085
6,0.0674,1.081039,0.861239,0.861286,0.861087,0.861158
7,0.0572,1.099707,0.862385,0.86298,0.862002,0.8622
8,0.0502,1.10638,0.858945,0.859461,0.858582,0.858766
9,0.0451,1.068215,0.863532,0.863727,0.863297,0.86342
10,0.0414,1.061282,0.858945,0.859227,0.858666,0.85881


[I 2025-03-25 23:47:46,151] Trial 33 finished with value: 0.8588095911960034 and parameters: {'learning_rate': 0.0005898966486488114, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 34 with params: {'learning_rate': 0.0002951220667961592, 'weight_decay': 0.005, 'warmup_steps': 66, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7796,1.219184,0.847477,0.852694,0.846352,0.846571
2,0.2864,1.128658,0.865826,0.866233,0.865507,0.865677
3,0.1961,1.173477,0.870413,0.870619,0.87018,0.870306
4,0.151,1.169035,0.864679,0.86497,0.86497,0.864679
5,0.1216,1.146742,0.868119,0.8681,0.868222,0.868105
6,0.1009,1.085177,0.87156,0.871545,0.871474,0.871505
7,0.0867,1.105065,0.868119,0.868173,0.86797,0.868043
8,0.0761,1.164819,0.863532,0.863935,0.863213,0.863381
9,0.0685,1.14396,0.869266,0.869352,0.869096,0.869183
10,0.0632,1.134265,0.868119,0.868122,0.868012,0.868057


[I 2025-03-25 23:53:02,918] Trial 34 finished with value: 0.8680566246021502 and parameters: {'learning_rate': 0.0002951220667961592, 'weight_decay': 0.005, 'warmup_steps': 66, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 35 with params: {'learning_rate': 0.0005239408289563699, 'weight_decay': 0.006, 'warmup_steps': 202, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7043,1.134511,0.865826,0.866838,0.865338,0.865584
2,0.2205,1.142668,0.857798,0.858866,0.857287,0.857528
3,0.1454,1.146076,0.858945,0.858909,0.858876,0.858891
4,0.1087,1.159642,0.857798,0.85775,0.85775,0.85775
5,0.0863,1.1023,0.857798,0.857776,0.857708,0.857738
6,0.0713,1.071943,0.856651,0.856929,0.856372,0.856514


[I 2025-03-25 23:56:01,612] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 7.416948412433974e-05, 'weight_decay': 0.006, 'warmup_steps': 71, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2659,1.332101,0.827982,0.827944,0.827882,0.827908
2,0.6597,1.260801,0.832569,0.83268,0.832344,0.832442
3,0.478,1.276337,0.84289,0.842921,0.842732,0.842799


[I 2025-03-25 23:57:30,314] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00044870008350169195, 'weight_decay': 0.003, 'warmup_steps': 55, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6795,1.096248,0.87156,0.873113,0.870969,0.871261
2,0.2343,1.053994,0.861239,0.862418,0.860708,0.86096
3,0.156,1.071072,0.865826,0.865827,0.865717,0.865762
4,0.1177,1.083299,0.870413,0.870394,0.870517,0.870399
5,0.0934,1.067309,0.862385,0.862332,0.862423,0.862359
6,0.0773,1.006897,0.866972,0.867056,0.866801,0.866888
7,0.066,1.067484,0.863532,0.863582,0.863381,0.863453
8,0.0581,1.087939,0.858945,0.859054,0.85875,0.858847
9,0.0522,1.061962,0.864679,0.86466,0.864591,0.864621
10,0.0481,1.053214,0.865826,0.865792,0.865759,0.865775


[I 2025-03-26 00:02:28,296] Trial 37 finished with value: 0.8657746729027292 and parameters: {'learning_rate': 0.00044870008350169195, 'weight_decay': 0.003, 'warmup_steps': 55, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 38 with params: {'learning_rate': 0.00032063386881613944, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7468,1.099928,0.857798,0.860257,0.857035,0.857329
2,0.2759,1.115478,0.865826,0.866361,0.865465,0.865656
3,0.189,1.139078,0.865826,0.865943,0.865633,0.865732
4,0.1454,1.208586,0.861239,0.862958,0.861886,0.861186
5,0.1169,1.161957,0.861239,0.861184,0.861213,0.861197
6,0.0975,1.043682,0.866972,0.866998,0.866843,0.866902
7,0.0833,1.149764,0.861239,0.861525,0.86096,0.861105
8,0.0732,1.153619,0.862385,0.862846,0.862044,0.862222
9,0.0658,1.131005,0.865826,0.866121,0.865549,0.865697
10,0.0607,1.13672,0.865826,0.865877,0.865675,0.865748


[I 2025-03-26 00:07:26,501] Trial 38 finished with value: 0.865747825823779 and parameters: {'learning_rate': 0.00032063386881613944, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 39 with params: {'learning_rate': 0.0014780818159468043, 'weight_decay': 0.0, 'warmup_steps': 105, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5137,1.105195,0.857798,0.859059,0.857245,0.857498
2,0.1574,1.127101,0.847477,0.847573,0.847278,0.847371
3,0.1036,1.099766,0.855505,0.855446,0.855498,0.855467


[I 2025-03-26 00:08:58,446] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00012124257132049206, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.07,1.267023,0.834862,0.835658,0.834386,0.83458
2,0.4977,1.192853,0.83945,0.839415,0.839522,0.839428
3,0.3491,1.225152,0.852064,0.852334,0.851783,0.851922
4,0.2748,1.262145,0.853211,0.854024,0.853667,0.853199
5,0.2314,1.183362,0.863532,0.863935,0.863213,0.863381
6,0.2017,1.169526,0.866972,0.86722,0.866717,0.866854
7,0.1799,1.23574,0.861239,0.862231,0.86075,0.860988
8,0.1644,1.235987,0.866972,0.867056,0.866801,0.866888
9,0.1519,1.226885,0.869266,0.869352,0.869096,0.869183
10,0.1432,1.223144,0.87156,0.871724,0.871348,0.871462


[I 2025-03-26 00:14:07,257] Trial 40 finished with value: 0.8714622641509434 and parameters: {'learning_rate': 0.00012124257132049206, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 41 with params: {'learning_rate': 6.428743650635986e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 44, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.317,1.376529,0.823394,0.824,0.822956,0.823126
2,0.723,1.289681,0.832569,0.833884,0.831965,0.832179
3,0.5309,1.342381,0.827982,0.830421,0.827166,0.827368


[I 2025-03-26 00:15:38,749] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 6.751129700258744e-05, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2904,1.388461,0.822248,0.823801,0.821577,0.821773
2,0.7022,1.295937,0.833716,0.83443,0.83326,0.833447
3,0.5165,1.321609,0.831422,0.833268,0.830713,0.830931
4,0.4184,1.292734,0.845183,0.845635,0.845531,0.845182
5,0.3592,1.299601,0.840596,0.840746,0.840816,0.840594
6,0.3193,1.26338,0.845183,0.845164,0.845279,0.845167


[I 2025-03-26 00:18:42,469] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00014783252994741183, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0046,1.204691,0.84289,0.84364,0.842437,0.842636
2,0.4372,1.16783,0.856651,0.856605,0.856708,0.856629
3,0.3023,1.214766,0.868119,0.868067,0.868096,0.86808
4,0.2387,1.272936,0.862385,0.863383,0.862886,0.862367
5,0.2002,1.180656,0.865826,0.866233,0.865507,0.865677
6,0.1727,1.136257,0.875,0.875128,0.874811,0.874913
7,0.1532,1.215728,0.869266,0.869623,0.868969,0.869131
8,0.1386,1.224717,0.87156,0.871589,0.871432,0.871492
9,0.1269,1.234949,0.868119,0.868122,0.868012,0.868057
10,0.1191,1.221541,0.866972,0.86713,0.866759,0.866872


[I 2025-03-26 00:23:59,663] Trial 43 finished with value: 0.8668716307277629 and parameters: {'learning_rate': 0.00014783252994741183, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 44 with params: {'learning_rate': 0.0003238339946107003, 'weight_decay': 0.01, 'warmup_steps': 91, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7718,1.146816,0.854358,0.856144,0.853698,0.853969
2,0.2743,1.11063,0.861239,0.861636,0.860918,0.861085
3,0.1866,1.181952,0.866972,0.867056,0.866801,0.866888
4,0.1432,1.171838,0.858945,0.859405,0.859297,0.858943
5,0.1152,1.137074,0.863532,0.863498,0.863465,0.86348
6,0.0955,1.053132,0.87156,0.871648,0.87139,0.871478
7,0.0818,1.111822,0.865826,0.866121,0.865549,0.865697
8,0.0717,1.16343,0.862385,0.86298,0.862002,0.8622
9,0.0645,1.146948,0.872706,0.872712,0.8726,0.872646
10,0.0595,1.141192,0.87156,0.871545,0.871474,0.871505


[I 2025-03-26 00:29:00,502] Trial 44 finished with value: 0.8715048811935899 and parameters: {'learning_rate': 0.0003238339946107003, 'weight_decay': 0.01, 'warmup_steps': 91, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 45 with params: {'learning_rate': 0.0003782666733047682, 'weight_decay': 0.009000000000000001, 'warmup_steps': 101, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7407,1.09723,0.860092,0.860991,0.859624,0.859853
2,0.2536,1.065787,0.865826,0.865792,0.865759,0.865775
3,0.1703,1.092879,0.860092,0.860327,0.859834,0.859967
4,0.1289,1.147067,0.869266,0.869559,0.869559,0.869266
5,0.1034,1.135768,0.860092,0.86024,0.859876,0.859986
6,0.0855,1.074746,0.861239,0.86143,0.861002,0.861124


[I 2025-03-26 00:31:58,270] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00025233579803446145, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8229,1.141135,0.84633,0.847529,0.845773,0.846006
2,0.3135,1.089435,0.868119,0.868087,0.868054,0.868069
3,0.2143,1.150399,0.873853,0.873944,0.873684,0.873773
4,0.1674,1.184255,0.864679,0.865081,0.865012,0.864678
5,0.1364,1.142879,0.868119,0.868062,0.868138,0.86809
6,0.115,1.091274,0.869266,0.869222,0.869222,0.869222
7,0.0995,1.128758,0.869266,0.869745,0.868927,0.869111
8,0.0878,1.169629,0.870413,0.870469,0.870264,0.870338
9,0.0794,1.161257,0.870413,0.870381,0.870348,0.870364
10,0.0734,1.15033,0.870413,0.870381,0.870348,0.870364


[I 2025-03-26 00:36:54,934] Trial 46 finished with value: 0.8703635729744308 and parameters: {'learning_rate': 0.00025233579803446145, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 47 with params: {'learning_rate': 0.0025789104733638904, 'weight_decay': 0.002, 'warmup_steps': 192, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4916,0.985364,0.861239,0.86143,0.861002,0.861124
2,0.1514,1.108073,0.856651,0.856695,0.856498,0.856568
3,0.1006,1.11372,0.856651,0.856593,0.856666,0.85662


[I 2025-03-26 00:38:22,068] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00014092099644845275, 'weight_decay': 0.01, 'warmup_steps': 86, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.027,1.198862,0.837156,0.83781,0.836722,0.836908
2,0.4442,1.159265,0.847477,0.847467,0.847363,0.847405
3,0.308,1.215549,0.861239,0.861237,0.861129,0.861173
4,0.243,1.316319,0.855505,0.857322,0.856172,0.855443
5,0.2047,1.182346,0.866972,0.867901,0.866507,0.866745
6,0.1762,1.169659,0.865826,0.866121,0.865549,0.865697
7,0.1565,1.245363,0.863532,0.864203,0.863128,0.863336
8,0.1419,1.24331,0.857798,0.858379,0.857413,0.857606
9,0.1301,1.236433,0.860092,0.860429,0.859792,0.859947
10,0.1222,1.229849,0.861239,0.861761,0.860876,0.861063


[I 2025-03-26 00:43:24,031] Trial 48 finished with value: 0.8610629385731009 and parameters: {'learning_rate': 0.00014092099644845275, 'weight_decay': 0.01, 'warmup_steps': 86, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 49 with params: {'learning_rate': 7.720728957896204e-05, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2345,1.335259,0.826835,0.827523,0.826377,0.826555
2,0.6478,1.241061,0.837156,0.837961,0.83668,0.836878
3,0.4698,1.297778,0.834862,0.834977,0.834638,0.834737
4,0.3811,1.295577,0.83945,0.83964,0.83969,0.839449
5,0.327,1.285744,0.847477,0.847458,0.847573,0.847461
6,0.2887,1.230111,0.853211,0.85335,0.852993,0.8531


[I 2025-03-26 00:46:23,773] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0009820900754755834, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5272,1.113396,0.854358,0.856389,0.853656,0.853933
2,0.1726,1.099441,0.856651,0.856836,0.856414,0.856533
3,0.1144,1.09667,0.862385,0.862385,0.862507,0.862374
4,0.0852,1.15149,0.837156,0.837123,0.837059,0.837087
5,0.0672,1.138214,0.844037,0.844007,0.843942,0.84397
6,0.055,1.10492,0.845183,0.845172,0.845068,0.84511


[I 2025-03-26 00:49:26,808] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00032322668784508696, 'weight_decay': 0.007, 'warmup_steps': 49, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7609,1.17696,0.853211,0.85591,0.852404,0.852687
2,0.2783,1.138344,0.856651,0.857038,0.856329,0.856493
3,0.1882,1.178497,0.868119,0.868122,0.868012,0.868057
4,0.1455,1.195251,0.865826,0.866922,0.866349,0.865804
5,0.1165,1.163916,0.869266,0.86921,0.869264,0.869232
6,0.0967,1.058896,0.868119,0.868122,0.868012,0.868057
7,0.0829,1.12476,0.869266,0.869882,0.868885,0.86909
8,0.0727,1.178303,0.866972,0.867581,0.866591,0.866793
9,0.0652,1.155493,0.868119,0.868173,0.86797,0.868043
10,0.0602,1.142969,0.870413,0.870381,0.870348,0.870364


[I 2025-03-26 00:54:30,712] Trial 51 finished with value: 0.8703635729744308 and parameters: {'learning_rate': 0.00032322668784508696, 'weight_decay': 0.007, 'warmup_steps': 49, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 52 with params: {'learning_rate': 7.762462878457772e-05, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2344,1.345436,0.826835,0.827385,0.826419,0.826587
2,0.6505,1.262519,0.833716,0.83494,0.833133,0.833347
3,0.4732,1.300652,0.836009,0.836462,0.835638,0.835802
4,0.3817,1.323773,0.838303,0.839347,0.838817,0.838277
5,0.3275,1.276073,0.847477,0.847628,0.847699,0.847475
6,0.2891,1.226625,0.852064,0.852057,0.851951,0.851994


[I 2025-03-26 00:57:25,984] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00031339170090626786, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.759,1.177125,0.850917,0.853593,0.850109,0.850385
2,0.2782,1.121055,0.865826,0.865877,0.865675,0.865748
3,0.1904,1.172536,0.863532,0.864203,0.863128,0.863336
4,0.1469,1.215042,0.861239,0.862147,0.861718,0.861224
5,0.1185,1.163987,0.865826,0.865769,0.865844,0.865796
6,0.0984,1.055183,0.868119,0.868173,0.86797,0.868043
7,0.0843,1.133006,0.864679,0.864703,0.864549,0.864608
8,0.074,1.16819,0.863532,0.863727,0.863297,0.86342
9,0.0666,1.146771,0.862385,0.862339,0.862339,0.862339
10,0.0615,1.150086,0.863532,0.863479,0.863507,0.863492


[I 2025-03-26 01:02:22,638] Trial 53 finished with value: 0.863491716864498 and parameters: {'learning_rate': 0.00031339170090626786, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 54 with params: {'learning_rate': 0.00031436243308036264, 'weight_decay': 0.01, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7552,1.148984,0.858945,0.861557,0.858161,0.858461
2,0.2779,1.114106,0.868119,0.868419,0.867843,0.867993
3,0.1898,1.163554,0.869266,0.869517,0.869012,0.86915
4,0.146,1.181094,0.863532,0.864444,0.864012,0.863518
5,0.1176,1.123601,0.860092,0.860092,0.860213,0.86008
6,0.0979,1.043599,0.870413,0.870367,0.870474,0.870392
7,0.0839,1.130884,0.866972,0.867056,0.866801,0.866888
8,0.0736,1.152551,0.865826,0.866121,0.865549,0.865697
9,0.0664,1.131406,0.864679,0.864633,0.864633,0.864633
10,0.0613,1.128794,0.868119,0.868067,0.868096,0.86808


[I 2025-03-26 01:07:18,067] Trial 54 finished with value: 0.8680802305833385 and parameters: {'learning_rate': 0.00031436243308036264, 'weight_decay': 0.01, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 55 with params: {'learning_rate': 0.00030218063824909646, 'weight_decay': 0.01, 'warmup_steps': 89, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7881,1.110849,0.853211,0.856522,0.85232,0.852603
2,0.2853,1.075107,0.865826,0.866024,0.865591,0.865715
3,0.1936,1.1191,0.865826,0.865877,0.865675,0.865748
4,0.1489,1.146363,0.865826,0.866171,0.866138,0.865826
5,0.12,1.123739,0.863532,0.863555,0.863676,0.863523
6,0.0995,1.032449,0.869266,0.869266,0.86939,0.869255
7,0.0853,1.062405,0.869266,0.869623,0.868969,0.869131
8,0.0749,1.104775,0.868119,0.868087,0.868054,0.868069
9,0.0674,1.116361,0.863532,0.863479,0.863507,0.863492
10,0.0623,1.113092,0.868119,0.868062,0.868138,0.86809


[I 2025-03-26 01:12:23,003] Trial 55 finished with value: 0.8680899482383273 and parameters: {'learning_rate': 0.00030218063824909646, 'weight_decay': 0.01, 'warmup_steps': 89, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 56 with params: {'learning_rate': 0.00010511259982524887, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1247,1.282389,0.831422,0.831443,0.83126,0.831324
2,0.5438,1.232216,0.831422,0.831655,0.831681,0.831422
3,0.3861,1.236848,0.844037,0.844097,0.843858,0.843937
4,0.3054,1.325314,0.84633,0.848356,0.847036,0.846249
5,0.2581,1.202527,0.857798,0.857744,0.857834,0.857771
6,0.2248,1.183836,0.863532,0.863582,0.863381,0.863453
7,0.2015,1.240128,0.857798,0.859986,0.857077,0.857366
8,0.1845,1.251058,0.858945,0.859337,0.858624,0.858789
9,0.1714,1.239084,0.863532,0.863935,0.863213,0.863381
10,0.1624,1.236091,0.864679,0.86476,0.864507,0.864593


[I 2025-03-26 01:17:28,756] Trial 56 finished with value: 0.8645927095670483 and parameters: {'learning_rate': 0.00010511259982524887, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 57 with params: {'learning_rate': 0.0003353439068434863, 'weight_decay': 0.003, 'warmup_steps': 118, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7745,1.143292,0.854358,0.857523,0.853488,0.853776
2,0.2733,1.111301,0.862385,0.863294,0.861918,0.86215
3,0.1848,1.139099,0.863532,0.863935,0.863213,0.863381
4,0.1414,1.158937,0.863532,0.864279,0.86397,0.863523
5,0.113,1.104307,0.869266,0.869213,0.869306,0.869241
6,0.0936,1.055718,0.865826,0.866121,0.865549,0.865697
7,0.0797,1.115386,0.863532,0.863727,0.863297,0.86342
8,0.07,1.160687,0.866972,0.86713,0.866759,0.866872
9,0.063,1.146985,0.866972,0.866955,0.866886,0.866916
10,0.058,1.127907,0.870413,0.870381,0.870348,0.870364


[I 2025-03-26 01:22:36,007] Trial 57 finished with value: 0.8703635729744308 and parameters: {'learning_rate': 0.0003353439068434863, 'weight_decay': 0.003, 'warmup_steps': 118, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 58 with params: {'learning_rate': 0.0004061752144391629, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6867,1.121894,0.863532,0.864928,0.86296,0.86323
2,0.2438,1.070032,0.863532,0.863582,0.863381,0.863453
3,0.1636,1.107497,0.862385,0.862625,0.862128,0.862263
4,0.1242,1.161132,0.864679,0.86497,0.86497,0.864679
5,0.0992,1.087489,0.875,0.874949,0.874979,0.874963
6,0.0823,1.045125,0.872706,0.873015,0.872432,0.872584
7,0.0701,1.08618,0.869266,0.869427,0.869054,0.869167
8,0.0614,1.096887,0.870413,0.870536,0.870222,0.870323
9,0.0552,1.091544,0.870413,0.870381,0.870348,0.870364
10,0.0508,1.075415,0.87156,0.871516,0.871516,0.871516


[I 2025-03-26 01:27:32,903] Trial 58 finished with value: 0.8715163761892734 and parameters: {'learning_rate': 0.0004061752144391629, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 59 with params: {'learning_rate': 0.0002616215146656782, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8088,1.131593,0.849771,0.851765,0.849068,0.849333
2,0.3078,1.103489,0.862385,0.862464,0.862213,0.862298
3,0.2116,1.158435,0.87156,0.871724,0.871348,0.871462
4,0.1653,1.168726,0.87156,0.872094,0.871937,0.871557
5,0.1342,1.157622,0.87156,0.871648,0.87139,0.871478
6,0.1126,1.098043,0.87156,0.871589,0.871432,0.871492
7,0.0973,1.129173,0.870413,0.870619,0.87018,0.870306
8,0.0857,1.189922,0.868119,0.868661,0.867759,0.867952
9,0.0772,1.160942,0.876147,0.876105,0.876105,0.876105
10,0.0713,1.157705,0.872706,0.872676,0.872643,0.872658


[I 2025-03-26 01:32:33,205] Trial 59 finished with value: 0.8726580230102816 and parameters: {'learning_rate': 0.0002616215146656782, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 60 with params: {'learning_rate': 0.000836774752719424, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5538,1.204075,0.84633,0.849557,0.845437,0.845694
2,0.182,1.130358,0.863532,0.863727,0.863297,0.86342
3,0.1205,1.125831,0.860092,0.860071,0.860003,0.860032


[I 2025-03-26 01:34:01,441] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00022189331697960535, 'weight_decay': 0.004, 'warmup_steps': 33, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8675,1.130035,0.847477,0.847738,0.847194,0.847331
2,0.3384,1.125032,0.862385,0.862407,0.862255,0.862313
3,0.2318,1.227968,0.866972,0.866916,0.86697,0.866938
4,0.182,1.267639,0.860092,0.861926,0.86076,0.860032
5,0.1501,1.227876,0.861239,0.861286,0.861087,0.861158
6,0.127,1.144528,0.863532,0.863486,0.863591,0.86351
7,0.1104,1.197303,0.869266,0.869745,0.868927,0.869111
8,0.0979,1.238482,0.862385,0.862537,0.862171,0.862281
9,0.0886,1.227832,0.865826,0.865827,0.865717,0.865762
10,0.0822,1.223831,0.865826,0.865792,0.865759,0.865775


[I 2025-03-26 01:39:00,013] Trial 61 finished with value: 0.8657746729027292 and parameters: {'learning_rate': 0.00022189331697960535, 'weight_decay': 0.004, 'warmup_steps': 33, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 62 with params: {'learning_rate': 0.00039250497968790817, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6979,1.156557,0.852064,0.853606,0.851446,0.851704
2,0.2495,1.092299,0.864679,0.865431,0.864254,0.864473
3,0.1685,1.159018,0.858945,0.859054,0.85875,0.858847


[I 2025-03-26 01:40:27,940] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0003187880287384565, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.755,1.114308,0.853211,0.854257,0.852698,0.852932
2,0.2782,1.148462,0.866972,0.86722,0.866717,0.866854
3,0.1898,1.157661,0.864679,0.864922,0.864423,0.864558
4,0.146,1.210835,0.860092,0.860492,0.860424,0.860091
5,0.1173,1.184147,0.856651,0.856596,0.856624,0.856609
6,0.0978,1.085362,0.866972,0.86713,0.866759,0.866872
7,0.0834,1.124902,0.861239,0.861351,0.861044,0.861142
8,0.0734,1.163805,0.862385,0.862728,0.862086,0.862243
9,0.066,1.137648,0.860092,0.860169,0.859918,0.860003
10,0.0609,1.139192,0.862385,0.862366,0.862297,0.862327


[I 2025-03-26 01:45:45,313] Trial 63 finished with value: 0.8623266584217035 and parameters: {'learning_rate': 0.0003187880287384565, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 64 with params: {'learning_rate': 0.0003649873765554107, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.721,1.134434,0.855505,0.857182,0.854867,0.855136
2,0.2581,1.126579,0.863532,0.864203,0.863128,0.863336
3,0.1746,1.115123,0.868119,0.868173,0.86797,0.868043
4,0.1332,1.148211,0.866972,0.867503,0.867349,0.86697
5,0.1066,1.122964,0.865826,0.865792,0.865759,0.865775
6,0.0883,1.078447,0.868119,0.868322,0.867885,0.868011
7,0.0755,1.110228,0.862385,0.862728,0.862086,0.862243
8,0.0663,1.171553,0.863532,0.863935,0.863213,0.863381
9,0.0595,1.123991,0.869266,0.869293,0.869138,0.869197
10,0.0549,1.120715,0.870413,0.870469,0.870264,0.870338


[I 2025-03-26 01:50:54,174] Trial 64 finished with value: 0.8703376437443334 and parameters: {'learning_rate': 0.0003649873765554107, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 65 with params: {'learning_rate': 0.0010296649349829314, 'weight_decay': 0.001, 'warmup_steps': 35, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5304,1.152452,0.852064,0.855524,0.851151,0.851429
2,0.1713,1.166457,0.845183,0.84766,0.844395,0.844652
3,0.1131,1.046067,0.863532,0.863647,0.863339,0.863437
4,0.0839,1.08279,0.852064,0.852104,0.851909,0.851978
5,0.0665,1.079074,0.853211,0.853186,0.853119,0.853148
6,0.0545,1.085527,0.840596,0.84186,0.840016,0.840243


[I 2025-03-26 01:53:57,185] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004387816666803014, 'weight_decay': 0.003, 'warmup_steps': 221, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4843,1.024826,0.869266,0.869266,0.86939,0.869255
2,0.1547,1.157324,0.84633,0.846393,0.846152,0.846232
3,0.1091,1.300475,0.841743,0.842735,0.841227,0.841442


[I 2025-03-26 01:55:27,744] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0005213277554788528, 'weight_decay': 0.001, 'warmup_steps': 50, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6412,1.094696,0.872706,0.873015,0.872432,0.872584
2,0.2177,1.07949,0.861239,0.861902,0.860834,0.86104
3,0.1447,1.105395,0.869266,0.86925,0.86918,0.86921
4,0.1088,1.094729,0.863532,0.863532,0.863423,0.863467
5,0.0862,1.065546,0.87156,0.871966,0.871895,0.871559
6,0.0716,1.082839,0.865826,0.866233,0.865507,0.865677
7,0.0611,1.090293,0.868119,0.868322,0.867885,0.868011
8,0.0537,1.0853,0.861239,0.86143,0.861002,0.861124
9,0.0482,1.066785,0.866972,0.86713,0.866759,0.866872
10,0.0444,1.055023,0.869266,0.869293,0.869138,0.869197


[I 2025-03-26 02:00:22,680] Trial 67 finished with value: 0.8691972462578159 and parameters: {'learning_rate': 0.0005213277554788528, 'weight_decay': 0.001, 'warmup_steps': 50, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 68 with params: {'learning_rate': 0.0005695220404637201, 'weight_decay': 0.008, 'warmup_steps': 188, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6863,1.197556,0.860092,0.863824,0.859161,0.85947
2,0.2114,1.159623,0.854358,0.855152,0.853909,0.854123
3,0.1389,1.158529,0.857798,0.85775,0.85775,0.85775
4,0.104,1.130495,0.855505,0.855456,0.855456,0.855456
5,0.0825,1.109006,0.858945,0.858899,0.859003,0.858923
6,0.0683,1.07323,0.856651,0.857161,0.856287,0.85647


[I 2025-03-26 02:03:23,475] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00046850307312499905, 'weight_decay': 0.008, 'warmup_steps': 229, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7405,1.123413,0.864679,0.866419,0.864044,0.864333
2,0.2335,1.075623,0.864679,0.865281,0.864297,0.864496
3,0.1549,1.128563,0.855505,0.855521,0.855372,0.855429


[I 2025-03-26 02:04:52,430] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00016518581519547854, 'weight_decay': 0.008, 'warmup_steps': 75, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9725,1.192744,0.841743,0.843114,0.841143,0.841375
2,0.4071,1.131835,0.861239,0.861181,0.861255,0.861208
3,0.2785,1.218223,0.865826,0.865877,0.865675,0.865748
4,0.2188,1.321909,0.858945,0.861153,0.859676,0.858863
5,0.1829,1.187936,0.87156,0.871921,0.871264,0.871427
6,0.1565,1.158985,0.87156,0.871589,0.871432,0.871492
7,0.1381,1.221374,0.87156,0.872044,0.871222,0.871407
8,0.124,1.242354,0.864679,0.865431,0.864254,0.864473
9,0.1131,1.232966,0.864679,0.86476,0.864507,0.864593
10,0.1057,1.235395,0.865826,0.866024,0.865591,0.865715


[I 2025-03-26 02:09:47,316] Trial 70 finished with value: 0.8657153123556285 and parameters: {'learning_rate': 0.00016518581519547854, 'weight_decay': 0.008, 'warmup_steps': 75, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 71 with params: {'learning_rate': 0.0008766257488999306, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5386,1.186605,0.847477,0.853533,0.846268,0.846459
2,0.1791,1.156552,0.854358,0.855152,0.853909,0.854123
3,0.1196,1.053797,0.864679,0.864633,0.864633,0.864633
4,0.0889,1.109226,0.855505,0.855446,0.855498,0.855467
5,0.0701,1.049014,0.861239,0.861184,0.861213,0.861197
6,0.0582,1.053424,0.861239,0.86143,0.861002,0.861124


[I 2025-03-26 02:12:47,742] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0001638312854589366, 'weight_decay': 0.005, 'warmup_steps': 57, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9669,1.166092,0.836009,0.837662,0.835344,0.835571
2,0.405,1.128654,0.860092,0.860169,0.859918,0.860003
3,0.2776,1.228436,0.865826,0.86578,0.865886,0.865804
4,0.2184,1.375872,0.853211,0.85672,0.85413,0.853037
5,0.1829,1.192475,0.866972,0.867325,0.866675,0.866835
6,0.1565,1.164865,0.87156,0.871589,0.871432,0.871492
7,0.1381,1.23279,0.865826,0.866361,0.865465,0.865656
8,0.1244,1.254198,0.863532,0.863823,0.863255,0.863401
9,0.1135,1.262708,0.858945,0.859054,0.85875,0.858847
10,0.1062,1.253639,0.864679,0.864633,0.864633,0.864633


[I 2025-03-26 02:17:53,177] Trial 72 finished with value: 0.8646333249136988 and parameters: {'learning_rate': 0.0001638312854589366, 'weight_decay': 0.005, 'warmup_steps': 57, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 73 with params: {'learning_rate': 0.001083699936462591, 'weight_decay': 0.009000000000000001, 'warmup_steps': 122, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5481,1.146811,0.850917,0.853889,0.850067,0.850344
2,0.1676,1.173436,0.847477,0.848587,0.846942,0.847171
3,0.1109,1.156156,0.854358,0.854462,0.854161,0.854256
4,0.0824,1.090041,0.852064,0.852242,0.851825,0.851943
5,0.0652,1.126146,0.84633,0.84675,0.845984,0.846148
6,0.0536,1.100825,0.857798,0.857776,0.857708,0.857738


[I 2025-03-26 02:20:56,447] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004387175639106545, 'weight_decay': 0.003, 'warmup_steps': 73, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6886,1.083287,0.863532,0.864724,0.863002,0.863259
2,0.2349,1.092551,0.861239,0.861761,0.860876,0.861063
3,0.1576,1.153993,0.858945,0.859133,0.858708,0.858829


[I 2025-03-26 02:22:25,744] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.000134708416155871, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0395,1.238115,0.834862,0.835374,0.83447,0.83464
2,0.4632,1.168018,0.853211,0.853157,0.853246,0.853183
3,0.3218,1.216254,0.860092,0.860169,0.859918,0.860003
4,0.254,1.266653,0.860092,0.861086,0.860592,0.860073
5,0.2139,1.156629,0.876147,0.876179,0.876021,0.876082
6,0.1853,1.148875,0.869266,0.869293,0.869138,0.869197
7,0.1648,1.212718,0.868119,0.868966,0.867675,0.867906
8,0.1499,1.217561,0.870413,0.870831,0.870096,0.870269
9,0.1381,1.218724,0.866972,0.867056,0.866801,0.866888
10,0.13,1.216139,0.868119,0.868239,0.867928,0.868027


[I 2025-03-26 02:27:24,923] Trial 75 finished with value: 0.8680274526060894 and parameters: {'learning_rate': 0.000134708416155871, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 76 with params: {'learning_rate': 0.00030137271532425773, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7658,1.16526,0.850917,0.852798,0.850236,0.850501
2,0.2864,1.138737,0.860092,0.860327,0.859834,0.859967
3,0.1956,1.169802,0.862385,0.862407,0.862255,0.862313
4,0.1514,1.196895,0.865826,0.866425,0.866223,0.865821
5,0.1221,1.143992,0.863532,0.863479,0.863507,0.863492
6,0.102,1.095824,0.866972,0.866955,0.866886,0.866916
7,0.0875,1.136465,0.860092,0.860327,0.859834,0.859967
8,0.0767,1.17399,0.865826,0.866024,0.865591,0.865715
9,0.069,1.138771,0.868119,0.868087,0.868054,0.868069
10,0.0637,1.129398,0.869266,0.869293,0.869138,0.869197


[I 2025-03-26 02:32:19,380] Trial 76 finished with value: 0.8691972462578159 and parameters: {'learning_rate': 0.00030137271532425773, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 77 with params: {'learning_rate': 0.00014486085132520515, 'weight_decay': 0.005, 'warmup_steps': 117, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0281,1.207303,0.836009,0.837447,0.835386,0.835609
2,0.4371,1.169289,0.854358,0.85451,0.854582,0.854356
3,0.3016,1.227019,0.861239,0.861181,0.861255,0.861208
4,0.2376,1.341051,0.855505,0.85783,0.856256,0.855413
5,0.1986,1.17661,0.870413,0.870961,0.870053,0.870249
6,0.1708,1.156348,0.864679,0.865026,0.864381,0.864539
7,0.1517,1.231179,0.860092,0.860547,0.85975,0.859926
8,0.1374,1.256921,0.856651,0.857161,0.856287,0.85647
9,0.126,1.241173,0.855505,0.855647,0.855288,0.855395
10,0.1182,1.246589,0.857798,0.857944,0.857582,0.85769


[I 2025-03-26 02:37:19,767] Trial 77 finished with value: 0.8576903638814016 and parameters: {'learning_rate': 0.00014486085132520515, 'weight_decay': 0.005, 'warmup_steps': 117, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 78 with params: {'learning_rate': 0.00032453503504647205, 'weight_decay': 0.006, 'warmup_steps': 71, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7594,1.1902,0.855505,0.858844,0.854614,0.854906
2,0.275,1.108814,0.869266,0.870035,0.868843,0.869067
3,0.1874,1.163478,0.866972,0.867445,0.866633,0.866815
4,0.1436,1.17782,0.862385,0.862787,0.862718,0.862385
5,0.1151,1.132562,0.87156,0.871507,0.871601,0.871535
6,0.0953,1.054856,0.872706,0.873408,0.872306,0.872524
7,0.0816,1.079293,0.870413,0.870536,0.870222,0.870323
8,0.0713,1.141871,0.863532,0.863935,0.863213,0.863381
9,0.0641,1.1093,0.872706,0.872712,0.8726,0.872646
10,0.0591,1.102649,0.87156,0.871648,0.87139,0.871478


[I 2025-03-26 02:42:16,575] Trial 78 finished with value: 0.8714778260297408 and parameters: {'learning_rate': 0.00032453503504647205, 'weight_decay': 0.006, 'warmup_steps': 71, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 79 with params: {'learning_rate': 0.0004975964985887406, 'weight_decay': 0.006, 'warmup_steps': 70, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6664,1.152614,0.856651,0.858701,0.855951,0.856234
2,0.2237,1.065107,0.866972,0.86722,0.866717,0.866854
3,0.1486,1.090004,0.868119,0.868074,0.86818,0.868098
4,0.1114,1.119706,0.856651,0.856593,0.856666,0.85662
5,0.0883,1.105193,0.862385,0.86258,0.862634,0.862385
6,0.0729,1.029533,0.863532,0.864061,0.86317,0.863359


[I 2025-03-26 02:45:18,121] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 7.904055961381666e-05, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.237,1.332729,0.829128,0.829825,0.828671,0.828853
2,0.6426,1.228278,0.833716,0.834586,0.833218,0.833416
3,0.4649,1.289943,0.833716,0.834586,0.833218,0.833416
4,0.376,1.253658,0.84289,0.84304,0.843111,0.842888
5,0.3202,1.272993,0.853211,0.853176,0.853288,0.853192
6,0.2824,1.225644,0.853211,0.853649,0.852867,0.853037


[I 2025-03-26 02:48:30,919] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0002765761315676863, 'weight_decay': 0.004, 'warmup_steps': 71, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8065,1.182514,0.84633,0.849557,0.845437,0.845694
2,0.2981,1.137517,0.864679,0.864834,0.864465,0.864576
3,0.2027,1.207592,0.868119,0.868322,0.867885,0.868011
4,0.1573,1.175484,0.864679,0.864874,0.864928,0.864678
5,0.1275,1.187917,0.866972,0.866998,0.866843,0.866902
6,0.1065,1.096456,0.865826,0.865773,0.865802,0.865786
7,0.0916,1.175972,0.866972,0.867445,0.866633,0.866815
8,0.0807,1.227241,0.864679,0.864922,0.864423,0.864558
9,0.0726,1.207366,0.868119,0.868122,0.868012,0.868057
10,0.0671,1.203549,0.866972,0.866928,0.866928,0.866928


[I 2025-03-26 02:53:49,173] Trial 81 finished with value: 0.8669276753388904 and parameters: {'learning_rate': 0.0002765761315676863, 'weight_decay': 0.004, 'warmup_steps': 71, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 82 with params: {'learning_rate': 0.0004231236542539615, 'weight_decay': 0.007, 'warmup_steps': 49, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6894,1.135508,0.860092,0.864547,0.859076,0.859381
2,0.2402,1.058898,0.864679,0.865431,0.864254,0.864473
3,0.1612,1.100257,0.862385,0.862464,0.862213,0.862298
4,0.1215,1.124706,0.864679,0.865081,0.865012,0.864678
5,0.0966,1.070056,0.862385,0.86258,0.862634,0.862385
6,0.0802,1.059381,0.869266,0.869882,0.868885,0.86909
7,0.0685,1.101352,0.863532,0.863727,0.863297,0.86342
8,0.0602,1.134787,0.863532,0.864203,0.863128,0.863336
9,0.0541,1.108714,0.861239,0.861203,0.861171,0.861186
10,0.0499,1.103565,0.864679,0.86466,0.864591,0.864621


[I 2025-03-26 02:58:52,546] Trial 82 finished with value: 0.8646212141146752 and parameters: {'learning_rate': 0.0004231236542539615, 'weight_decay': 0.007, 'warmup_steps': 49, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 83 with params: {'learning_rate': 0.0006555223407620696, 'weight_decay': 0.01, 'warmup_steps': 48, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6007,1.180181,0.863532,0.864361,0.863086,0.863312
2,0.1977,1.16108,0.854358,0.856144,0.853698,0.853969
3,0.1308,1.115576,0.865826,0.865849,0.86597,0.865817
4,0.098,1.128421,0.856651,0.856674,0.856793,0.856642
5,0.0777,1.079642,0.857798,0.85775,0.85775,0.85775
6,0.0642,1.114068,0.855505,0.856224,0.855077,0.855285


[I 2025-03-26 03:01:49,216] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0002551242417296694, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8178,1.150428,0.83945,0.841242,0.838764,0.839002
2,0.3117,1.109971,0.865826,0.865877,0.865675,0.865748
3,0.2137,1.166999,0.87156,0.871545,0.871474,0.871505
4,0.1668,1.199521,0.868119,0.86922,0.868643,0.868098
5,0.136,1.194032,0.865826,0.865827,0.865717,0.865762
6,0.1143,1.079477,0.872706,0.872712,0.8726,0.872646
7,0.0989,1.137962,0.873853,0.874483,0.873474,0.873683
8,0.087,1.185283,0.866972,0.86713,0.866759,0.866872
9,0.0785,1.177548,0.866972,0.867056,0.866801,0.866888
10,0.0725,1.172408,0.869266,0.86925,0.86918,0.86921


[I 2025-03-26 03:06:47,805] Trial 84 finished with value: 0.8692103255006183 and parameters: {'learning_rate': 0.0002551242417296694, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 85 with params: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 73, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7973,1.17522,0.852064,0.854609,0.851278,0.851556
2,0.2917,1.130727,0.864679,0.865026,0.864381,0.864539
3,0.1982,1.192682,0.868119,0.868532,0.867801,0.867973
4,0.1532,1.187223,0.865826,0.867119,0.866391,0.865796
5,0.124,1.164213,0.858945,0.858942,0.858834,0.858878
6,0.1035,1.049619,0.869266,0.86925,0.86918,0.86921
7,0.089,1.167681,0.868119,0.868806,0.867717,0.86793
8,0.0783,1.195841,0.865826,0.866664,0.865381,0.865609
9,0.0704,1.165555,0.870413,0.870469,0.870264,0.870338
10,0.0652,1.163248,0.865826,0.865827,0.865717,0.865762


[I 2025-03-26 03:11:47,384] Trial 85 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 73, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 86 with params: {'learning_rate': 0.0008524401311384257, 'weight_decay': 0.003, 'warmup_steps': 63, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5638,1.210028,0.850917,0.854528,0.849983,0.850255
2,0.1801,1.120722,0.858945,0.859756,0.858498,0.858717
3,0.1189,1.045981,0.873853,0.873801,0.873895,0.873829
4,0.088,1.080336,0.854358,0.854299,0.854372,0.854325
5,0.0701,1.036573,0.856651,0.856605,0.856708,0.856629
6,0.0577,1.043298,0.857798,0.85774,0.857792,0.857762


[I 2025-03-26 03:14:51,999] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00022057870171389332, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8632,1.168167,0.847477,0.84899,0.846857,0.847105
2,0.3401,1.115972,0.864679,0.864622,0.864675,0.864644
3,0.2333,1.188846,0.866972,0.866916,0.86697,0.866938
4,0.1832,1.25282,0.865826,0.866922,0.866349,0.865804
5,0.1508,1.182947,0.87156,0.871516,0.871516,0.871516
6,0.1278,1.123309,0.868119,0.868062,0.868138,0.86809
7,0.1111,1.197484,0.862385,0.862846,0.862044,0.862222
8,0.0986,1.242064,0.861239,0.86143,0.861002,0.861124
9,0.0893,1.233703,0.862385,0.862537,0.862171,0.862281
10,0.0829,1.228799,0.865826,0.865827,0.865717,0.865762


[I 2025-03-26 03:20:06,565] Trial 87 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.00022057870171389332, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 88 with params: {'learning_rate': 0.00020625104195185143, 'weight_decay': 0.006, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8917,1.140999,0.848624,0.849177,0.848236,0.84842
2,0.3517,1.119573,0.868119,0.868087,0.868054,0.868069
3,0.242,1.250001,0.869266,0.869266,0.86939,0.869255
4,0.1903,1.292879,0.858945,0.860897,0.859634,0.858878
5,0.1572,1.191124,0.870413,0.870961,0.870053,0.870249
6,0.1334,1.114958,0.872706,0.872712,0.8726,0.872646
7,0.1163,1.161146,0.868119,0.868806,0.867717,0.86793
8,0.1034,1.226221,0.868119,0.868173,0.86797,0.868043
9,0.0937,1.24309,0.864679,0.864834,0.864465,0.864576
10,0.0872,1.232028,0.868119,0.868087,0.868054,0.868069


[I 2025-03-26 03:25:14,795] Trial 88 finished with value: 0.86806912293858 and parameters: {'learning_rate': 0.00020625104195185143, 'weight_decay': 0.006, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 89 with params: {'learning_rate': 6.43089617140213e-05, 'weight_decay': 0.008, 'warmup_steps': 101, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3301,1.417297,0.819954,0.822442,0.819115,0.819287
2,0.7255,1.313188,0.834862,0.835999,0.834302,0.834514
3,0.5348,1.324746,0.830275,0.831387,0.829713,0.829917
4,0.4345,1.307729,0.834862,0.835651,0.835312,0.834848
5,0.3711,1.324222,0.83945,0.839415,0.839522,0.839428
6,0.3304,1.261627,0.84289,0.842871,0.842984,0.842873


[I 2025-03-26 03:28:10,411] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.00033420998630500975, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7393,1.120784,0.858945,0.860532,0.858329,0.858601
2,0.2718,1.087561,0.866972,0.867445,0.866633,0.866815
3,0.1841,1.135208,0.864679,0.86476,0.864507,0.864593
4,0.1416,1.151514,0.865826,0.86629,0.86618,0.865824
5,0.1135,1.108709,0.869266,0.869232,0.869348,0.869249
6,0.0948,1.027338,0.872706,0.872712,0.8726,0.872646
7,0.0809,1.091064,0.868119,0.868239,0.867928,0.868027
8,0.0708,1.119119,0.868119,0.868239,0.867928,0.868027
9,0.0637,1.103383,0.870413,0.870381,0.870348,0.870364
10,0.0587,1.088921,0.873853,0.873798,0.873853,0.873821


[I 2025-03-26 03:33:08,384] Trial 90 finished with value: 0.8738206864617699 and parameters: {'learning_rate': 0.00033420998630500975, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 91 with params: {'learning_rate': 0.0002815671275199718, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7916,1.151581,0.854358,0.856925,0.853572,0.853858
2,0.2997,1.137863,0.861239,0.86143,0.861002,0.861124
3,0.2044,1.186502,0.864679,0.864922,0.864423,0.864558
4,0.1587,1.197594,0.869266,0.870466,0.869811,0.869241
5,0.1284,1.195903,0.862385,0.862339,0.862339,0.862339
6,0.1077,1.12511,0.862385,0.862328,0.862381,0.86235


[I 2025-03-26 03:36:07,647] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0005904745362824023, 'weight_decay': 0.004, 'warmup_steps': 37, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6191,1.090549,0.870413,0.871107,0.870011,0.870227
2,0.2058,1.16731,0.858945,0.860113,0.858413,0.858662
3,0.1366,1.123588,0.862385,0.862328,0.862381,0.86235
4,0.1023,1.154448,0.856651,0.856605,0.856708,0.856629
5,0.0812,1.13855,0.855505,0.855446,0.855498,0.855467
6,0.0674,1.082529,0.858945,0.859133,0.858708,0.858829


[I 2025-03-26 03:39:07,770] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00265768294671018, 'weight_decay': 0.008, 'warmup_steps': 117, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4736,1.027196,0.856651,0.856836,0.856414,0.856533
2,0.1503,1.123071,0.84633,0.846279,0.846279,0.846279
3,0.1002,1.073021,0.872706,0.87295,0.872979,0.872706
4,0.0746,1.088591,0.862385,0.8625,0.862592,0.862382
5,0.0591,1.040778,0.864679,0.864644,0.86476,0.864661
6,0.0486,1.146603,0.848624,0.850254,0.847983,0.848237


[I 2025-03-26 03:42:12,457] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0006161448586038617, 'weight_decay': 0.005, 'warmup_steps': 141, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.649,1.220392,0.863532,0.867487,0.862581,0.862904
2,0.2039,1.155175,0.864679,0.865779,0.86417,0.864421
3,0.1347,1.105924,0.870413,0.870567,0.870643,0.870411
4,0.1002,1.122916,0.850917,0.850931,0.850783,0.850839
5,0.0797,1.078482,0.865826,0.865849,0.86597,0.865817
6,0.0659,1.060756,0.862385,0.862537,0.862171,0.862281


[I 2025-03-26 03:45:21,109] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0007756146753848378, 'weight_decay': 0.009000000000000001, 'warmup_steps': 185, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6248,1.219737,0.853211,0.856852,0.852278,0.852559
2,0.1862,1.165446,0.858945,0.860532,0.858329,0.858601
3,0.1224,1.102867,0.864679,0.864625,0.864718,0.864653
4,0.0909,1.114785,0.855505,0.855554,0.855666,0.855498
5,0.0719,1.097492,0.850917,0.850917,0.851036,0.850905
6,0.06,1.089885,0.852064,0.852165,0.851867,0.851961


[I 2025-03-26 03:48:32,959] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 184, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4175,1.415207,0.815367,0.815312,0.815284,0.815297
2,0.7944,1.319553,0.834862,0.835055,0.834596,0.834715
3,0.6005,1.293251,0.830275,0.831576,0.829671,0.829881
4,0.4897,1.289552,0.833716,0.835344,0.834354,0.833652
5,0.4196,1.297168,0.838303,0.838283,0.838396,0.838286
6,0.3746,1.268112,0.840596,0.840577,0.84069,0.840579


[I 2025-03-26 03:51:48,896] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.002808343859880905, 'weight_decay': 0.007, 'warmup_steps': 95, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4655,1.067515,0.852064,0.853016,0.851572,0.851797
2,0.1512,1.123841,0.854358,0.854311,0.854414,0.854335
3,0.1007,1.217815,0.844037,0.844007,0.843942,0.84397


[I 2025-03-26 03:53:28,916] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0001147772186457988, 'weight_decay': 0.008, 'warmup_steps': 205, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1482,1.270062,0.837156,0.83781,0.836722,0.836908
2,0.5083,1.193133,0.83945,0.839415,0.839522,0.839428
3,0.3571,1.211385,0.849771,0.849869,0.849573,0.849666


[I 2025-03-26 03:55:08,277] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00012336498024020963, 'weight_decay': 0.01, 'warmup_steps': 76, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.071,1.240998,0.838303,0.839973,0.837638,0.837871
2,0.4846,1.169035,0.84289,0.842871,0.842984,0.842873
3,0.3397,1.204048,0.855505,0.855647,0.855288,0.855395
4,0.2678,1.30409,0.856651,0.858595,0.85734,0.856583
5,0.2252,1.203837,0.862385,0.86298,0.862002,0.8622
6,0.1961,1.179464,0.862385,0.862537,0.862171,0.862281
7,0.1747,1.251568,0.853211,0.854652,0.852614,0.85287
8,0.1597,1.267378,0.855505,0.855832,0.855203,0.855355
9,0.1472,1.250686,0.861239,0.861761,0.860876,0.861063
10,0.1391,1.25369,0.862385,0.862728,0.862086,0.862243


[I 2025-03-26 04:00:33,303] Trial 99 finished with value: 0.8622432859399685 and parameters: {'learning_rate': 0.00012336498024020963, 'weight_decay': 0.01, 'warmup_steps': 76, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 100 with params: {'learning_rate': 0.004463096479266976, 'weight_decay': 0.003, 'warmup_steps': 128, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4633,1.165857,0.862385,0.863671,0.861834,0.862095
2,0.157,1.19106,0.849771,0.84985,0.849952,0.849766
3,0.1112,1.215818,0.84289,0.843493,0.842479,0.842665
4,0.0871,1.188889,0.844037,0.843984,0.843984,0.843984
5,0.0699,1.171651,0.858945,0.858887,0.858961,0.858914
6,0.0587,1.260969,0.841743,0.841749,0.841606,0.84166


[I 2025-03-26 04:03:46,411] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0005037372469841467, 'weight_decay': 0.005, 'warmup_steps': 95, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6737,1.114114,0.870413,0.872847,0.869675,0.870002
2,0.2226,1.080954,0.863532,0.864361,0.863086,0.863312
3,0.1484,1.13033,0.868119,0.868062,0.868138,0.86809
4,0.1112,1.190335,0.850917,0.850863,0.850951,0.850889
5,0.0884,1.142239,0.856651,0.856674,0.856793,0.856642
6,0.0727,1.126179,0.849771,0.851088,0.849194,0.849438


[I 2025-03-26 04:07:03,697] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.000413312810971622, 'weight_decay': 0.01, 'warmup_steps': 104, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7192,1.138089,0.861239,0.86284,0.860623,0.8609
2,0.246,1.095455,0.863532,0.863823,0.863255,0.863401
3,0.1645,1.186389,0.855505,0.855832,0.855203,0.855355
4,0.1239,1.174491,0.860092,0.860034,0.860087,0.860056
5,0.0985,1.138297,0.861239,0.861219,0.861339,0.861224
6,0.0815,1.150054,0.857798,0.858379,0.857413,0.857606


[I 2025-03-26 04:10:20,046] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0005362383460499971, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6295,1.106037,0.862385,0.863671,0.861834,0.862095
2,0.2175,1.121171,0.855505,0.856079,0.855119,0.85531
3,0.1441,1.107831,0.863532,0.863582,0.863381,0.863453
4,0.1078,1.160096,0.862385,0.862328,0.862381,0.86235
5,0.0853,1.109072,0.864679,0.864729,0.864844,0.864672
6,0.0703,1.065915,0.868119,0.868661,0.867759,0.867952
7,0.0601,1.08843,0.860092,0.860071,0.860003,0.860032
8,0.0527,1.091387,0.862385,0.862625,0.862128,0.862263
9,0.0474,1.051049,0.865826,0.865792,0.865759,0.865775
10,0.0436,1.054455,0.869266,0.869222,0.869222,0.869222


[I 2025-03-26 04:15:50,936] Trial 103 finished with value: 0.8692220257640818 and parameters: {'learning_rate': 0.0005362383460499971, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 104 with params: {'learning_rate': 0.00045406484774417034, 'weight_decay': 0.006, 'warmup_steps': 118, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6999,1.132821,0.864679,0.865431,0.864254,0.864473
2,0.2333,1.102925,0.861239,0.861902,0.860834,0.86104
3,0.1552,1.147872,0.863532,0.863823,0.863255,0.863401
4,0.1168,1.151864,0.861239,0.861181,0.861255,0.861208
5,0.0933,1.141171,0.858945,0.858942,0.858834,0.858878
6,0.0769,1.137317,0.853211,0.853281,0.853035,0.853118


[I 2025-03-26 04:19:04,581] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.000492920793552149, 'weight_decay': 0.007, 'warmup_steps': 41, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6564,1.088473,0.863532,0.864724,0.863002,0.863259
2,0.2257,1.082558,0.872706,0.87313,0.87239,0.872565
3,0.1502,1.076316,0.865826,0.865943,0.865633,0.865732
4,0.1131,1.093015,0.865826,0.865773,0.865802,0.865786
5,0.0896,1.0724,0.862385,0.862328,0.862381,0.86235
6,0.0739,1.04147,0.865826,0.866024,0.865591,0.865715
7,0.0628,1.056549,0.866972,0.866928,0.866928,0.866928
8,0.0552,1.105003,0.862385,0.862537,0.862171,0.862281
9,0.0495,1.068991,0.863532,0.863479,0.863507,0.863492
10,0.0456,1.075402,0.864679,0.864625,0.864718,0.864653


[I 2025-03-26 04:24:29,796] Trial 105 finished with value: 0.8646532673892455 and parameters: {'learning_rate': 0.000492920793552149, 'weight_decay': 0.007, 'warmup_steps': 41, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 106 with params: {'learning_rate': 0.00034656180187221176, 'weight_decay': 0.009000000000000001, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7392,1.092502,0.858945,0.860764,0.858287,0.858568
2,0.2644,1.087756,0.857798,0.858866,0.857287,0.857528
3,0.1798,1.144738,0.858945,0.860532,0.858329,0.858601
4,0.1373,1.133264,0.860092,0.860286,0.860339,0.860091
5,0.1097,1.108831,0.868119,0.868062,0.868138,0.86809
6,0.091,1.04231,0.87156,0.871589,0.871432,0.871492
7,0.0779,1.089137,0.870413,0.870469,0.870264,0.870338
8,0.0682,1.145111,0.863532,0.863727,0.863297,0.86342
9,0.0614,1.097555,0.864679,0.864703,0.864549,0.864608
10,0.0567,1.105391,0.870413,0.870381,0.870348,0.870364


[I 2025-03-26 04:29:53,119] Trial 106 finished with value: 0.8703635729744308 and parameters: {'learning_rate': 0.00034656180187221176, 'weight_decay': 0.009000000000000001, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 107 with params: {'learning_rate': 0.00289927115065357, 'weight_decay': 0.008, 'warmup_steps': 45, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4496,1.093717,0.861239,0.86143,0.861002,0.861124
2,0.1493,1.214869,0.84289,0.843493,0.842479,0.842665
3,0.1012,1.116075,0.850917,0.850858,0.850909,0.850879


[I 2025-03-26 04:31:29,399] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00015774850242095327, 'weight_decay': 0.0, 'warmup_steps': 33, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9791,1.195061,0.84289,0.844828,0.842184,0.842432
2,0.42,1.136468,0.860092,0.860038,0.860129,0.860065
3,0.2885,1.214088,0.870413,0.870469,0.870264,0.870338
4,0.2274,1.323373,0.860092,0.863015,0.860929,0.859967
5,0.1905,1.159781,0.873853,0.874112,0.8736,0.873741
6,0.1634,1.146773,0.87156,0.872044,0.871222,0.871407
7,0.1444,1.224788,0.869266,0.869745,0.868927,0.869111
8,0.1302,1.230099,0.870413,0.870717,0.870138,0.870288
9,0.1192,1.232928,0.864679,0.86476,0.864507,0.864593
10,0.1115,1.234417,0.866972,0.867325,0.866675,0.866835


[I 2025-03-26 04:36:53,339] Trial 108 finished with value: 0.8668351764086362 and parameters: {'learning_rate': 0.00015774850242095327, 'weight_decay': 0.0, 'warmup_steps': 33, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 109 with params: {'learning_rate': 0.00015155013708833715, 'weight_decay': 0.008, 'warmup_steps': 75, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0007,1.198157,0.837156,0.838126,0.836638,0.836846
2,0.4273,1.133702,0.852064,0.852057,0.851951,0.851994
3,0.2944,1.196982,0.864679,0.86476,0.864507,0.864593
4,0.2319,1.336984,0.866972,0.868373,0.867559,0.866938
5,0.1945,1.17381,0.875,0.875313,0.874726,0.87488
6,0.167,1.161101,0.869266,0.869352,0.869096,0.869183
7,0.1478,1.243565,0.866972,0.867445,0.866633,0.866815
8,0.1336,1.25535,0.866972,0.867445,0.866633,0.866815
9,0.1221,1.24729,0.869266,0.86925,0.86918,0.86921
10,0.1145,1.247395,0.870413,0.870469,0.870264,0.870338


[I 2025-03-26 04:42:20,141] Trial 109 finished with value: 0.8703376437443334 and parameters: {'learning_rate': 0.00015155013708833715, 'weight_decay': 0.008, 'warmup_steps': 75, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 110 with params: {'learning_rate': 0.0002167850769926573, 'weight_decay': 0.009000000000000001, 'warmup_steps': 91, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8838,1.137832,0.84633,0.847529,0.845773,0.846006
2,0.3398,1.109106,0.868119,0.868087,0.868054,0.868069
3,0.2333,1.175329,0.870413,0.870436,0.870559,0.870404
4,0.183,1.238611,0.858945,0.860897,0.859634,0.858878
5,0.1502,1.183376,0.87156,0.871648,0.87139,0.871478
6,0.1271,1.096749,0.870413,0.870494,0.870601,0.870409
7,0.1106,1.182355,0.870413,0.871107,0.870011,0.870227
8,0.098,1.205047,0.864679,0.864633,0.864633,0.864633
9,0.0887,1.213336,0.862385,0.862464,0.862213,0.862298
10,0.0823,1.195703,0.864679,0.86466,0.864591,0.864621


[I 2025-03-26 04:47:48,376] Trial 110 finished with value: 0.8646212141146752 and parameters: {'learning_rate': 0.0002167850769926573, 'weight_decay': 0.009000000000000001, 'warmup_steps': 91, 'lambda_param': 0.1, 'temperature': 6.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 111 with params: {'learning_rate': 0.0006537379443423989, 'weight_decay': 0.004, 'warmup_steps': 147, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6413,1.275511,0.854358,0.859299,0.853277,0.853544
2,0.1992,1.131425,0.863532,0.864361,0.863086,0.863312
3,0.1312,1.13206,0.860092,0.860038,0.860129,0.860065


[I 2025-03-26 04:49:26,088] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00018676136048101407, 'weight_decay': 0.002, 'warmup_steps': 139, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.95,1.143082,0.841743,0.842151,0.841395,0.841556
2,0.377,1.112399,0.863532,0.863498,0.863465,0.86348
3,0.2579,1.209535,0.870413,0.870469,0.870264,0.870338
4,0.2031,1.279575,0.858945,0.860432,0.85955,0.858903
5,0.1686,1.172337,0.877294,0.877728,0.876979,0.877158
6,0.1434,1.133682,0.872706,0.87265,0.872727,0.872678
7,0.1259,1.172679,0.868119,0.868322,0.867885,0.868011
8,0.1122,1.199329,0.864679,0.864834,0.864465,0.864576
9,0.1017,1.199177,0.869266,0.869293,0.869138,0.869197
10,0.0948,1.197689,0.868119,0.868173,0.86797,0.868043


[I 2025-03-26 04:54:50,161] Trial 112 finished with value: 0.8680427347840562 and parameters: {'learning_rate': 0.00018676136048101407, 'weight_decay': 0.002, 'warmup_steps': 139, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 113 with params: {'learning_rate': 0.00031792934684326944, 'weight_decay': 0.001, 'warmup_steps': 143, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7971,1.118347,0.860092,0.861575,0.859497,0.859766
2,0.2784,1.087036,0.865826,0.866361,0.865465,0.865656
3,0.1892,1.154463,0.869266,0.869427,0.869054,0.869167
4,0.1449,1.175988,0.860092,0.861273,0.860634,0.860065
5,0.1163,1.143772,0.860092,0.86068,0.859708,0.859903
6,0.0967,1.064235,0.870413,0.870619,0.87018,0.870306
7,0.0826,1.111894,0.864679,0.864922,0.864423,0.864558
8,0.0724,1.163375,0.862385,0.862846,0.862044,0.862222
9,0.0651,1.11887,0.862385,0.862366,0.862297,0.862327
10,0.0601,1.111254,0.868119,0.868087,0.868054,0.868069


[I 2025-03-26 05:00:16,777] Trial 113 finished with value: 0.86806912293858 and parameters: {'learning_rate': 0.00031792934684326944, 'weight_decay': 0.001, 'warmup_steps': 143, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 114 with params: {'learning_rate': 0.00033090371518924275, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7403,1.149132,0.854358,0.857216,0.85353,0.853818
2,0.273,1.103575,0.869266,0.869293,0.869138,0.869197
3,0.1857,1.133058,0.865826,0.865827,0.865717,0.865762
4,0.1424,1.174521,0.863532,0.864129,0.863928,0.863528
5,0.1146,1.132832,0.860092,0.86024,0.859876,0.859986
6,0.0952,1.045944,0.870413,0.870717,0.870138,0.870288
7,0.0813,1.112489,0.866972,0.867056,0.866801,0.866888
8,0.0713,1.09846,0.868119,0.868322,0.867885,0.868011
9,0.0642,1.094334,0.868119,0.868239,0.867928,0.868027
10,0.0593,1.089004,0.872706,0.872676,0.872643,0.872658


[I 2025-03-26 05:05:28,753] Trial 114 finished with value: 0.8726580230102816 and parameters: {'learning_rate': 0.00033090371518924275, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 115 with params: {'learning_rate': 0.00024287817463622654, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8311,1.199143,0.838303,0.839756,0.83768,0.837909
2,0.3277,1.150508,0.860092,0.860057,0.860171,0.860073
3,0.2236,1.224446,0.869266,0.869623,0.868969,0.869131
4,0.1742,1.35273,0.854358,0.858752,0.855382,0.854123
5,0.1428,1.178412,0.863532,0.863727,0.863297,0.86342
6,0.1202,1.106696,0.872706,0.872676,0.872643,0.872658
7,0.1042,1.155908,0.870413,0.870831,0.870096,0.870269
8,0.0921,1.233258,0.860092,0.860991,0.859624,0.859853
9,0.083,1.207784,0.862385,0.862728,0.862086,0.862243
10,0.0767,1.208125,0.863532,0.863532,0.863423,0.863467


[I 2025-03-26 05:10:43,330] Trial 115 finished with value: 0.8634672898057032 and parameters: {'learning_rate': 0.00024287817463622654, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 116 with params: {'learning_rate': 0.0004732634174500223, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6602,1.160519,0.861239,0.863591,0.860497,0.860799
2,0.2271,1.097672,0.857798,0.859059,0.857245,0.857498
3,0.1512,1.119985,0.865826,0.865877,0.865675,0.865748
4,0.1142,1.113427,0.864679,0.86497,0.86497,0.864679
5,0.0908,1.063797,0.864679,0.864729,0.864844,0.864672
6,0.0753,1.05334,0.864679,0.865431,0.864254,0.864473
7,0.0643,1.102913,0.860092,0.860327,0.859834,0.859967
8,0.0565,1.12908,0.855505,0.855732,0.855245,0.855376
9,0.0509,1.091045,0.861239,0.861351,0.861044,0.861142
10,0.0468,1.100116,0.860092,0.860169,0.859918,0.860003


[I 2025-03-26 05:15:58,166] Trial 116 finished with value: 0.8600026319252534 and parameters: {'learning_rate': 0.0004732634174500223, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 117 with params: {'learning_rate': 0.00014290292227389124, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0126,1.232885,0.836009,0.836889,0.835512,0.835713
2,0.4502,1.173049,0.848624,0.84891,0.84891,0.848624
3,0.3112,1.215852,0.857798,0.85774,0.857792,0.857762


[I 2025-03-26 05:17:41,103] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0001895997787335893, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9052,1.190421,0.847477,0.849976,0.846689,0.846954
2,0.369,1.100587,0.868119,0.868273,0.868348,0.868118
3,0.2535,1.211269,0.868119,0.868122,0.868012,0.868057
4,0.2002,1.336344,0.853211,0.85672,0.85413,0.853037
5,0.1663,1.158726,0.870413,0.870469,0.870264,0.870338
6,0.1418,1.106394,0.870413,0.870381,0.870348,0.870364
7,0.1242,1.168062,0.865826,0.867235,0.865254,0.865528
8,0.1111,1.184866,0.863532,0.863647,0.863339,0.863437
9,0.101,1.198095,0.858945,0.859054,0.85875,0.858847
10,0.094,1.194074,0.862385,0.862625,0.862128,0.862263


[I 2025-03-26 05:23:07,115] Trial 118 finished with value: 0.8622628694182501 and parameters: {'learning_rate': 0.0001895997787335893, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 119 with params: {'learning_rate': 0.00046962371511445193, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6536,1.207144,0.860092,0.864547,0.859076,0.859381
2,0.2275,1.120284,0.855505,0.856561,0.854993,0.85523
3,0.1521,1.143572,0.861239,0.861203,0.861171,0.861186
4,0.115,1.11971,0.865826,0.865807,0.865928,0.865811
5,0.0912,1.124149,0.861239,0.861261,0.861381,0.86123
6,0.0754,1.084793,0.870413,0.870417,0.870306,0.870351
7,0.0643,1.088991,0.87156,0.871589,0.871432,0.871492
8,0.0565,1.111349,0.87156,0.871507,0.871601,0.871535
9,0.0508,1.087006,0.864679,0.864679,0.864802,0.864668
10,0.0468,1.082496,0.869266,0.869213,0.869306,0.869241


[I 2025-03-26 05:28:34,810] Trial 119 finished with value: 0.8692412922235084 and parameters: {'learning_rate': 0.00046962371511445193, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 120 with params: {'learning_rate': 0.00027979687767556316, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7924,1.157789,0.852064,0.853834,0.851404,0.851669
2,0.2986,1.125807,0.858945,0.859461,0.858582,0.858766
3,0.2045,1.144082,0.869266,0.869427,0.869054,0.869167
4,0.159,1.197603,0.869266,0.869671,0.869601,0.869265
5,0.1288,1.161688,0.866972,0.866916,0.86697,0.866938
6,0.1079,1.096044,0.870413,0.870469,0.870264,0.870338
7,0.093,1.15062,0.865826,0.866121,0.865549,0.865697
8,0.0818,1.159582,0.87156,0.871545,0.871474,0.871505
9,0.0736,1.142898,0.869266,0.869222,0.869222,0.869222
10,0.0679,1.139806,0.869266,0.86925,0.86918,0.86921


[I 2025-03-26 05:33:59,699] Trial 120 finished with value: 0.8692103255006183 and parameters: {'learning_rate': 0.00027979687767556316, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 147, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2349,1.30313,0.830275,0.83055,0.829965,0.8301
2,0.6094,1.228658,0.837156,0.837445,0.836849,0.836988
3,0.4374,1.281754,0.83945,0.840112,0.839017,0.839205
4,0.3502,1.263192,0.845183,0.84552,0.845489,0.845183
5,0.2972,1.253763,0.850917,0.851204,0.851204,0.850917
6,0.2608,1.197887,0.858945,0.859337,0.858624,0.858789


[I 2025-03-26 05:37:09,665] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00021703879021406816, 'weight_decay': 0.0, 'warmup_steps': 163, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9219,1.159479,0.840596,0.841337,0.840143,0.840339
2,0.3448,1.120606,0.862385,0.862339,0.862339,0.862339
3,0.2356,1.181116,0.875,0.87506,0.874853,0.874927
4,0.1848,1.227828,0.857798,0.860134,0.85855,0.857708
5,0.152,1.184654,0.866972,0.867056,0.866801,0.866888
6,0.1283,1.089054,0.870413,0.870361,0.87039,0.870374
7,0.1113,1.133322,0.869266,0.870035,0.868843,0.869067
8,0.0987,1.186067,0.862385,0.862464,0.862213,0.862298
9,0.089,1.171347,0.860092,0.860112,0.85996,0.860018
10,0.0827,1.164304,0.863532,0.863479,0.863507,0.863492


[I 2025-03-26 05:42:27,049] Trial 122 finished with value: 0.863491716864498 and parameters: {'learning_rate': 0.00021703879021406816, 'weight_decay': 0.0, 'warmup_steps': 163, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 123 with params: {'learning_rate': 0.0005646305142824726, 'weight_decay': 0.005, 'warmup_steps': 57, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6283,1.106726,0.869266,0.870589,0.868717,0.86899
2,0.2088,1.089746,0.862385,0.863294,0.861918,0.86215
3,0.1391,1.067319,0.868119,0.868067,0.868096,0.86808
4,0.104,1.081172,0.856651,0.856804,0.856877,0.85665
5,0.0826,1.072391,0.857798,0.857992,0.858045,0.857797
6,0.0684,1.050744,0.861239,0.861286,0.861087,0.861158


[I 2025-03-26 05:45:41,763] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.00014584620453722834, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0103,1.223738,0.836009,0.837247,0.835428,0.835646
2,0.4395,1.161929,0.857798,0.857798,0.857919,0.857786
3,0.3045,1.199245,0.865826,0.865943,0.865633,0.865732
4,0.2396,1.324231,0.856651,0.859706,0.857508,0.856514
5,0.2011,1.167337,0.876147,0.87624,0.875979,0.876068
6,0.1733,1.144619,0.87156,0.871921,0.871264,0.871427
7,0.1539,1.220194,0.870413,0.870717,0.870138,0.870288
8,0.1396,1.23966,0.872706,0.873261,0.872348,0.872545
9,0.128,1.23054,0.865826,0.865877,0.865675,0.865748
10,0.1203,1.226664,0.868119,0.868239,0.867928,0.868027


[I 2025-03-26 05:51:02,119] Trial 124 finished with value: 0.8680274526060894 and parameters: {'learning_rate': 0.00014584620453722834, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 125 with params: {'learning_rate': 0.00042274462193872217, 'weight_decay': 0.002, 'warmup_steps': 39, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6827,1.113992,0.863532,0.865905,0.862792,0.8631
2,0.2408,1.068497,0.866972,0.867733,0.866549,0.86677
3,0.1611,1.118525,0.863532,0.863647,0.863339,0.863437
4,0.1219,1.125574,0.864679,0.86497,0.86497,0.864679
5,0.0967,1.106899,0.868119,0.868142,0.868264,0.868111
6,0.0802,1.03204,0.87156,0.871648,0.87139,0.871478
7,0.0686,1.11252,0.864679,0.86476,0.864507,0.864593
8,0.0602,1.125146,0.860092,0.860429,0.859792,0.859947
9,0.0542,1.118137,0.866972,0.866955,0.866886,0.866916
10,0.0499,1.111599,0.863532,0.863479,0.863507,0.863492


[I 2025-03-26 05:56:19,807] Trial 125 finished with value: 0.863491716864498 and parameters: {'learning_rate': 0.00042274462193872217, 'weight_decay': 0.002, 'warmup_steps': 39, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 181, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5981,1.236318,0.84633,0.851327,0.845226,0.845445
2,0.1766,1.116913,0.860092,0.86068,0.859708,0.859903
3,0.1166,1.063943,0.860092,0.860071,0.860003,0.860032


[I 2025-03-26 05:57:54,027] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0001955434173334935, 'weight_decay': 0.007, 'warmup_steps': 50, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9075,1.186516,0.852064,0.854335,0.85132,0.851596
2,0.3639,1.11835,0.865826,0.865769,0.865844,0.865796
3,0.2487,1.194903,0.870413,0.870417,0.870306,0.870351
4,0.196,1.300472,0.857798,0.859872,0.858508,0.857723
5,0.1629,1.182342,0.868119,0.868966,0.867675,0.867906
6,0.1384,1.126395,0.873853,0.873798,0.873853,0.873821
7,0.1212,1.216032,0.864679,0.865431,0.864254,0.864473
8,0.1079,1.243308,0.866972,0.86713,0.866759,0.866872
9,0.0981,1.249376,0.868119,0.868239,0.867928,0.868027
10,0.0911,1.241946,0.865826,0.866024,0.865591,0.865715


[I 2025-03-26 06:03:15,814] Trial 127 finished with value: 0.8657153123556285 and parameters: {'learning_rate': 0.0001955434173334935, 'weight_decay': 0.007, 'warmup_steps': 50, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 128 with params: {'learning_rate': 0.000376607933880846, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7115,1.136444,0.860092,0.861575,0.859497,0.859766
2,0.2566,1.078301,0.866972,0.86713,0.866759,0.866872
3,0.1741,1.170302,0.860092,0.86024,0.859876,0.859986
4,0.1319,1.151909,0.868119,0.868466,0.868433,0.868119
5,0.1051,1.118627,0.864679,0.864644,0.86476,0.864661
6,0.0869,1.046585,0.87156,0.872044,0.871222,0.871407
7,0.0742,1.137933,0.863532,0.863935,0.863213,0.863381
8,0.0652,1.135355,0.868119,0.868419,0.867843,0.867993
9,0.0587,1.106288,0.863532,0.863582,0.863381,0.863453
10,0.0541,1.105671,0.866972,0.866998,0.866843,0.866902


[I 2025-03-26 06:08:39,312] Trial 128 finished with value: 0.8669024611044442 and parameters: {'learning_rate': 0.000376607933880846, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 129 with params: {'learning_rate': 0.00025805376878275534, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.814,1.154767,0.844037,0.84487,0.843563,0.84377
2,0.315,1.120332,0.860092,0.860045,0.860045,0.860045
3,0.216,1.186348,0.866972,0.866998,0.866843,0.866902
4,0.1681,1.258844,0.858945,0.860432,0.85955,0.858903
5,0.1368,1.194527,0.860092,0.860429,0.859792,0.859947
6,0.1148,1.140386,0.864679,0.864834,0.864465,0.864576


[I 2025-03-26 06:11:46,834] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0005903984613460668, 'weight_decay': 0.001, 'warmup_steps': 142, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6583,1.187691,0.87156,0.873598,0.870885,0.871201
2,0.2054,1.112598,0.858945,0.860532,0.858329,0.858601
3,0.1361,1.100216,0.855505,0.85545,0.85554,0.855477


[I 2025-03-26 06:13:24,294] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0003334569816298886, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7439,1.14116,0.860092,0.862572,0.859329,0.85963
2,0.2694,1.092362,0.862385,0.862537,0.862171,0.862281
3,0.1834,1.184061,0.860092,0.860828,0.859666,0.859879
4,0.1405,1.201947,0.861239,0.861582,0.86155,0.861238
5,0.1132,1.190055,0.857798,0.85774,0.857792,0.857762
6,0.0939,1.077378,0.870413,0.870536,0.870222,0.870323
7,0.0805,1.165738,0.862385,0.862625,0.862128,0.862263
8,0.0706,1.167956,0.861239,0.861525,0.86096,0.861105
9,0.0637,1.147429,0.866972,0.866955,0.866886,0.866916
10,0.0588,1.142929,0.865826,0.865827,0.865717,0.865762


[I 2025-03-26 06:18:45,839] Trial 131 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.0003334569816298886, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 132 with params: {'learning_rate': 0.0005066133239166055, 'weight_decay': 0.004, 'warmup_steps': 64, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6565,1.072219,0.876147,0.877728,0.875558,0.875859
2,0.2198,1.049024,0.866972,0.867581,0.866591,0.866793
3,0.1456,1.103444,0.858945,0.858942,0.858834,0.858878


[I 2025-03-26 06:20:23,277] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.00026034836401417295, 'weight_decay': 0.004, 'warmup_steps': 91, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8327,1.174276,0.84289,0.845079,0.842142,0.842392
2,0.3082,1.143712,0.860092,0.86024,0.859876,0.859986
3,0.2103,1.18417,0.866972,0.866998,0.866843,0.866902
4,0.1637,1.178903,0.869266,0.869942,0.869685,0.86926
5,0.1332,1.177599,0.861239,0.861237,0.861129,0.861173
6,0.1116,1.090982,0.864679,0.864644,0.86476,0.864661
7,0.0963,1.135212,0.870413,0.870831,0.870096,0.870269
8,0.0848,1.20667,0.861239,0.861351,0.861044,0.861142
9,0.0763,1.18165,0.860092,0.860034,0.860087,0.860056
10,0.0705,1.180685,0.860092,0.860038,0.860129,0.860065


[I 2025-03-26 06:25:46,945] Trial 133 finished with value: 0.8600652425549826 and parameters: {'learning_rate': 0.00026034836401417295, 'weight_decay': 0.004, 'warmup_steps': 91, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 134 with params: {'learning_rate': 0.0002700641330520949, 'weight_decay': 0.007, 'warmup_steps': 92, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8223,1.185259,0.844037,0.847555,0.8431,0.843344
2,0.3041,1.126579,0.863532,0.863498,0.863465,0.86348
3,0.2075,1.170659,0.87156,0.871589,0.871432,0.871492
4,0.1613,1.181168,0.863532,0.863877,0.863844,0.863532
5,0.1307,1.185615,0.864679,0.864622,0.864675,0.864644
6,0.1092,1.125189,0.863532,0.863475,0.863549,0.863502


[I 2025-03-26 06:29:02,585] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.002359072631914401, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4467,1.035801,0.866972,0.867445,0.866633,0.866815
2,0.1513,1.193441,0.852064,0.852561,0.851699,0.851877
3,0.1006,1.148059,0.852064,0.852165,0.851867,0.851961
4,0.0746,1.165013,0.852064,0.853834,0.851404,0.851669
5,0.0598,1.033783,0.868119,0.868273,0.868348,0.868118
6,0.0488,1.135409,0.848624,0.849319,0.848194,0.848393


[I 2025-03-26 06:32:15,972] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.001156179201901999, 'weight_decay': 0.0, 'warmup_steps': 19, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5033,1.199007,0.853211,0.857938,0.852151,0.852416
2,0.1655,1.151683,0.848624,0.84905,0.848278,0.848444
3,0.1094,1.144596,0.854358,0.854462,0.854161,0.854256


[I 2025-03-26 06:33:56,172] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0006342946194746763, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5998,1.17493,0.861239,0.864485,0.860371,0.860684
2,0.2001,1.109057,0.864679,0.865598,0.864212,0.864448
3,0.1325,1.066128,0.865826,0.865827,0.865717,0.865762
4,0.0993,1.154916,0.852064,0.852242,0.851825,0.851943
5,0.0787,1.085553,0.857798,0.857817,0.857666,0.857723
6,0.0651,1.075375,0.853211,0.853534,0.852909,0.85306


[I 2025-03-26 06:37:09,358] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.00019424653046229976, 'weight_decay': 0.006, 'warmup_steps': 100, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.926,1.292641,0.838303,0.845041,0.837006,0.837098
2,0.3649,1.104108,0.868119,0.868239,0.867928,0.868027
3,0.2494,1.221455,0.869266,0.86925,0.86918,0.86921
4,0.1966,1.250268,0.855505,0.858107,0.856298,0.855395
5,0.1627,1.165602,0.870413,0.870961,0.870053,0.870249
6,0.1382,1.084443,0.877294,0.877244,0.877273,0.877257
7,0.1209,1.142572,0.873853,0.87481,0.87339,0.873638
8,0.1077,1.17406,0.865826,0.865877,0.865675,0.865748
9,0.0976,1.171939,0.87156,0.872044,0.871222,0.871407
10,0.0909,1.15937,0.87156,0.871589,0.871432,0.871492


[I 2025-03-26 06:42:34,687] Trial 138 finished with value: 0.8714920314111876 and parameters: {'learning_rate': 0.00019424653046229976, 'weight_decay': 0.006, 'warmup_steps': 100, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 139 with params: {'learning_rate': 0.0004916793812935462, 'weight_decay': 0.008, 'warmup_steps': 127, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6885,1.120442,0.865826,0.866505,0.865423,0.865633
2,0.2239,1.112956,0.858945,0.860764,0.858287,0.858568
3,0.1482,1.10333,0.862385,0.862328,0.862381,0.86235


[I 2025-03-26 06:44:05,254] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 9.942175778778898e-05, 'weight_decay': 0.005, 'warmup_steps': 104, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1643,1.275041,0.834862,0.834913,0.83468,0.834757
2,0.5522,1.208874,0.833716,0.833654,0.833681,0.833666
3,0.3945,1.216612,0.847477,0.847573,0.847278,0.847371
4,0.3123,1.323878,0.848624,0.850659,0.849331,0.848544
5,0.2642,1.203864,0.855505,0.85545,0.85554,0.855477
6,0.2307,1.20068,0.854358,0.854631,0.854077,0.854218


[I 2025-03-26 06:47:14,556] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.00031844389894899864, 'weight_decay': 0.005, 'warmup_steps': 93, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.774,1.145634,0.852064,0.854898,0.851236,0.851516
2,0.2781,1.080795,0.869266,0.86925,0.86918,0.86921
3,0.1888,1.12514,0.87156,0.871589,0.871432,0.871492
4,0.1449,1.163596,0.860092,0.860758,0.860508,0.860085
5,0.1163,1.116833,0.864679,0.864644,0.86476,0.864661
6,0.0964,1.016396,0.876147,0.876134,0.876063,0.876094
7,0.0824,1.06779,0.865826,0.866121,0.865549,0.865697
8,0.0723,1.121409,0.865826,0.866024,0.865591,0.865715
9,0.0651,1.101859,0.864679,0.86476,0.864507,0.864593
10,0.06,1.097477,0.866972,0.866955,0.866886,0.866916


[I 2025-03-26 06:52:37,610] Trial 141 finished with value: 0.8669157698076467 and parameters: {'learning_rate': 0.00031844389894899864, 'weight_decay': 0.005, 'warmup_steps': 93, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 142 with params: {'learning_rate': 0.00018099028564563143, 'weight_decay': 0.006, 'warmup_steps': 134, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9603,1.171632,0.838303,0.839554,0.837722,0.837944
2,0.3821,1.113747,0.864679,0.864622,0.864675,0.864644
3,0.2625,1.23613,0.865826,0.865773,0.865802,0.865786
4,0.2065,1.324938,0.857798,0.860134,0.85855,0.857708
5,0.1723,1.168732,0.875,0.875709,0.8746,0.874821
6,0.1463,1.122631,0.868119,0.868087,0.868054,0.868069
7,0.1288,1.195622,0.870413,0.870831,0.870096,0.870269
8,0.1151,1.197825,0.868119,0.868322,0.867885,0.868011
9,0.1045,1.20195,0.864679,0.864703,0.864549,0.864608
10,0.0975,1.198933,0.865826,0.865943,0.865633,0.865732


[I 2025-03-26 06:58:13,091] Trial 142 finished with value: 0.8657322778688039 and parameters: {'learning_rate': 0.00018099028564563143, 'weight_decay': 0.006, 'warmup_steps': 134, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 143 with params: {'learning_rate': 0.00021938632318006993, 'weight_decay': 0.01, 'warmup_steps': 143, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9007,1.130366,0.84633,0.847343,0.845815,0.846038
2,0.3398,1.096696,0.868119,0.8681,0.868222,0.868105
3,0.2323,1.175393,0.873853,0.873944,0.873684,0.873773
4,0.1815,1.206967,0.868119,0.869632,0.868727,0.86808
5,0.1491,1.169721,0.865826,0.865769,0.865844,0.865796
6,0.1259,1.091836,0.868119,0.868142,0.868264,0.868111
7,0.1093,1.146094,0.869266,0.870589,0.868717,0.86899
8,0.0967,1.167703,0.868119,0.868419,0.867843,0.867993
9,0.0874,1.177522,0.865826,0.866024,0.865591,0.865715
10,0.0811,1.165936,0.865826,0.865827,0.865717,0.865762


[I 2025-03-26 07:03:43,215] Trial 143 finished with value: 0.8657619572039268 and parameters: {'learning_rate': 0.00021938632318006993, 'weight_decay': 0.01, 'warmup_steps': 143, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 144 with params: {'learning_rate': 0.00016223331783774925, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9675,1.16761,0.841743,0.841867,0.841521,0.841623
2,0.4133,1.129866,0.860092,0.860057,0.860171,0.860073
3,0.2838,1.213042,0.870413,0.870361,0.87039,0.870374
4,0.2238,1.28144,0.858945,0.860657,0.859592,0.858891
5,0.187,1.161387,0.87156,0.871724,0.871348,0.871462
6,0.1606,1.142766,0.873853,0.87402,0.873642,0.873758
7,0.1418,1.207654,0.870413,0.870619,0.87018,0.870306
8,0.1281,1.209272,0.87156,0.871589,0.871432,0.871492
9,0.1166,1.239476,0.865826,0.865827,0.865717,0.865762
10,0.1091,1.22913,0.865826,0.865877,0.865675,0.865748


[I 2025-03-26 07:09:08,686] Trial 144 finished with value: 0.865747825823779 and parameters: {'learning_rate': 0.00016223331783774925, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 145 with params: {'learning_rate': 0.00023359450619544504, 'weight_decay': 0.009000000000000001, 'warmup_steps': 58, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8573,1.142975,0.847477,0.84899,0.846857,0.847105
2,0.3282,1.101186,0.865826,0.865827,0.865717,0.865762
3,0.2246,1.184481,0.875,0.874955,0.875063,0.87498
4,0.1757,1.245711,0.862385,0.863993,0.863013,0.862339
5,0.1439,1.18514,0.862385,0.862366,0.862297,0.862327
6,0.1214,1.119614,0.869266,0.869222,0.869222,0.869222
7,0.1049,1.18006,0.876147,0.876518,0.875852,0.876019
8,0.0928,1.230393,0.869266,0.869293,0.869138,0.869197
9,0.0838,1.221753,0.868119,0.868239,0.867928,0.868027
10,0.0776,1.211867,0.864679,0.86466,0.864591,0.864621


[I 2025-03-26 07:14:27,682] Trial 145 finished with value: 0.8646212141146752 and parameters: {'learning_rate': 0.00023359450619544504, 'weight_decay': 0.009000000000000001, 'warmup_steps': 58, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 146 with params: {'learning_rate': 0.00013865992779958379, 'weight_decay': 0.003, 'warmup_steps': 57, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0292,1.20357,0.840596,0.841192,0.840185,0.840368
2,0.4529,1.170525,0.849771,0.84985,0.849952,0.849766
3,0.313,1.226794,0.862385,0.862366,0.862297,0.862327
4,0.2465,1.356431,0.858945,0.861153,0.859676,0.858863
5,0.2078,1.209635,0.865826,0.866505,0.865423,0.865633
6,0.1796,1.188158,0.866972,0.867325,0.866675,0.866835
7,0.1593,1.270932,0.857798,0.859059,0.857245,0.857498
8,0.1448,1.274405,0.860092,0.86068,0.859708,0.859903
9,0.1329,1.27765,0.860092,0.860429,0.859792,0.859947
10,0.1249,1.275309,0.858945,0.859133,0.858708,0.858829


[I 2025-03-26 07:19:44,930] Trial 146 finished with value: 0.8588289181174557 and parameters: {'learning_rate': 0.00013865992779958379, 'weight_decay': 0.003, 'warmup_steps': 57, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 31 with value: 0.8749524730461324.


Trial 147 with params: {'learning_rate': 0.00018524151460178165, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9194,1.149493,0.84633,0.847729,0.845731,0.845973
2,0.3754,1.127068,0.863532,0.863479,0.863507,0.863492
3,0.2587,1.229025,0.864679,0.86466,0.864591,0.864621
4,0.2047,1.328781,0.850917,0.854409,0.851835,0.850741
5,0.1706,1.164137,0.872706,0.87313,0.87239,0.872565
6,0.1452,1.10483,0.868119,0.868087,0.868054,0.868069
7,0.1275,1.161211,0.872706,0.873261,0.872348,0.872545
8,0.114,1.186515,0.866972,0.867056,0.866801,0.866888
9,0.1036,1.204719,0.863532,0.863582,0.863381,0.863453
10,0.0965,1.201631,0.863532,0.863582,0.863381,0.863453


[I 2025-03-26 07:25:03,183] Trial 147 finished with value: 0.8634529168635017 and parameters: {'learning_rate': 0.00018524151460178165, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


Trial 148 with params: {'learning_rate': 0.0003722013122961484, 'weight_decay': 0.01, 'warmup_steps': 75, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7313,1.123153,0.858945,0.859756,0.858498,0.858717
2,0.2577,1.049398,0.865826,0.865877,0.865675,0.865748
3,0.1744,1.126699,0.862385,0.862846,0.862044,0.862222


[I 2025-03-26 07:26:36,548] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0001627020201335242, 'weight_decay': 0.007, 'warmup_steps': 60, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9717,1.199179,0.840596,0.842283,0.839932,0.840171
2,0.4072,1.138544,0.863532,0.863612,0.863718,0.863528
3,0.2783,1.218563,0.866972,0.866955,0.866886,0.866916
4,0.2191,1.346574,0.856651,0.859119,0.857424,0.856552
5,0.1833,1.189411,0.870413,0.870536,0.870222,0.870323
6,0.1571,1.155966,0.872706,0.872832,0.872516,0.872618
7,0.1389,1.239231,0.868119,0.868806,0.867717,0.86793
8,0.1249,1.254968,0.872706,0.872832,0.872516,0.872618
9,0.114,1.269183,0.864679,0.864834,0.864465,0.864576
10,0.1065,1.2623,0.865826,0.866024,0.865591,0.865715


[I 2025-03-26 07:32:01,083] Trial 149 finished with value: 0.8657153123556285 and parameters: {'learning_rate': 0.0001627020201335242, 'weight_decay': 0.007, 'warmup_steps': 60, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 31 with value: 0.8749524730461324.


In [None]:
# Best configuration found for distillation on the augmented dataset (selected by F1).
print(best_trial_distill_aug)

BestRun(run_id='31', objective=0.8749524730461324, hyperparameters={'learning_rate': 0.0002255603737182001, 'weight_decay': 0.005, 'warmup_steps': 63, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}, run_summary=None)
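
The printed hyperparameters carry floating-point noise from the step-wise sampling (for example `lambda_param` appears as `0.6000000000000001`). The following is a minimal sketch of how the winning configuration could be tidied up for reporting. It is not part of the original notebook: the `BestRun` namedtuple defined here is only a stand-in that mirrors the fields printed above, and the names `best` and `clean` are hypothetical. In the notebook itself, `best_trial_distill_aug` would be used in place of `best`.

```python
from collections import namedtuple

# Stand-in with the same fields as the object printed above (an assumption made
# so the example runs on its own; the notebook would use best_trial_distill_aug).
BestRun = namedtuple("BestRun", ["run_id", "objective", "hyperparameters", "run_summary"])

best = BestRun(
    run_id="31",
    objective=0.8749524730461324,
    hyperparameters={
        "learning_rate": 0.0002255603737182001,
        "weight_decay": 0.005,
        "warmup_steps": 63,
        "lambda_param": 0.6000000000000001,
        "temperature": 5.5,
    },
    run_summary=None,
)

# Keep four significant digits for floats; leave integers (warmup_steps) untouched.
clean = {
    name: float(f"{value:.4g}") if isinstance(value, float) else value
    for name, value in best.hyperparameters.items()
}

print(f"trial {best.run_id}: F1 = {best.objective:.4f}")
print(clean)
```

Rounding to significant digits rather than to a fixed number of decimal places keeps small values such as the learning rate meaningful while removing artifacts like `0.6000000000000001`.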
