# Hyperparameter search for MobileNetV2 on the CIFAR10 dataset

This notebook searches for the best hyperparameters for the MobileNetV2 model on the CIFAR10 dataset. The search covers every variant of the model: randomly initialized, pretrained with only the classification head fine-tuned, and pretrained with the whole model fine-tuned. For each variant, hyperparameters are searched for both standard training and training with distillation.

The search uses the Optuna library with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each model variant. Distillation uses logits precomputed in the precompute_logits notebook.

## Library imports and basic setup

In [None]:
from transformers import Trainer
import optuna
import torch
import math
import base
import os

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


In [2]:
dataset_part = base.get_dataset_part()

Reset the random seed so that the results are reproducible.

In [3]:
base.reset_seed()

Check whether a GPU is available.

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Load the dataset and apply the basic transformations.

In [5]:
DATASET = "cifar10"

In [None]:
transform = base.base_transforms()

test = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TEST, transform=transform)
train = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.EVAL, transform=transform)

Basic training configuration used during the search. Optuna works with steps rather than epochs, so the epoch limits are converted to step counts below.

The minimum training length is three epochs and the maximum is ten. The upper bound on the number of warm-up steps is set to 10% of the first epoch.

In [8]:
num_epochs = 10
batch_size = 128

In [None]:
data_length = len(train)
min_r = math.ceil(data_length/batch_size)*3
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
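
As a worked example of the conversion above, the sketch below plugs in a training-set size of 40,000 images. That size is an assumption (the notebook does not state `len(train)`); it was chosen because it reproduces the upper bound of 32 warm-up steps that appears in the trial logs.

```python
import math

# Assumption: 40,000 training images; the real value is len(train).
data_length = 40_000
batch_size = 128
num_epochs = 10

# One epoch is ceil(N / batch_size) optimizer steps.
steps_per_epoch = math.ceil(data_length / batch_size)

min_r = steps_per_epoch * 3                          # shortest run: 3 epochs
max_r = steps_per_epoch * num_epochs                 # longest run: 10 epochs
warm_up = math.ceil(data_length / batch_size / 10)   # 10% of one epoch

print(steps_per_epoch, min_r, max_r, warm_up)  # 313 939 3130 32
```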

## Search with standard training of the randomly initialized model

Definition of the searched hyperparameters and their ranges.

In [10]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
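
To make the shape of this search space concrete, here is a standard-library sketch that draws independent random samples from the same ranges. It is not what Optuna does internally (the TPE sampler adapts its proposals to earlier results). The helper `sample_params` and the default `warm_up=32` are illustrative assumptions.

```python
import math
import random

LR_LOW, LR_HIGH = 5e-5, 5e-3

def sample_params(rng, warm_up=32):
    """Draw one random configuration (hypothetical helper, not Optuna code)."""
    # log=True: sample uniformly in log space, then map back.
    lr = math.exp(rng.uniform(math.log(LR_LOW), math.log(LR_HIGH)))
    lr = min(max(lr, LR_LOW), LR_HIGH)  # guard against rounding at the edges
    # step=1e-3 over [0, 1e-2]: one of the 11 grid points 0.000, 0.001, ..., 0.010.
    weight_decay = rng.randint(0, 10) * 1e-3
    # Integer range, both ends inclusive.
    warmup_steps = rng.randint(0, warm_up)
    return {
        "learning_rate": lr,
        "weight_decay": weight_decay,
        "warmup_steps": warmup_steps,
    }

rng = random.Random(42)
samples = [sample_params(rng) for _ in range(1000)]
for params in samples[:3]:
    print(params)
```

Because the learning rate is sampled in log space, roughly half of the random draws fall below the geometric midpoint of the range (5e-4) rather than below the arithmetic midpoint.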

Optuna configuration: runs are evaluated with the Hyperband algorithm, and the TPESampler chooses the next hyperparameter combination.

In [11]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
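
The sketch below estimates at which steps a trial can be pruned under these settings. It rests on two assumptions about how Optuna's Hyperband pruner behaves: it creates `floor(log_eta(max_resource / min_resource)) + 1` successive-halving brackets, and bracket `b` checks a trial when it reaches `min_resource * eta ** (b + rung)` steps. It is a simplified model rather than Optuna's code. The function `hyperband_rungs` is hypothetical, and 313 steps per epoch assumes 40,000 training images at batch size 128.

```python
import math

def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Per bracket, list the step counts at which pruning decisions can occur."""
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            step = min_resource * reduction_factor ** (bracket_id + rung)
            if step > max_resource:
                break  # a trial never trains long enough to reach this rung
            rungs.append(step)
            rung += 1
        brackets.append(rungs)
    return brackets

steps_per_epoch = 313  # assumption: ceil(40_000 / 128)
rungs = hyperband_rungs(steps_per_epoch * 3, steps_per_epoch * 10, 2)
rungs_in_epochs = [[s // steps_per_epoch for s in b] for b in rungs]

print(rungs)            # [[939, 1878], [1878]]
print(rungs_in_epochs)  # [[3, 6], [6]]
```

Under these assumptions a trial can be stopped only after epoch 3 or epoch 6, which matches the pruned trials in the log, all of which end after three or six epochs.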



In [12]:
base.reset_seed()

Configuration of the individual training runs.

In [13]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random_hp-search", logging_dir=f"~/logs/{DATASET}/random_hp-search", epochs=num_epochs, batch_size=batch_size)

Trainer configuration for the individual training runs.

In [14]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init = lambda: base.get_random_init_mobilenet(10)
)
  

Set up and run the search.

In [15]:
best_base_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-random",
    n_trials=150
)

[I 2025-03-25 00:15:46,954] A new study created in memory with name: Base-random


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6653,1.244537,0.5604,0.575059,0.560358,0.54597
2,1.0135,0.816005,0.7093,0.712499,0.708921,0.709204
3,0.7453,0.728064,0.7506,0.753604,0.749972,0.746851


[I 2025-03-25 00:23:20,801] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5796,1.254713,0.5406,0.556954,0.540526,0.527542
2,1.0517,0.879832,0.6843,0.68697,0.684335,0.681831
3,0.8209,0.778138,0.7286,0.734709,0.727909,0.725455
4,0.6658,0.614202,0.7866,0.79074,0.786641,0.784389
5,0.5458,0.560927,0.8055,0.810158,0.80568,0.805738
6,0.4322,0.563114,0.8072,0.821199,0.806997,0.807658


[I 2025-03-25 00:30:51,531] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9666,1.528566,0.4375,0.453351,0.437659,0.424868
2,1.4916,1.313803,0.521,0.530406,0.520538,0.520262
3,1.2604,1.195557,0.5774,0.57956,0.576326,0.570877


[I 2025-03-25 00:34:39,280] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.706,1.327851,0.5097,0.504917,0.509677,0.497592
2,1.1982,1.023791,0.6301,0.634429,0.63045,0.624839
3,0.9563,0.938773,0.6778,0.680675,0.677216,0.670428
4,0.7887,0.703728,0.7544,0.754887,0.754223,0.752461
5,0.6544,0.663392,0.7734,0.78211,0.773311,0.773526
6,0.555,0.602315,0.7925,0.804544,0.79212,0.793235


[I 2025-03-25 00:42:29,136] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8339,1.552525,0.411,0.409891,0.410755,0.374266
2,1.4121,1.271517,0.5371,0.540464,0.537611,0.52989
3,1.1239,0.973697,0.6543,0.654871,0.653469,0.646838


[I 2025-03-25 00:46:11,870] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8638,1.415425,0.4792,0.477903,0.479801,0.463504
2,1.2766,1.06724,0.6165,0.62913,0.615372,0.618993
3,0.9694,0.978204,0.6655,0.670054,0.664614,0.660853
4,0.7702,0.738109,0.7405,0.742715,0.740484,0.739792
5,0.611,0.706136,0.7566,0.762205,0.756778,0.756119
6,0.4606,0.700777,0.7602,0.77148,0.759876,0.760723


[I 2025-03-25 00:53:59,290] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6269,1.115765,0.5916,0.590635,0.591597,0.582391
2,0.9906,0.840116,0.6981,0.705521,0.697622,0.694059
3,0.7437,0.721491,0.7514,0.75246,0.750861,0.748229
4,0.5758,0.565169,0.8063,0.807511,0.80635,0.804828
5,0.4421,0.553587,0.8117,0.817439,0.812083,0.812017
6,0.3236,0.562339,0.8097,0.815014,0.809418,0.809327


[I 2025-03-25 01:01:58,619] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8763,1.428996,0.4727,0.480272,0.472865,0.454897
2,1.3355,1.136551,0.5926,0.60298,0.592208,0.593149
3,1.0592,1.009522,0.6457,0.642131,0.644649,0.638392
4,0.8724,0.84222,0.6983,0.701359,0.698138,0.697935
5,0.7236,0.793991,0.7205,0.726044,0.720298,0.720559
6,0.5888,0.764945,0.7306,0.739431,0.730406,0.731438


[I 2025-03-25 01:09:56,652] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5964,1.161148,0.5816,0.587344,0.581687,0.57244
2,0.9869,0.80836,0.7125,0.712216,0.712337,0.709229
3,0.7479,0.744438,0.7458,0.751134,0.744917,0.742791
4,0.5871,0.593697,0.7936,0.797605,0.7935,0.79262
5,0.4557,0.552072,0.8101,0.814867,0.810178,0.81015
6,0.3336,0.543014,0.8184,0.827846,0.817849,0.819447
7,0.2206,0.611346,0.8166,0.820379,0.817178,0.813003
8,0.1283,0.54865,0.8357,0.839538,0.83583,0.836393
9,0.0624,0.756774,0.8162,0.829939,0.815868,0.815851
10,0.0273,0.595386,0.8334,0.83377,0.833832,0.83131


[I 2025-03-25 01:23:18,821] Trial 8 finished with value: 0.8313104459816 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.8313104459816.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5553,1.148026,0.5862,0.586048,0.586109,0.575125
2,1.0075,0.856153,0.6934,0.70245,0.692945,0.692595
3,0.7806,0.776151,0.7325,0.738882,0.732034,0.727808


[I 2025-03-25 01:27:22,421] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9414,1.473801,0.4512,0.452164,0.451348,0.439931
2,1.4561,1.290548,0.5318,0.540423,0.530977,0.529134
3,1.2219,1.197617,0.576,0.580769,0.574846,0.568206


[I 2025-03-25 01:31:28,131] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020781267255701565, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7913,1.549186,0.414,0.434092,0.414078,0.384075
2,1.3209,1.120788,0.5933,0.592538,0.593137,0.583748
3,1.0618,1.004899,0.6478,0.652267,0.647109,0.638374
4,0.8932,0.777588,0.7232,0.730538,0.722707,0.724181
5,0.747,0.708473,0.7518,0.76143,0.751958,0.751594
6,0.6314,0.685296,0.7582,0.775841,0.757986,0.759325


[I 2025-03-25 01:39:30,295] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6002,1.113018,0.6017,0.604207,0.601426,0.594743
2,0.9933,0.819644,0.7087,0.714039,0.708064,0.706756
3,0.7501,0.745109,0.7411,0.74693,0.740432,0.736964
4,0.5883,0.563092,0.8025,0.806375,0.802797,0.801502
5,0.4631,0.514166,0.8207,0.826075,0.82096,0.82111
6,0.3428,0.537004,0.8187,0.832602,0.818297,0.820127


[I 2025-03-25 01:47:31,549] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0002893591596161301, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6474,1.135952,0.5789,0.583446,0.579001,0.572137
2,1.0009,0.799008,0.7191,0.721439,0.718743,0.71737
3,0.7208,0.718467,0.7567,0.758209,0.756639,0.752858
4,0.5496,0.575706,0.8021,0.807811,0.802154,0.802315
5,0.4148,0.551733,0.8145,0.819965,0.8149,0.814464
6,0.2835,0.540608,0.8213,0.830531,0.821218,0.822208
7,0.1733,0.660539,0.8133,0.820424,0.813892,0.811098
8,0.0919,0.597693,0.8312,0.837575,0.831418,0.831791
9,0.0439,0.832138,0.8141,0.833044,0.813708,0.813769
10,0.0185,0.645111,0.8307,0.834244,0.831346,0.828372


[I 2025-03-25 02:01:11,051] Trial 13 finished with value: 0.8283719419254172 and parameters: {'learning_rate': 0.0002893591596161301, 'weight_decay': 0.01, 'warmup_steps': 2}. Best is trial 8 with value: 0.8313104459816.


Trial 14 with params: {'learning_rate': 0.0002735652959498943, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.621,1.151086,0.5847,0.58835,0.58493,0.571478
2,0.9995,0.843324,0.699,0.709828,0.6988,0.701225
3,0.736,0.710118,0.7557,0.759572,0.754919,0.752508
4,0.5589,0.573654,0.8014,0.804745,0.801426,0.801038
5,0.4214,0.535657,0.8166,0.821316,0.816481,0.81659
6,0.2835,0.560595,0.8161,0.825461,0.816117,0.817016


[I 2025-03-25 02:09:08,882] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0031946974811222035, 'weight_decay': 0.008, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8735,1.669233,0.371,0.373568,0.37042,0.349703
2,1.4742,1.249956,0.533,0.533751,0.53338,0.522745
3,1.2067,1.116373,0.5954,0.606038,0.593744,0.58652
4,1.0331,0.934283,0.6556,0.663032,0.655689,0.650951
5,0.8968,0.826956,0.7016,0.714429,0.701484,0.702469
6,0.7705,0.754304,0.7326,0.741751,0.732129,0.733025


[I 2025-03-25 02:16:41,830] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 8.85713447869134e-05, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9027,1.460511,0.4546,0.457226,0.454396,0.438023
2,1.3967,1.196165,0.5594,0.571052,0.558448,0.558991
3,1.1315,1.039019,0.6311,0.629532,0.630253,0.624732


[I 2025-03-25 02:20:41,307] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00032926542216520094, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6434,1.171574,0.5758,0.581425,0.575368,0.565535
2,1.0161,0.846922,0.697,0.697741,0.696929,0.693185
3,0.7585,0.732192,0.7524,0.754002,0.752191,0.749044
4,0.5781,0.587536,0.7957,0.805492,0.795898,0.795745
5,0.4382,0.515218,0.8205,0.822315,0.820719,0.820358
6,0.3085,0.550965,0.8153,0.826353,0.815155,0.816551
7,0.199,0.650834,0.809,0.819204,0.809828,0.806642
8,0.1081,0.555517,0.8368,0.841999,0.836732,0.837661
9,0.0506,0.794058,0.8063,0.825426,0.80599,0.807019
10,0.0227,0.597727,0.834,0.835848,0.834462,0.832188


[I 2025-03-25 02:33:31,324] Trial 17 finished with value: 0.8321876367128802 and parameters: {'learning_rate': 0.00032926542216520094, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 17 with value: 0.8321876367128802.


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9033,1.420001,0.4711,0.474106,0.471763,0.44979
2,1.3307,1.089258,0.6118,0.619783,0.611011,0.612924
3,1.0206,0.992833,0.6602,0.664056,0.659196,0.65448
4,0.8129,0.76502,0.7294,0.732045,0.729541,0.727857
5,0.6589,0.726009,0.7445,0.74852,0.744556,0.744584
6,0.5222,0.717794,0.7553,0.767878,0.755053,0.756789


[I 2025-03-25 02:41:10,251] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0002696373345614392, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.653,1.130354,0.583,0.587452,0.582527,0.573832
2,1.0134,0.853642,0.6942,0.695472,0.693725,0.691936
3,0.7473,0.731094,0.7528,0.755241,0.752352,0.75017
4,0.5681,0.580318,0.7978,0.800222,0.797978,0.796009
5,0.4292,0.55916,0.8079,0.810413,0.808158,0.807835
6,0.2983,0.591479,0.8086,0.81905,0.808343,0.809041


[I 2025-03-25 02:48:57,205] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0006886022732454476, 'weight_decay': 0.006, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5958,1.16051,0.5767,0.578944,0.576216,0.567285
2,1.0304,0.863661,0.6862,0.690746,0.686205,0.678411
3,0.8014,0.735244,0.7471,0.751335,0.746499,0.744455
4,0.6449,0.599257,0.7902,0.795935,0.790388,0.788204
5,0.5245,0.575229,0.8019,0.808062,0.802096,0.802296
6,0.415,0.549862,0.8137,0.824894,0.813493,0.814657
7,0.3131,0.540417,0.8321,0.833943,0.832478,0.829964
8,0.2153,0.487175,0.848,0.850309,0.848035,0.847891
9,0.1338,0.638394,0.8246,0.837475,0.824335,0.825167
10,0.0726,0.537888,0.8459,0.846608,0.846371,0.84405


[I 2025-03-25 03:02:04,674] Trial 20 finished with value: 0.8440496184532578 and parameters: {'learning_rate': 0.0006886022732454476, 'weight_decay': 0.006, 'warmup_steps': 23}. Best is trial 20 with value: 0.8440496184532578.


Trial 21 with params: {'learning_rate': 0.0019067885875634165, 'weight_decay': 0.006, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8244,1.503195,0.429,0.447151,0.428848,0.411367
2,1.3519,1.150398,0.5796,0.587072,0.579342,0.574965
3,1.0561,1.034649,0.6463,0.647898,0.644999,0.63976


[I 2025-03-25 03:06:01,710] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0008176319549065851, 'weight_decay': 0.004, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6182,1.156602,0.5831,0.586733,0.581971,0.579987
2,1.0725,0.899258,0.6715,0.671631,0.671027,0.666055
3,0.852,0.832251,0.7164,0.719898,0.715727,0.710464
4,0.69,0.635571,0.7761,0.778363,0.776236,0.775202
5,0.5641,0.590698,0.7963,0.803387,0.796275,0.797015
6,0.451,0.569836,0.8045,0.816393,0.80429,0.805186
7,0.3492,0.565894,0.8202,0.823311,0.820725,0.817355
8,0.2519,0.496804,0.8375,0.839801,0.837748,0.837264
9,0.1613,0.637543,0.8201,0.833153,0.819977,0.818983
10,0.0949,0.53671,0.8421,0.846416,0.842555,0.840903


[I 2025-03-25 03:19:08,069] Trial 22 finished with value: 0.8409033192176631 and parameters: {'learning_rate': 0.0008176319549065851, 'weight_decay': 0.004, 'warmup_steps': 18}. Best is trial 20 with value: 0.8440496184532578.


Trial 23 with params: {'learning_rate': 0.0009267072507147457, 'weight_decay': 0.003, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6126,1.193916,0.5622,0.558093,0.562534,0.551338
2,1.0731,0.8941,0.6748,0.675082,0.674764,0.664053
3,0.846,0.902549,0.7001,0.707152,0.6993,0.695241


[I 2025-03-25 03:23:05,448] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0010023089553602236, 'weight_decay': 0.005, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6374,1.222837,0.5499,0.546329,0.549579,0.539197
2,1.116,0.94812,0.6616,0.670175,0.661421,0.658553
3,0.8813,0.85589,0.7046,0.710493,0.703725,0.700835
4,0.7239,0.662629,0.7639,0.766156,0.763937,0.762876
5,0.6014,0.593656,0.7912,0.796898,0.790883,0.791787
6,0.5008,0.577263,0.7983,0.810165,0.798022,0.799501


[I 2025-03-25 03:30:49,772] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0004487347982784188, 'weight_decay': 0.007, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.602,1.148932,0.5871,0.588571,0.58666,0.577857
2,0.9964,0.826038,0.7064,0.708562,0.705749,0.702695
3,0.7601,0.699588,0.755,0.759799,0.754322,0.751806
4,0.5952,0.594455,0.7936,0.800016,0.793805,0.792585
5,0.4646,0.55582,0.8087,0.816265,0.808866,0.809348
6,0.3444,0.540481,0.8214,0.831243,0.821356,0.822392
7,0.2382,0.56999,0.8226,0.825989,0.823144,0.820052
8,0.1457,0.501016,0.848,0.851761,0.848078,0.848053
9,0.074,0.677668,0.8265,0.838962,0.82614,0.826443
10,0.0331,0.54896,0.8483,0.850212,0.848432,0.847121


[I 2025-03-25 03:43:32,335] Trial 25 finished with value: 0.8471211776540211 and parameters: {'learning_rate': 0.0004487347982784188, 'weight_decay': 0.007, 'warmup_steps': 24}. Best is trial 25 with value: 0.8471211776540211.


Trial 26 with params: {'learning_rate': 0.00048204671805463905, 'weight_decay': 0.004, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6129,1.169758,0.5811,0.590649,0.581439,0.571595
2,0.9948,0.814308,0.7103,0.71125,0.710384,0.707558
3,0.7465,0.753168,0.7447,0.751191,0.74392,0.741433
4,0.595,0.580239,0.7963,0.801691,0.796379,0.795142
5,0.4649,0.539578,0.8155,0.821093,0.815776,0.815611
6,0.3512,0.524436,0.8279,0.836405,0.827794,0.828613
7,0.2417,0.599996,0.8151,0.821233,0.815769,0.811709
8,0.1483,0.503686,0.8466,0.850262,0.846735,0.846791
9,0.0784,0.661529,0.8279,0.841448,0.82734,0.82743
10,0.0355,0.541157,0.8494,0.852931,0.849874,0.847764


[I 2025-03-25 03:56:22,048] Trial 26 finished with value: 0.8477643425401415 and parameters: {'learning_rate': 0.00048204671805463905, 'weight_decay': 0.004, 'warmup_steps': 32}. Best is trial 26 with value: 0.8477643425401415.


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7275,1.245353,0.5524,0.563657,0.552807,0.535237
2,1.0789,0.921696,0.6696,0.68148,0.669392,0.672316
3,0.8106,0.755952,0.7384,0.743036,0.738039,0.736658
4,0.6227,0.611608,0.7887,0.790106,0.788606,0.787708
5,0.4707,0.591032,0.7973,0.800178,0.797786,0.796222
6,0.3339,0.58754,0.804,0.816023,0.803884,0.804784


[I 2025-03-25 04:04:07,107] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0004385895726901437, 'weight_decay': 0.008, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6398,1.185775,0.5728,0.583509,0.572722,0.563472
2,1.024,0.830274,0.7031,0.704513,0.702816,0.700332
3,0.766,0.730374,0.7465,0.75094,0.745793,0.743538
4,0.598,0.568366,0.8037,0.808226,0.803678,0.802663
5,0.4647,0.531651,0.8165,0.822143,0.816601,0.816885
6,0.3536,0.552503,0.8151,0.827971,0.815135,0.816119


[I 2025-03-25 04:12:09,912] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0009038748485757766, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6381,1.295509,0.5294,0.53251,0.529636,0.515913
2,1.0863,0.932052,0.6619,0.667578,0.661414,0.653092
3,0.8585,0.821336,0.7166,0.723942,0.715619,0.71255
4,0.7004,0.630008,0.782,0.783171,0.782053,0.779337
5,0.5693,0.574713,0.8017,0.806738,0.801913,0.801806
6,0.4701,0.570532,0.8069,0.818313,0.806667,0.807176
7,0.3638,0.554949,0.8246,0.827455,0.825246,0.821841
8,0.2709,0.491864,0.8426,0.844938,0.842751,0.841937
9,0.1803,0.59944,0.8313,0.841193,0.830925,0.830939
10,0.1097,0.508468,0.8477,0.849112,0.848005,0.846262


[I 2025-03-25 04:25:33,055] Trial 29 finished with value: 0.8462624463534348 and parameters: {'learning_rate': 0.0009038748485757766, 'weight_decay': 0.005, 'warmup_steps': 30}. Best is trial 26 with value: 0.8477643425401415.


Trial 30 with params: {'learning_rate': 0.0030001376948914087, 'weight_decay': 0.004, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8639,1.553592,0.4243,0.431432,0.424199,0.397775
2,1.4258,1.253239,0.5449,0.542189,0.545109,0.536546
3,1.1948,1.110606,0.6117,0.614036,0.610872,0.601993
4,1.0151,0.917937,0.6711,0.683799,0.671279,0.66985
5,0.8735,0.859281,0.7013,0.720734,0.701177,0.703074
6,0.7516,0.769736,0.7269,0.742726,0.727123,0.725984


[I 2025-03-25 04:33:46,088] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0006739040010427822, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6427,1.197533,0.5641,0.56627,0.563737,0.553926
2,1.0443,0.857556,0.6937,0.697164,0.693577,0.691072
3,0.7946,0.762255,0.7383,0.744162,0.737845,0.735064


[I 2025-03-25 04:37:56,648] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.001403122459779143, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7096,1.414559,0.4801,0.501463,0.4804,0.463982
2,1.2038,1.019253,0.6316,0.634466,0.631791,0.624516
3,0.9596,0.83106,0.7107,0.712759,0.710108,0.70776
4,0.7959,0.674873,0.7613,0.760671,0.761023,0.760067
5,0.666,0.681803,0.765,0.780172,0.765212,0.76636
6,0.5554,0.629545,0.7756,0.793841,0.775477,0.777497


[I 2025-03-25 04:46:07,582] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0005025100852131581, 'weight_decay': 0.004, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6304,1.124526,0.5936,0.595977,0.593229,0.584146
2,0.983,0.805081,0.714,0.713684,0.713356,0.710688
3,0.7418,0.783816,0.7394,0.743944,0.73873,0.735557


[I 2025-03-25 04:50:12,507] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0005863794791007642, 'weight_decay': 0.006, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6086,1.189097,0.574,0.579913,0.573532,0.566917
2,1.0191,0.805397,0.7137,0.71372,0.71323,0.710449
3,0.768,0.741867,0.7441,0.749709,0.743498,0.739398
4,0.6104,0.569678,0.8024,0.803763,0.802287,0.800947
5,0.489,0.534364,0.8152,0.820249,0.815369,0.815486
6,0.3794,0.541902,0.8172,0.827449,0.817003,0.818209
7,0.2737,0.586365,0.8194,0.825895,0.820174,0.816158
8,0.1807,0.477474,0.8495,0.852153,0.849487,0.84995
9,0.1042,0.641782,0.8362,0.848799,0.835767,0.835353
10,0.0502,0.523587,0.8515,0.853015,0.851887,0.850102


[I 2025-03-25 05:04:00,114] Trial 34 finished with value: 0.8501016779891574 and parameters: {'learning_rate': 0.0005863794791007642, 'weight_decay': 0.006, 'warmup_steps': 23}. Best is trial 34 with value: 0.8501016779891574.


Trial 35 with params: {'learning_rate': 0.00040061817893793147, 'weight_decay': 0.008, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6107,1.119944,0.5972,0.601322,0.597356,0.58585
2,0.976,0.815355,0.7064,0.706082,0.706668,0.701818
3,0.7343,0.69722,0.7637,0.765464,0.763047,0.760632
4,0.5703,0.562196,0.8031,0.805313,0.802998,0.80239
5,0.4442,0.530777,0.8177,0.821915,0.81782,0.817567
6,0.3265,0.559662,0.8167,0.829523,0.816795,0.817591
7,0.2126,0.61695,0.8218,0.827796,0.822649,0.818757
8,0.1251,0.529471,0.8406,0.847128,0.840725,0.841298
9,0.0618,0.677106,0.8318,0.845071,0.831414,0.83113
10,0.0258,0.546353,0.8456,0.847033,0.845953,0.8444


[I 2025-03-25 05:17:19,961] Trial 35 finished with value: 0.844400199821459 and parameters: {'learning_rate': 0.00040061817893793147, 'weight_decay': 0.008, 'warmup_steps': 23}. Best is trial 34 with value: 0.8501016779891574.


Trial 36 with params: {'learning_rate': 0.0003959189580096631, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6269,1.169618,0.5762,0.586128,0.57605,0.565582
2,0.9989,0.826306,0.7045,0.712472,0.704628,0.704703
3,0.7545,0.753147,0.7457,0.750403,0.745378,0.743128
4,0.5848,0.565281,0.8003,0.805295,0.800477,0.800884
5,0.4528,0.538319,0.8161,0.820016,0.816435,0.816291
6,0.3321,0.566482,0.8118,0.823096,0.811281,0.812589
7,0.219,0.619584,0.8165,0.82274,0.817282,0.813194
8,0.1263,0.559834,0.831,0.835438,0.831289,0.831315
9,0.0621,0.735242,0.8201,0.834359,0.819706,0.819839
10,0.0264,0.564024,0.8366,0.839629,0.837046,0.835876


[I 2025-03-25 05:30:32,500] Trial 36 finished with value: 0.8358763053599244 and parameters: {'learning_rate': 0.0003959189580096631, 'weight_decay': 0.006, 'warmup_steps': 30}. Best is trial 34 with value: 0.8501016779891574.


Trial 37 with params: {'learning_rate': 0.001006382590642249, 'weight_decay': 0.002, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.664,1.266666,0.5338,0.54533,0.533788,0.526199
2,1.1368,0.948359,0.6572,0.660905,0.657096,0.65445
3,0.8932,0.874896,0.7096,0.713543,0.708411,0.70433


[I 2025-03-25 05:34:37,699] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0006636439057426917, 'weight_decay': 0.003, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6104,1.225017,0.5621,0.572206,0.561417,0.550277
2,1.027,0.837543,0.6993,0.700687,0.699291,0.698357
3,0.8036,0.776353,0.7324,0.736145,0.731567,0.729396


[I 2025-03-25 05:38:42,604] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0010475348879951107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6791,1.324812,0.5179,0.522389,0.517444,0.507477
2,1.1304,0.950156,0.6503,0.655076,0.649765,0.646234
3,0.8956,0.844137,0.7105,0.714607,0.709442,0.706502
4,0.7477,0.649586,0.7674,0.768345,0.767329,0.766493
5,0.6253,0.608845,0.7903,0.799052,0.790724,0.790476
6,0.5159,0.610544,0.7878,0.805558,0.787615,0.788907


[I 2025-03-25 05:46:46,464] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00022434184990426385, 'weight_decay': 0.005, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7206,1.244887,0.5512,0.559531,0.551037,0.537247
2,1.0559,0.834986,0.7076,0.713488,0.707117,0.707821
3,0.776,0.74603,0.7461,0.748001,0.745681,0.743104
4,0.5843,0.596641,0.7951,0.797851,0.79518,0.794427
5,0.4372,0.563679,0.8095,0.812387,0.809788,0.809266
6,0.299,0.567326,0.8094,0.818246,0.808984,0.810399


[I 2025-03-25 05:54:56,995] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.00038894451917175854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6067,1.222579,0.5645,0.567913,0.564622,0.547579
2,0.9961,0.837075,0.7057,0.718606,0.7052,0.707998
3,0.7511,0.738673,0.7498,0.752626,0.749033,0.745903
4,0.5818,0.598355,0.7969,0.798558,0.796757,0.79552
5,0.4512,0.548903,0.8142,0.817262,0.814523,0.813423
6,0.3276,0.559959,0.816,0.827227,0.815947,0.816524
7,0.2195,0.592972,0.8289,0.832882,0.829389,0.826931
8,0.1267,0.537423,0.8448,0.845738,0.84491,0.844751
9,0.0622,0.687514,0.8289,0.838632,0.828657,0.828037
10,0.0262,0.58866,0.8408,0.842,0.841104,0.838964


[I 2025-03-25 06:08:58,309] Trial 41 finished with value: 0.8389639380504675 and parameters: {'learning_rate': 0.00038894451917175854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}. Best is trial 34 with value: 0.8501016779891574.


Trial 42 with params: {'learning_rate': 0.0007323650903558112, 'weight_decay': 0.007, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6745,1.190171,0.5523,0.560013,0.552176,0.54177
2,1.0809,0.902894,0.6705,0.669396,0.67051,0.664899
3,0.8323,0.787859,0.7302,0.738953,0.729282,0.725695


[I 2025-03-25 06:13:18,728] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0005237563497429068, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5635,1.107022,0.5997,0.596811,0.599444,0.589545
2,0.9721,0.797373,0.7149,0.717865,0.715091,0.711738
3,0.7409,0.727503,0.7485,0.754271,0.747934,0.744885
4,0.589,0.569962,0.8012,0.805474,0.801373,0.801373
5,0.4593,0.518032,0.8188,0.821579,0.818979,0.818791
6,0.3506,0.535378,0.8225,0.832038,0.822061,0.823294
7,0.2453,0.561364,0.8293,0.833265,0.830103,0.825671
8,0.1485,0.509084,0.8452,0.850215,0.845248,0.845571
9,0.079,0.670064,0.8284,0.84213,0.828148,0.828324
10,0.0349,0.560251,0.847,0.848083,0.847431,0.845237


[I 2025-03-25 06:27:36,427] Trial 43 finished with value: 0.8452374713860124 and parameters: {'learning_rate': 0.0005237563497429068, 'weight_decay': 0.007, 'warmup_steps': 19}. Best is trial 34 with value: 0.8501016779891574.


Trial 44 with params: {'learning_rate': 7.012112975444019e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9584,1.498451,0.4424,0.455372,0.442237,0.428636
2,1.4542,1.277884,0.5286,0.541256,0.527734,0.526087
3,1.2151,1.144882,0.5924,0.588968,0.591145,0.583953


[I 2025-03-25 06:31:58,365] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00037606587390441576, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6279,1.09799,0.6028,0.604959,0.602361,0.597366
2,1.0057,0.835526,0.7041,0.710898,0.70364,0.705309
3,0.7541,0.742233,0.7421,0.74592,0.741398,0.738956
4,0.5807,0.567394,0.8069,0.807853,0.806807,0.805626
5,0.4415,0.552492,0.8104,0.816641,0.810722,0.810145
6,0.3202,0.541454,0.8219,0.830309,0.82183,0.822025
7,0.2054,0.616947,0.8201,0.824976,0.820645,0.817485
8,0.1169,0.523514,0.8409,0.843003,0.840881,0.840976
9,0.0557,0.690892,0.8253,0.835975,0.825227,0.824613
10,0.0235,0.591601,0.8361,0.839349,0.836701,0.833605


[I 2025-03-25 06:46:37,085] Trial 45 finished with value: 0.8336054311807434 and parameters: {'learning_rate': 0.00037606587390441576, 'weight_decay': 0.007, 'warmup_steps': 21}. Best is trial 34 with value: 0.8501016779891574.


Trial 46 with params: {'learning_rate': 0.00012827851737332596, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8102,1.333975,0.5089,0.513747,0.509186,0.495437
2,1.2356,1.069448,0.6119,0.625521,0.611183,0.613672
3,0.9624,0.929928,0.6735,0.672483,0.672873,0.668507


[I 2025-03-25 06:50:58,126] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0012108719724028927, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6741,1.228832,0.5494,0.555662,0.548612,0.542849
2,1.1414,1.015941,0.6287,0.635433,0.628958,0.620818
3,0.9098,0.856745,0.7062,0.709156,0.705389,0.700721
4,0.7458,0.671539,0.7645,0.769756,0.764165,0.763964
5,0.6249,0.626525,0.7813,0.790173,0.781406,0.782103
6,0.516,0.571912,0.804,0.811739,0.803891,0.804592


[I 2025-03-25 06:59:39,982] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0008196098195188294, 'weight_decay': 0.006, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5919,1.178234,0.5719,0.572891,0.571783,0.563802
2,1.0461,0.88078,0.683,0.689663,0.682446,0.68084
3,0.8252,0.776811,0.7337,0.736102,0.73326,0.728836
4,0.6734,0.627617,0.778,0.787722,0.777981,0.778115
5,0.5542,0.606392,0.7911,0.796782,0.791101,0.791464
6,0.4521,0.554398,0.8095,0.820803,0.809174,0.810849
7,0.3401,0.545868,0.8231,0.825914,0.823662,0.820841
8,0.2441,0.498566,0.8397,0.843056,0.839912,0.839505
9,0.1578,0.654055,0.8224,0.83522,0.822013,0.822325
10,0.089,0.556992,0.8365,0.83906,0.83697,0.834832


[I 2025-03-25 07:13:31,267] Trial 48 finished with value: 0.8348324620768921 and parameters: {'learning_rate': 0.0008196098195188294, 'weight_decay': 0.006, 'warmup_steps': 20}. Best is trial 34 with value: 0.8501016779891574.


Trial 49 with params: {'learning_rate': 0.001040373808699604, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6178,1.206773,0.5579,0.564726,0.557072,0.5503
2,1.0972,0.930069,0.6606,0.661042,0.660664,0.649733
3,0.8611,0.815688,0.7174,0.722421,0.716648,0.712968


[I 2025-03-25 07:17:28,471] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8828,1.62625,0.382,0.395029,0.382275,0.361314
2,1.4506,1.226335,0.5493,0.549655,0.549481,0.538682
3,1.1681,1.095042,0.599,0.603744,0.597286,0.585585


[I 2025-03-25 07:21:26,476] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0002912292684538836, 'weight_decay': 0.008, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6527,1.160215,0.5872,0.59604,0.587079,0.5756
2,0.993,0.831544,0.7055,0.709264,0.705213,0.703837
3,0.7299,0.74698,0.7419,0.74759,0.741396,0.738829
4,0.5579,0.562097,0.8051,0.808302,0.80504,0.804745
5,0.4157,0.538045,0.8157,0.818457,0.815938,0.815762
6,0.2853,0.555676,0.8109,0.821609,0.810845,0.811547
7,0.1723,0.654844,0.8113,0.819539,0.811983,0.809064
8,0.0893,0.578654,0.8321,0.837492,0.832347,0.832599
9,0.0408,0.83307,0.8083,0.827073,0.807986,0.807072
10,0.0175,0.610473,0.8312,0.83309,0.831624,0.829352


[I 2025-03-25 07:35:11,437] Trial 51 finished with value: 0.8293522427480097 and parameters: {'learning_rate': 0.0002912292684538836, 'weight_decay': 0.008, 'warmup_steps': 24}. Best is trial 34 with value: 0.8501016779891574.


Trial 52 with params: {'learning_rate': 0.004803130612126116, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9375,1.68615,0.3537,0.357928,0.352974,0.334141
2,1.6299,1.475377,0.4431,0.448354,0.443672,0.433943
3,1.4116,1.326702,0.518,0.527462,0.517075,0.51109


[I 2025-03-25 07:39:16,438] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00047129717363342505, 'weight_decay': 0.006, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.591,1.089242,0.6123,0.610668,0.611947,0.602955
2,0.9594,0.777325,0.7261,0.729998,0.725762,0.725301
3,0.7166,0.722207,0.758,0.764087,0.756938,0.754784
4,0.5621,0.553099,0.8064,0.810943,0.806373,0.806493
5,0.4395,0.525695,0.8195,0.824186,0.819934,0.819177
6,0.3258,0.542656,0.8181,0.830153,0.818126,0.818866
7,0.2169,0.582129,0.8223,0.830018,0.823024,0.819937
8,0.1301,0.52982,0.8405,0.846989,0.840638,0.841656
9,0.0676,0.663457,0.8296,0.840826,0.829363,0.829009
10,0.0304,0.535689,0.8476,0.849168,0.84799,0.846524


[I 2025-03-25 07:52:20,795] Trial 53 finished with value: 0.8465243590079513 and parameters: {'learning_rate': 0.00047129717363342505, 'weight_decay': 0.006, 'warmup_steps': 24}. Best is trial 34 with value: 0.8501016779891574.


Trial 54 with params: {'learning_rate': 0.0004492393034366378, 'weight_decay': 0.006, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5968,1.136852,0.5909,0.591006,0.591269,0.577277
2,0.9899,0.873941,0.6891,0.697647,0.688828,0.687889
3,0.758,0.757325,0.743,0.745138,0.74224,0.739146
4,0.5906,0.567185,0.7988,0.802235,0.798907,0.798538
5,0.4581,0.550424,0.8109,0.816408,0.811247,0.811212
6,0.34,0.546404,0.8165,0.82485,0.81631,0.817103
7,0.2326,0.580224,0.8235,0.828107,0.824173,0.820407
8,0.1409,0.527183,0.8401,0.845751,0.840039,0.84086
9,0.07,0.662528,0.8314,0.841807,0.83117,0.831005
10,0.0313,0.545437,0.8477,0.848807,0.848097,0.846282


[I 2025-03-25 08:05:36,956] Trial 54 finished with value: 0.8462822084686152 and parameters: {'learning_rate': 0.0004492393034366378, 'weight_decay': 0.006, 'warmup_steps': 25}. Best is trial 34 with value: 0.8501016779891574.


Trial 55 with params: {'learning_rate': 0.00038475845983399073, 'weight_decay': 0.005, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6215,1.183315,0.5743,0.576431,0.574352,0.565797
2,0.9965,0.831927,0.7043,0.708701,0.703829,0.703432
3,0.7484,0.72084,0.7491,0.75589,0.748432,0.746232
4,0.5747,0.583295,0.7989,0.805357,0.798919,0.798003
5,0.4444,0.547658,0.8114,0.814898,0.811696,0.810763
6,0.3208,0.557829,0.8124,0.823012,0.812603,0.812815
7,0.2062,0.607951,0.8149,0.824017,0.815643,0.813644
8,0.1176,0.527552,0.844,0.849244,0.84405,0.844432
9,0.0576,0.684924,0.8229,0.836151,0.822492,0.822032
10,0.0243,0.54365,0.8424,0.844341,0.842699,0.84139


[I 2025-03-25 08:18:36,841] Trial 55 finished with value: 0.8413903583474761 and parameters: {'learning_rate': 0.00038475845983399073, 'weight_decay': 0.005, 'warmup_steps': 22}. Best is trial 34 with value: 0.8501016779891574.


Trial 56 with params: {'learning_rate': 0.004913837305728667, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9554,1.658698,0.3619,0.385077,0.362059,0.344262
2,1.5689,1.327323,0.5143,0.519276,0.514016,0.500626
3,1.2934,1.208184,0.5665,0.591879,0.564416,0.555908


[I 2025-03-25 08:22:33,704] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00027424361526918534, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6722,1.167194,0.5763,0.582051,0.576791,0.560924
2,1.038,0.865713,0.6947,0.70714,0.694358,0.695661
3,0.7739,0.759278,0.7407,0.741926,0.740085,0.737579
4,0.5984,0.613383,0.788,0.792498,0.78819,0.786715
5,0.4501,0.566973,0.8073,0.811377,0.807355,0.80705
6,0.313,0.57951,0.807,0.81618,0.806923,0.807317


[I 2025-03-25 08:30:06,403] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.000713823933571205, 'weight_decay': 0.006, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6098,1.194324,0.5602,0.564342,0.559949,0.551182
2,1.041,0.847049,0.6973,0.69782,0.69693,0.691025
3,0.7977,0.768544,0.7416,0.74558,0.74064,0.737168
4,0.6355,0.596893,0.7925,0.795806,0.792527,0.79085
5,0.5158,0.576863,0.801,0.808716,0.801544,0.801137
6,0.4082,0.539492,0.8182,0.826914,0.818052,0.81903
7,0.3014,0.545608,0.8314,0.833621,0.832092,0.828192
8,0.2061,0.492721,0.8457,0.85169,0.845745,0.846697
9,0.124,0.649748,0.8226,0.835684,0.822071,0.822008
10,0.065,0.543503,0.8454,0.847935,0.845853,0.843774


[I 2025-03-25 08:42:41,617] Trial 58 finished with value: 0.84377354250914 and parameters: {'learning_rate': 0.000713823933571205, 'weight_decay': 0.006, 'warmup_steps': 27}. Best is trial 34 with value: 0.8501016779891574.


Trial 59 with params: {'learning_rate': 0.00025300241415366547, 'weight_decay': 0.005, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7184,1.20235,0.5548,0.556452,0.554961,0.539043
2,1.0391,0.830648,0.7046,0.707937,0.704324,0.702963
3,0.7473,0.748001,0.7438,0.750947,0.743187,0.741993
4,0.5652,0.572039,0.8009,0.801564,0.801101,0.799183
5,0.4218,0.532168,0.8156,0.818449,0.816011,0.814972
6,0.2914,0.568354,0.8154,0.825018,0.815375,0.816303


[I 2025-03-25 08:50:15,336] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00048625532935807857, 'weight_decay': 0.006, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.619,1.166365,0.5811,0.583072,0.580467,0.568859
2,0.9877,0.815868,0.7083,0.710755,0.708197,0.706938
3,0.7448,0.74958,0.7438,0.749395,0.743033,0.740102
4,0.5983,0.57895,0.7999,0.801494,0.800046,0.798283
5,0.4689,0.530599,0.8172,0.821726,0.817426,0.816477
6,0.3583,0.53299,0.8216,0.832195,0.821481,0.822119
7,0.2531,0.572021,0.8271,0.831537,0.827885,0.824316
8,0.1586,0.494743,0.8467,0.848896,0.847069,0.846437
9,0.0861,0.668793,0.8237,0.838163,0.823445,0.822713
10,0.0404,0.542022,0.845,0.846739,0.845448,0.843161


[I 2025-03-25 09:03:26,511] Trial 60 finished with value: 0.8431613885167893 and parameters: {'learning_rate': 0.00048625532935807857, 'weight_decay': 0.006, 'warmup_steps': 27}. Best is trial 34 with value: 0.8501016779891574.


Trial 61 with params: {'learning_rate': 0.0004754165351610492, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6157,1.143178,0.5803,0.582999,0.580615,0.569584
2,0.9989,0.821691,0.7083,0.713637,0.708094,0.708482
3,0.7541,0.720672,0.7526,0.755948,0.752003,0.749677
4,0.5935,0.579536,0.8014,0.806635,0.800844,0.801214
5,0.4668,0.541689,0.8116,0.819568,0.811902,0.811701
6,0.3509,0.547809,0.8133,0.824205,0.813085,0.814571


[I 2025-03-25 09:11:29,811] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0003136627659382245, 'weight_decay': 0.006, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6373,1.136878,0.5835,0.590368,0.58384,0.573553
2,0.9904,0.809251,0.7085,0.711746,0.708464,0.7079
3,0.73,0.736784,0.7472,0.752727,0.74643,0.744193
4,0.5635,0.59517,0.7904,0.797072,0.790436,0.78962
5,0.4296,0.539506,0.8158,0.819848,0.816159,0.81556
6,0.3017,0.585481,0.804,0.817519,0.803875,0.80514


[I 2025-03-25 09:19:10,873] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0009299872552204633, 'weight_decay': 0.005, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6598,1.229908,0.5478,0.541747,0.547557,0.537583
2,1.1391,0.980755,0.6466,0.647958,0.64668,0.643714
3,0.8881,0.875932,0.6992,0.703602,0.697844,0.692241


[I 2025-03-25 09:22:58,323] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0005240243031453702, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5733,1.119008,0.5868,0.588395,0.586595,0.578465
2,0.9875,0.82748,0.7043,0.706723,0.70434,0.70032
3,0.7558,0.718286,0.7468,0.750604,0.746115,0.743378
4,0.5991,0.58243,0.7955,0.801109,0.795711,0.794617
5,0.4752,0.525354,0.819,0.825106,0.8193,0.819548
6,0.3616,0.539644,0.8204,0.830531,0.820296,0.820901
7,0.2585,0.56121,0.8241,0.82903,0.824789,0.822204
8,0.163,0.479721,0.8506,0.853537,0.850809,0.850853
9,0.0883,0.605862,0.8343,0.843796,0.834013,0.833789
10,0.0408,0.542892,0.8447,0.845987,0.845108,0.8428


[I 2025-03-25 09:35:47,555] Trial 64 finished with value: 0.8428003065308532 and parameters: {'learning_rate': 0.0005240243031453702, 'weight_decay': 0.007, 'warmup_steps': 17}. Best is trial 34 with value: 0.8501016779891574.


Trial 65 with params: {'learning_rate': 0.0003061126129336506, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6327,1.156536,0.5861,0.591051,0.586016,0.57768
2,1.0112,0.834639,0.7012,0.703819,0.700603,0.697758
3,0.7624,0.795274,0.7342,0.742465,0.733449,0.731367
4,0.5814,0.575378,0.8016,0.806022,0.801717,0.801116
5,0.4478,0.566362,0.8074,0.811463,0.807635,0.80717
6,0.3172,0.579388,0.8103,0.823207,0.810167,0.811262


[I 2025-03-25 09:43:50,205] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0012558731927527068, 'weight_decay': 0.007, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6318,1.200802,0.5527,0.559427,0.552515,0.544069
2,1.1431,1.013084,0.6266,0.632174,0.626324,0.622847
3,0.9118,0.85932,0.7077,0.710228,0.706894,0.702407
4,0.7474,0.65016,0.7689,0.769178,0.768996,0.767177
5,0.6208,0.621011,0.783,0.790956,0.783227,0.783629
6,0.5173,0.570912,0.8006,0.809497,0.800461,0.800929


[I 2025-03-25 09:51:27,041] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0011036823145543708, 'weight_decay': 0.004, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6545,1.276134,0.5349,0.535443,0.534394,0.522982
2,1.1332,0.962638,0.6539,0.650962,0.654275,0.646751
3,0.9067,0.856082,0.7013,0.700908,0.700922,0.694271


[I 2025-03-25 09:55:17,662] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 9.189810555280755e-05, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8924,1.404661,0.4831,0.486232,0.483415,0.466676
2,1.3473,1.146342,0.5865,0.594477,0.58569,0.585755
3,1.0819,1.049235,0.6333,0.629374,0.633005,0.626255


[I 2025-03-25 09:59:09,485] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0005834035420803122, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6422,1.165861,0.579,0.588518,0.578333,0.570796
2,1.0358,0.857664,0.6922,0.693141,0.692104,0.690079
3,0.7974,0.734878,0.7457,0.748547,0.744799,0.742381
4,0.6362,0.58836,0.7956,0.800326,0.795408,0.7952
5,0.5116,0.566649,0.8084,0.814294,0.808229,0.808936
6,0.3936,0.522829,0.8244,0.832585,0.824364,0.824709
7,0.2863,0.559808,0.8241,0.830263,0.824825,0.821393
8,0.1894,0.506263,0.8419,0.845902,0.842114,0.842225
9,0.1124,0.619815,0.8304,0.842399,0.830312,0.829475
10,0.056,0.526829,0.8498,0.850513,0.850167,0.848402


[I 2025-03-25 10:12:29,909] Trial 69 finished with value: 0.8484023652649375 and parameters: {'learning_rate': 0.0005834035420803122, 'weight_decay': 0.005, 'warmup_steps': 31}. Best is trial 34 with value: 0.8501016779891574.


Trial 70 with params: {'learning_rate': 0.00045171715955936434, 'weight_decay': 0.004, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6262,1.113703,0.5962,0.59871,0.595609,0.590845
2,1.0075,0.850593,0.6954,0.700996,0.695243,0.695053
3,0.77,0.783092,0.7363,0.740278,0.735722,0.731406
4,0.6131,0.589727,0.791,0.794565,0.791061,0.790154
5,0.4765,0.530758,0.8195,0.821768,0.819702,0.819203
6,0.362,0.564947,0.8116,0.823325,0.811588,0.812552
7,0.2449,0.61045,0.8201,0.826196,0.820967,0.816748
8,0.1484,0.510498,0.8408,0.842791,0.841068,0.840681
9,0.075,0.640643,0.832,0.841334,0.831663,0.83195
10,0.0331,0.540773,0.8461,0.846876,0.846593,0.844722


[I 2025-03-25 10:25:39,186] Trial 70 finished with value: 0.8447220895112725 and parameters: {'learning_rate': 0.00045171715955936434, 'weight_decay': 0.004, 'warmup_steps': 31}. Best is trial 34 with value: 0.8501016779891574.


Trial 71 with params: {'learning_rate': 0.0005754853583929809, 'weight_decay': 0.005, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6257,1.10021,0.6017,0.596649,0.601246,0.592893
2,1.0095,0.825741,0.7,0.703752,0.699205,0.694953
3,0.7791,0.748789,0.7459,0.751003,0.745492,0.742304
4,0.6168,0.581202,0.7996,0.801276,0.799676,0.797564
5,0.4924,0.548225,0.8101,0.814824,0.810272,0.809876
6,0.3831,0.531259,0.8221,0.830676,0.821957,0.823208
7,0.2733,0.557015,0.8236,0.825811,0.824229,0.820448
8,0.1762,0.49618,0.8468,0.848983,0.846899,0.846331
9,0.0995,0.676801,0.8223,0.837385,0.821954,0.822046
10,0.0475,0.525753,0.8454,0.847599,0.845705,0.844191


[I 2025-03-25 10:38:26,298] Trial 71 finished with value: 0.8441913020025487 and parameters: {'learning_rate': 0.0005754853583929809, 'weight_decay': 0.005, 'warmup_steps': 32}. Best is trial 34 with value: 0.8501016779891574.


Trial 72 with params: {'learning_rate': 0.000657509706223956, 'weight_decay': 0.007, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5875,1.178282,0.5779,0.585708,0.577157,0.569195
2,1.0226,0.847121,0.6947,0.694967,0.694746,0.692756
3,0.787,0.759635,0.7399,0.746452,0.739392,0.735251
4,0.6378,0.620093,0.7901,0.794403,0.79017,0.787745
5,0.5105,0.543719,0.8139,0.817541,0.813952,0.81409
6,0.3974,0.558073,0.8105,0.821745,0.81045,0.811724
7,0.2985,0.598361,0.8142,0.819336,0.814899,0.81093
8,0.1995,0.495074,0.8414,0.842428,0.841324,0.841267
9,0.1186,0.649609,0.8233,0.834634,0.822851,0.823035
10,0.0606,0.543646,0.8381,0.839394,0.838576,0.836339


[I 2025-03-25 10:51:12,768] Trial 72 finished with value: 0.8363390631395207 and parameters: {'learning_rate': 0.000657509706223956, 'weight_decay': 0.007, 'warmup_steps': 27}. Best is trial 34 with value: 0.8501016779891574.


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0221,1.549777,0.4269,0.43208,0.426697,0.414681
2,1.5399,1.383067,0.4986,0.511246,0.498019,0.497868
3,1.3166,1.288089,0.5426,0.547519,0.541063,0.535778


[I 2025-03-25 10:54:55,731] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004991039464163667, 'weight_decay': 0.005, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6185,1.163697,0.5898,0.590743,0.58883,0.580064
2,1.0061,0.819795,0.7054,0.707158,0.705113,0.702661
3,0.7744,0.70355,0.7553,0.755213,0.754472,0.752016
4,0.6139,0.574439,0.7985,0.80163,0.798444,0.797811
5,0.4913,0.541583,0.8164,0.819531,0.816638,0.816287
6,0.3794,0.543027,0.8179,0.82939,0.817782,0.81852
7,0.2649,0.602446,0.8136,0.820169,0.814421,0.810013


[W 2025-03-25 11:05:03,181] Trial 74 failed with parameters: {'learning_rate': 0.0004991039464163667, 'weight_decay': 0.005, 'warmup_steps': 26} because of the following error: OSError(116, 'Stale file handle').
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 250, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2639, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval, start_time)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3085, in 

OSError: [Errno 116] Stale file handle

In [None]:
print(best_base_random)

In [14]:
base.reset_seed()

## Hyperparameter search with distillation into the randomly initialized model
Configuration of the individual training runs.

In [15]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/random-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/random-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [16]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
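
The two added hyperparameters, `lambda_param` and `temperature`, control how the student's loss combines the hard labels with the teacher's logits. The implementation inside `base.DistilTrainer` is not shown in this notebook, so the following is only a minimal sketch of one common formulation, written in plain Python for a single example. The function names, the choice to let `lambda_param` weight the distillation term, and the `temperature ** 2` scaling are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Hypothetical combined loss for one example (not the notebook's actual code).

    Assumption: lambda_param weights the soft-target term and
    (1 - lambda_param) weights the ordinary cross-entropy term.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    cross_entropy = -math.log(softmax(student_logits)[label])

    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # Scaling by temperature squared keeps gradient magnitudes comparable
    # across temperatures (a common convention, assumed here).
    soft_loss = kl * temperature ** 2

    return (1 - lambda_param) * cross_entropy + lambda_param * soft_loss


if __name__ == "__main__":
    # Illustrative logits only; the real teacher logits are precomputed elsewhere.
    print(distillation_loss([2.0, 1.0, 0.1], [0.5, 3.0, 0.1], 0, 0.6, 2.5))
```

Under this formulation, `lambda_param = 0` reduces to ordinary supervised training and `lambda_param = 1` trains on the teacher's softened outputs alone, which is why the search space covers the whole interval from 0 to 1.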

Optuna configuration.

In [17]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
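
To make the pruner settings more concrete, the sketch below lists the resource levels ("rungs") at which a Hyperband-style pruner can stop a trial. It assumes the pruner creates one successive-halving bracket for each power of `reduction_factor` that fits between `min_resource` and `max_resource`, and that bracket `b` has rungs at `min_resource * reduction_factor ** (b + rung)`. The helper `hyperband_rungs` is hypothetical, and the values 3 and 10 are illustrative only: the notebook's `min_r` and `max_r` are defined outside this section and may use a different unit, such as optimizer steps.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [rung resource levels]} for a Hyperband-style schedule.

    Simplified illustration: only rungs strictly below max_resource are listed,
    because a trial that reaches max_resource has already finished.
    """
    # Count brackets with integer arithmetic to avoid floating-point log errors.
    n_brackets = 0
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    schedule = {}
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            level = min_resource * reduction_factor ** (bracket + rung)
            if level >= max_resource:
                break
            rungs.append(level)
            rung += 1
        schedule[bracket] = rungs
    return schedule


if __name__ == "__main__":
    # Illustrative values, not the notebook's actual min_r and max_r.
    for bracket, rungs in hyperband_rungs(3, 10, 2).items():
        print(f"bracket {bracket}: pruning checkpoints at {rungs}")
```

With these illustrative values, trials can be stopped at resource levels 3 and 6, which matches the pattern in the logs in this section, where pruned trials end after epoch 3 or epoch 6 and the remaining trials run all 10 epochs.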



Configuration of the distillation trainer for the individual training runs.

In [18]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(10),
)
  

Setting up the search.

In [19]:
best_distill_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-25 13:20:13,187] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2298,0.872347,0.5692,0.581125,0.569284,0.561281
2,0.7675,0.635119,0.7121,0.721079,0.712172,0.71178
3,0.5963,0.566114,0.7533,0.757952,0.752947,0.750231


[I 2025-03-25 13:24:52,947] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3766,1.037159,0.4716,0.484527,0.471856,0.453253
2,0.9683,0.820372,0.6026,0.605285,0.601957,0.598993
3,0.7823,0.739958,0.6538,0.654594,0.652991,0.648829
4,0.6567,0.616794,0.7199,0.722348,0.719666,0.717586
5,0.5625,0.567622,0.7493,0.752407,0.749109,0.748891
6,0.4827,0.555159,0.7528,0.767252,0.752256,0.754423


[I 2025-03-25 13:34:52,729] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4785,1.152593,0.4085,0.41914,0.408474,0.395282
2,1.1912,1.013225,0.4879,0.488194,0.487419,0.481599
3,1.0324,0.967542,0.5229,0.526447,0.521876,0.513953


[I 2025-03-25 13:40:00,456] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3548,1.016344,0.4805,0.490937,0.481348,0.462826
2,0.9372,0.757395,0.6413,0.642301,0.640618,0.637644
3,0.7361,0.70256,0.6755,0.683057,0.674997,0.670798
4,0.6102,0.580544,0.7383,0.738828,0.738109,0.735498
5,0.5126,0.550294,0.7553,0.759606,0.75543,0.755488
6,0.4329,0.541298,0.7609,0.775251,0.760448,0.762221


[I 2025-03-25 13:48:53,325] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1263,0.874875,0.5744,0.586267,0.57387,0.565978
2,0.7641,0.641017,0.7086,0.718393,0.708189,0.706388
3,0.6077,0.583053,0.7444,0.751592,0.743436,0.741066


[I 2025-03-25 13:53:37,177] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2196,0.989602,0.4913,0.517548,0.490434,0.452456
2,0.8829,0.774165,0.6323,0.64712,0.631774,0.633952
3,0.7068,0.637375,0.7116,0.715798,0.71081,0.707944
4,0.5956,0.526254,0.7677,0.775974,0.767457,0.768298
5,0.5152,0.505565,0.7765,0.791968,0.77663,0.77773
6,0.4478,0.462367,0.8046,0.815925,0.804396,0.805487
7,0.3922,0.455518,0.8092,0.815011,0.809968,0.80588
8,0.3417,0.399556,0.8369,0.839741,0.837124,0.836893
9,0.2962,0.468296,0.8107,0.831159,0.809903,0.810047
10,0.2571,0.40036,0.8351,0.84232,0.835512,0.833966


[I 2025-03-25 14:06:54,456] Trial 5 finished with value: 0.8339655348942008 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.8339655348942008.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1131,0.851868,0.5884,0.596639,0.587969,0.579631
2,0.7662,0.655573,0.706,0.716809,0.70545,0.706467
3,0.6214,0.5683,0.7487,0.750362,0.747986,0.746051
4,0.525,0.473112,0.7969,0.801276,0.796468,0.794893
5,0.4472,0.426078,0.8261,0.829777,0.825861,0.826294
6,0.3814,0.442121,0.8099,0.82587,0.809813,0.81089


[I 2025-03-25 14:16:13,343] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1982,0.97358,0.5006,0.524704,0.500169,0.464567
2,0.8752,0.734532,0.658,0.653411,0.657696,0.65151
3,0.7125,0.645806,0.7066,0.709413,0.70548,0.702714
4,0.6046,0.548166,0.7602,0.76754,0.759376,0.759113
5,0.5255,0.506731,0.7898,0.798905,0.789482,0.791102
6,0.4596,0.469987,0.8002,0.809616,0.800283,0.800756
7,0.403,0.474736,0.8017,0.809454,0.802565,0.797419
8,0.3539,0.408513,0.8286,0.832636,0.82897,0.828447
9,0.3055,0.466946,0.8081,0.825795,0.807779,0.808546
10,0.2632,0.408489,0.8358,0.839457,0.836256,0.834376


[I 2025-03-25 14:31:29,626] Trial 7 finished with value: 0.8343759741433843 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 7 with value: 0.8343759741433843.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4056,1.062921,0.4588,0.465356,0.459118,0.440992
2,1.0529,0.881013,0.5687,0.57102,0.568174,0.565539
3,0.8565,0.831814,0.6039,0.607548,0.603015,0.597758
4,0.7271,0.680744,0.6812,0.685119,0.68108,0.678569
5,0.6311,0.624448,0.713,0.715999,0.712701,0.711899
6,0.5523,0.607508,0.727,0.739445,0.726323,0.728336


[I 2025-03-25 14:40:53,398] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1728,0.929811,0.5481,0.561529,0.548006,0.537571
2,0.8069,0.732628,0.6557,0.667513,0.655737,0.654741
3,0.6537,0.593781,0.7341,0.736337,0.7333,0.731262


[I 2025-03-25 14:44:40,034] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.004794768110099147, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3148,1.159086,0.3837,0.396403,0.383356,0.355686
2,1.0614,0.899875,0.5541,0.554276,0.553652,0.549261
3,0.874,0.824182,0.6085,0.612892,0.606725,0.598377
4,0.7671,0.653638,0.6992,0.700524,0.699179,0.697322
5,0.6691,0.60867,0.7225,0.736554,0.721927,0.724462
6,0.5947,0.568489,0.7434,0.757526,0.743299,0.744067


[I 2025-03-25 14:52:11,254] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0036642776254065634, 'weight_decay': 0.001, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.285,1.038032,0.4565,0.465922,0.456076,0.430048
2,0.9764,0.83553,0.5885,0.586968,0.587852,0.577374
3,0.7982,0.737462,0.6534,0.65758,0.652447,0.64886


[I 2025-03-25 14:56:00,719] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.002036475081925594, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2431,1.022631,0.473,0.485503,0.473628,0.442396
2,0.9371,0.842121,0.5896,0.592926,0.589721,0.584008
3,0.7748,0.719635,0.6669,0.66978,0.666836,0.662082
4,0.6551,0.562786,0.7501,0.754872,0.749906,0.749843
5,0.5653,0.544904,0.7598,0.773109,0.75969,0.760269
6,0.4964,0.489834,0.7896,0.802701,0.789526,0.790533
7,0.4332,0.482775,0.8001,0.803964,0.800794,0.796379
8,0.3802,0.429674,0.8187,0.822022,0.819045,0.817972
9,0.3309,0.445091,0.8132,0.822551,0.813026,0.812309
10,0.2887,0.420348,0.825,0.829684,0.825533,0.822957


[I 2025-03-25 15:08:32,624] Trial 12 finished with value: 0.822956887764042 and parameters: {'learning_rate': 0.002036475081925594, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 7 with value: 0.8343759741433843.


Trial 13 with params: {'learning_rate': 0.0018139304189064562, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.207,0.981379,0.497,0.498331,0.497378,0.468296
2,0.8699,0.751103,0.6393,0.640818,0.639355,0.63711
3,0.7015,0.630007,0.7213,0.722953,0.720389,0.718501


[I 2025-03-25 15:12:18,912] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0017454222385159357, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2103,0.967732,0.5102,0.521452,0.510495,0.496747
2,0.8772,0.760023,0.6414,0.64712,0.641142,0.637129
3,0.7123,0.652595,0.7012,0.702658,0.700271,0.696446
4,0.6047,0.548649,0.7592,0.766505,0.75875,0.757942
5,0.5244,0.520505,0.7775,0.786757,0.77748,0.778379
6,0.4569,0.468532,0.7991,0.810192,0.79934,0.799951
7,0.3998,0.44548,0.8124,0.818095,0.813064,0.809887
8,0.3466,0.405955,0.8316,0.837652,0.831813,0.831638
9,0.2982,0.438269,0.8218,0.832849,0.821382,0.822056
10,0.2571,0.405876,0.8349,0.840652,0.835387,0.833713


[I 2025-03-25 15:24:52,753] Trial 14 finished with value: 0.8337128351566463 and parameters: {'learning_rate': 0.0017454222385159357, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 7 with value: 0.8343759741433843.


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1429,0.9121,0.5414,0.569595,0.541506,0.526559
2,0.7959,0.7062,0.6704,0.681304,0.67044,0.668885
3,0.6488,0.602658,0.7291,0.732097,0.728698,0.725641


[I 2025-03-25 15:28:40,022] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0019870473557195355, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.19,0.942739,0.5293,0.536264,0.529132,0.515027
2,0.8727,0.726207,0.6592,0.661222,0.65881,0.656631
3,0.7009,0.654375,0.7026,0.710747,0.701672,0.699508
4,0.5915,0.525181,0.7688,0.77431,0.768718,0.768245
5,0.5169,0.488861,0.7898,0.803378,0.789482,0.791915
6,0.4509,0.467483,0.8022,0.814775,0.802162,0.802851
7,0.3952,0.455839,0.8076,0.813705,0.808447,0.803319
8,0.3436,0.396918,0.8358,0.839387,0.835916,0.835337
9,0.2965,0.451943,0.8186,0.833454,0.818114,0.819179
10,0.2555,0.394684,0.8367,0.841303,0.837131,0.835523


[I 2025-03-25 15:41:10,179] Trial 16 finished with value: 0.8355229212546507 and parameters: {'learning_rate': 0.0019870473557195355, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 16 with value: 0.8355229212546507.


Trial 17 with params: {'learning_rate': 0.0011499406103368222, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1405,0.909967,0.5472,0.577145,0.54758,0.531497
2,0.7896,0.647922,0.7027,0.710356,0.702532,0.702328
3,0.6365,0.586714,0.7382,0.74053,0.737529,0.735389
4,0.5441,0.502993,0.7785,0.785185,0.778581,0.777302
5,0.4688,0.45961,0.8029,0.812535,0.802808,0.803632
6,0.4044,0.439594,0.818,0.829138,0.817668,0.818741


[I 2025-03-25 15:48:40,176] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.001241033848515796, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1673,0.921508,0.5343,0.577849,0.535115,0.511806
2,0.8095,0.683241,0.6884,0.691931,0.688718,0.68433
3,0.6551,0.610899,0.727,0.731722,0.72598,0.724203


[I 2025-03-25 15:52:24,954] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.003761930752202365, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3057,1.077692,0.4325,0.465588,0.431572,0.397574
2,1.0065,0.856239,0.5865,0.580802,0.585155,0.578502
3,0.8302,0.778136,0.6353,0.639995,0.634273,0.629323
4,0.723,0.644958,0.7038,0.703041,0.70323,0.702029
5,0.6193,0.588545,0.7385,0.757405,0.737994,0.74138
6,0.5489,0.527637,0.7711,0.78903,0.770786,0.773264


[I 2025-03-25 15:59:54,747] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1683,0.856401,0.587,0.602734,0.586998,0.58011
2,0.7282,0.609223,0.7305,0.73738,0.730282,0.730607
3,0.5685,0.546738,0.762,0.7662,0.761331,0.759739
4,0.471,0.448211,0.8133,0.816821,0.813322,0.811993
5,0.3932,0.420337,0.8277,0.833644,0.827616,0.828377
6,0.3257,0.415654,0.829,0.839958,0.828923,0.830188
7,0.2673,0.434783,0.825,0.832435,0.82584,0.821484
8,0.2211,0.379389,0.8472,0.855202,0.847345,0.847903
9,0.1897,0.436637,0.8275,0.840872,0.827206,0.828029
10,0.1681,0.37279,0.8517,0.853725,0.851919,0.850607


[I 2025-03-25 16:12:27,776] Trial 20 finished with value: 0.8506074309419441 and parameters: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 20 with value: 0.8506074309419441.


Trial 21 with params: {'learning_rate': 0.00017048302356543796, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2931,0.956027,0.5131,0.5234,0.513603,0.490453
2,0.8516,0.719806,0.6651,0.672537,0.664536,0.665579
3,0.6698,0.642172,0.7136,0.715034,0.713166,0.709189
4,0.5507,0.518746,0.7725,0.777801,0.772369,0.771662
5,0.4576,0.4825,0.7926,0.79341,0.792613,0.791758
6,0.3794,0.486078,0.7876,0.800253,0.787811,0.78823


[I 2025-03-25 16:19:58,761] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.00011686897971316309, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3629,1.038218,0.4733,0.486836,0.473547,0.454508
2,0.9718,0.809044,0.6053,0.606197,0.604584,0.602875
3,0.7792,0.757781,0.6479,0.655794,0.647286,0.64365


[I 2025-03-25 16:23:43,695] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0006825523712074358, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1353,0.880806,0.5749,0.599124,0.574431,0.571093
2,0.7474,0.631614,0.7114,0.71486,0.711185,0.708868
3,0.5978,0.56925,0.7495,0.754205,0.748481,0.74646
4,0.5027,0.470204,0.7977,0.802893,0.797758,0.796564
5,0.4306,0.426121,0.824,0.829072,0.824052,0.824292
6,0.3681,0.42559,0.822,0.836708,0.821555,0.822925
7,0.3073,0.418671,0.8305,0.837666,0.831194,0.828196
8,0.257,0.37508,0.8458,0.851663,0.846084,0.845933
9,0.2152,0.43592,0.8259,0.84141,0.825309,0.826864
10,0.1844,0.368597,0.8518,0.855887,0.852063,0.850727


[I 2025-03-25 16:36:16,441] Trial 23 finished with value: 0.8507272742848272 and parameters: {'learning_rate': 0.0006825523712074358, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 23 with value: 0.8507272742848272.


Trial 24 with params: {'learning_rate': 0.0018964248810009032, 'weight_decay': 0.005, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2851,1.082134,0.4282,0.437198,0.427382,0.39443
2,0.9657,0.837431,0.5998,0.60154,0.600165,0.596592
3,0.7737,0.690185,0.6785,0.677439,0.677599,0.673137


[I 2025-03-25 16:40:08,139] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0008419149898406307, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.14,0.854177,0.5814,0.594495,0.581701,0.569622
2,0.7647,0.641286,0.698,0.699186,0.698093,0.693355
3,0.6075,0.579309,0.7436,0.75165,0.742714,0.741425
4,0.5139,0.480499,0.7955,0.801347,0.795437,0.794393
5,0.4399,0.440196,0.8184,0.825036,0.818402,0.819143
6,0.3758,0.427118,0.8208,0.831631,0.82071,0.821815
7,0.3199,0.418336,0.8314,0.834343,0.83195,0.828414
8,0.2681,0.384046,0.8462,0.854429,0.846542,0.846301
9,0.2246,0.428988,0.8298,0.841859,0.82927,0.830375
10,0.1934,0.373131,0.8476,0.851131,0.848115,0.846422


[I 2025-03-25 16:52:46,184] Trial 25 finished with value: 0.846422475447576 and parameters: {'learning_rate': 0.0008419149898406307, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 23 with value: 0.8507272742848272.


Trial 26 with params: {'learning_rate': 0.0008823325162674059, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1263,0.879901,0.5688,0.585897,0.568823,0.564771
2,0.7632,0.638904,0.7085,0.713299,0.708395,0.705292
3,0.6177,0.603503,0.7376,0.743504,0.736806,0.734073
4,0.5271,0.490517,0.7941,0.802337,0.793815,0.793596
5,0.4515,0.451157,0.8094,0.818577,0.809547,0.810121
6,0.3877,0.424908,0.8237,0.834188,0.823814,0.824159
7,0.3323,0.41433,0.8312,0.836272,0.831704,0.828518
8,0.2784,0.373413,0.8472,0.851761,0.847485,0.84725
9,0.2356,0.405571,0.8381,0.848448,0.837861,0.838735
10,0.1999,0.372606,0.8487,0.851881,0.849231,0.847426


[I 2025-03-25 17:05:18,296] Trial 26 finished with value: 0.8474258804859682 and parameters: {'learning_rate': 0.0008823325162674059, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 23 with value: 0.8507272742848272.


Trial 27 with params: {'learning_rate': 0.0019879731980440188, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2506,0.998852,0.4862,0.50755,0.48628,0.455465
2,0.9049,0.771629,0.6233,0.619501,0.622951,0.618548
3,0.7339,0.67615,0.6917,0.69579,0.690726,0.68663
4,0.6257,0.564498,0.7541,0.763445,0.753771,0.753086
5,0.5403,0.516513,0.7751,0.792125,0.775201,0.777652
6,0.4753,0.481581,0.793,0.802072,0.79262,0.793497
7,0.4169,0.485607,0.794,0.800182,0.794491,0.790026
8,0.3644,0.423905,0.8216,0.828717,0.821739,0.82128
9,0.3178,0.45533,0.8138,0.827292,0.813438,0.813832
10,0.2742,0.412617,0.8279,0.83375,0.828379,0.826575


[I 2025-03-25 17:17:54,520] Trial 27 finished with value: 0.8265745586866226 and parameters: {'learning_rate': 0.0019879731980440188, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 23 with value: 0.8507272742848272.


Trial 28 with params: {'learning_rate': 0.000496078616066282, 'weight_decay': 0.005, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1431,0.816794,0.6154,0.635229,0.615593,0.61275
2,0.7263,0.601153,0.7294,0.736579,0.729555,0.728038
3,0.5711,0.562714,0.7539,0.759528,0.753414,0.751959
4,0.4779,0.454467,0.8103,0.815789,0.809991,0.809194
5,0.402,0.419633,0.8247,0.829904,0.824789,0.825297
6,0.3382,0.428452,0.8153,0.83102,0.815357,0.816307
7,0.2794,0.417773,0.835,0.837738,0.835823,0.830678
8,0.2318,0.369999,0.8494,0.855089,0.849794,0.849717
9,0.1963,0.418057,0.8349,0.84731,0.834578,0.835582
10,0.1718,0.369171,0.8519,0.853945,0.852266,0.850389


[I 2025-03-25 17:30:25,963] Trial 28 finished with value: 0.8503885575948159 and parameters: {'learning_rate': 0.000496078616066282, 'weight_decay': 0.005, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 23 with value: 0.8507272742848272.


Trial 29 with params: {'learning_rate': 0.0005819754854990381, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1537,0.864647,0.5854,0.61039,0.584709,0.581789
2,0.755,0.640191,0.7102,0.712878,0.710304,0.708686
3,0.6003,0.562893,0.7568,0.759043,0.755968,0.7542
4,0.5015,0.468869,0.8026,0.80832,0.802452,0.80213
5,0.4292,0.436995,0.8197,0.828009,0.819399,0.819974
6,0.3621,0.433675,0.8183,0.831232,0.818396,0.819019


[I 2025-03-25 17:37:52,608] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0006087189792076381, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1293,0.847736,0.5878,0.603114,0.587541,0.583892
2,0.7453,0.632207,0.7126,0.718501,0.712633,0.70947
3,0.5908,0.545865,0.762,0.766236,0.761557,0.760246
4,0.4897,0.450023,0.8121,0.818422,0.812117,0.811968
5,0.4184,0.439256,0.8124,0.820041,0.812349,0.813023
6,0.3515,0.406231,0.8288,0.837224,0.828781,0.829045
7,0.2935,0.408046,0.8328,0.837436,0.833291,0.831294
8,0.2431,0.36624,0.8517,0.85492,0.851948,0.851776
9,0.2046,0.4162,0.8313,0.84409,0.830914,0.831904
10,0.1777,0.364287,0.8561,0.859685,0.856278,0.855438


[I 2025-03-25 17:50:25,070] Trial 30 finished with value: 0.8554384102056222 and parameters: {'learning_rate': 0.0006087189792076381, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 30 with value: 0.8554384102056222.


Trial 31 with params: {'learning_rate': 0.0002274114760873955, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2242,0.927578,0.5396,0.559943,0.540224,0.522173
2,0.7969,0.667916,0.691,0.697616,0.690978,0.690589
3,0.6229,0.612541,0.7279,0.733667,0.727429,0.723612
4,0.5087,0.491866,0.7957,0.802026,0.795364,0.794776
5,0.4197,0.467406,0.8094,0.813535,0.809211,0.809531
6,0.3416,0.455117,0.8056,0.817146,0.805311,0.805989
7,0.2768,0.449475,0.8173,0.821732,0.81792,0.814712
8,0.2278,0.423295,0.8217,0.827812,0.822201,0.821957
9,0.1967,0.466649,0.8114,0.82511,0.811263,0.811502
10,0.1761,0.410852,0.8292,0.831474,0.829454,0.827824


[I 2025-03-25 18:02:57,023] Trial 31 finished with value: 0.8278244195960106 and parameters: {'learning_rate': 0.0002274114760873955, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 30 with value: 0.8554384102056222.


Trial 32 with params: {'learning_rate': 0.000384043280543124, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1628,0.87464,0.5758,0.60177,0.576105,0.566133
2,0.7355,0.613696,0.7185,0.720969,0.718522,0.716453
3,0.573,0.543964,0.7673,0.77238,0.766649,0.765602
4,0.4699,0.44482,0.8151,0.819649,0.815097,0.81398
5,0.3949,0.418489,0.8248,0.830212,0.824814,0.825314
6,0.3278,0.41052,0.831,0.842638,0.830844,0.832333
7,0.2682,0.426615,0.8298,0.83725,0.830657,0.826242
8,0.2212,0.370472,0.8512,0.856656,0.851319,0.851651
9,0.1894,0.437456,0.8281,0.845384,0.827745,0.828667
10,0.168,0.375309,0.8533,0.855128,0.853646,0.851986


[I 2025-03-25 18:15:30,755] Trial 32 finished with value: 0.8519859322481642 and parameters: {'learning_rate': 0.000384043280543124, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 30 with value: 0.8554384102056222.


Trial 33 with params: {'learning_rate': 0.0002100858756279339, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2676,0.965398,0.5211,0.526507,0.521863,0.503441
2,0.8294,0.6838,0.6824,0.689303,0.682345,0.681708
3,0.6459,0.643755,0.7118,0.715861,0.711143,0.706547
4,0.5289,0.493553,0.7875,0.789598,0.787551,0.785651
5,0.4349,0.462183,0.8028,0.804896,0.802916,0.802434
6,0.3569,0.458215,0.8011,0.813648,0.800773,0.802353
7,0.2888,0.470695,0.8057,0.81139,0.806519,0.801405
8,0.2381,0.430928,0.823,0.831702,0.82326,0.82338
9,0.2036,0.47605,0.8024,0.81576,0.802318,0.801619
10,0.1827,0.428856,0.8216,0.823694,0.822033,0.820142


[I 2025-03-25 18:28:03,651] Trial 33 finished with value: 0.8201423759788715 and parameters: {'learning_rate': 0.0002100858756279339, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 30 with value: 0.8554384102056222.


Trial 34 with params: {'learning_rate': 0.0003599989462065524, 'weight_decay': 0.002, 'warmup_steps': 21, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1736,0.814331,0.6125,0.614301,0.612123,0.607974
2,0.7442,0.637788,0.7102,0.720296,0.710237,0.712257
3,0.582,0.551644,0.7607,0.764448,0.759848,0.757929
4,0.4784,0.461317,0.805,0.807654,0.804876,0.803181
5,0.4001,0.437845,0.8193,0.824783,0.819233,0.819192
6,0.3293,0.434934,0.8142,0.829769,0.81373,0.815709


[I 2025-03-25 18:35:36,657] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0006139968240256416, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1133,0.846867,0.5858,0.5981,0.585443,0.574426
2,0.7376,0.63286,0.7127,0.719982,0.712684,0.710036
3,0.5816,0.566848,0.7526,0.763174,0.751707,0.751597
4,0.4856,0.444516,0.8142,0.818023,0.814012,0.813158
5,0.4117,0.418661,0.8282,0.833648,0.828131,0.827959
6,0.3465,0.418018,0.8245,0.838485,0.824181,0.825391
7,0.2894,0.404351,0.8404,0.845993,0.840977,0.838544
8,0.2411,0.360853,0.8556,0.86092,0.855714,0.856059
9,0.2029,0.413212,0.839,0.851072,0.838493,0.839489
10,0.1756,0.354156,0.8622,0.863659,0.862632,0.860704


[I 2025-03-25 18:48:07,329] Trial 35 finished with value: 0.8607044045531035 and parameters: {'learning_rate': 0.0006139968240256416, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 36 with params: {'learning_rate': 0.0010024459813227932, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1088,0.871575,0.5669,0.584174,0.567202,0.561038
2,0.7725,0.666078,0.6905,0.694574,0.690571,0.684738
3,0.6263,0.602358,0.7324,0.738273,0.731827,0.729376


[I 2025-03-25 18:51:54,353] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0002072467729539187, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2368,0.92011,0.5413,0.556972,0.541404,0.524986
2,0.8079,0.677065,0.6863,0.696388,0.685472,0.68641
3,0.6302,0.587587,0.7402,0.742909,0.739549,0.736857
4,0.51,0.489731,0.7913,0.795209,0.791108,0.789771
5,0.4189,0.465937,0.8029,0.808584,0.80292,0.803833
6,0.3426,0.46923,0.7966,0.810138,0.796703,0.797418


[I 2025-03-25 18:59:26,605] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2834,0.954714,0.5161,0.53104,0.516731,0.494091
2,0.8681,0.718074,0.6633,0.666699,0.662509,0.660549
3,0.6868,0.655311,0.7044,0.708122,0.703936,0.700024
4,0.565,0.534941,0.7688,0.771901,0.768393,0.767316
5,0.4696,0.50663,0.7808,0.786563,0.780998,0.781223
6,0.394,0.487549,0.7893,0.799431,0.789072,0.789964


[I 2025-03-25 19:07:00,733] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0010548253589466976, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1318,0.932956,0.5561,0.570918,0.555415,0.549567
2,0.7992,0.670979,0.6869,0.685704,0.687001,0.682083
3,0.6559,0.61075,0.7278,0.733096,0.727006,0.725323


[I 2025-03-25 19:10:45,896] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.000645843586168911, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1542,0.835962,0.5989,0.611241,0.598423,0.593606
2,0.7467,0.627119,0.7151,0.717667,0.715065,0.712833
3,0.594,0.569625,0.7506,0.755122,0.749801,0.74801
4,0.4989,0.477353,0.8,0.804887,0.799784,0.799635
5,0.4263,0.424786,0.8227,0.826519,0.82266,0.82233
6,0.3629,0.427456,0.8197,0.83647,0.819326,0.821158
7,0.3023,0.409319,0.8353,0.838906,0.835859,0.832848
8,0.2515,0.37006,0.8534,0.857245,0.853645,0.853404
9,0.2124,0.425781,0.8361,0.849164,0.835548,0.837034
10,0.1836,0.367534,0.8536,0.857952,0.854027,0.852425


[I 2025-03-25 19:23:19,644] Trial 40 finished with value: 0.852424990711155 and parameters: {'learning_rate': 0.000645843586168911, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 41 with params: {'learning_rate': 0.00023155286712283352, 'weight_decay': 0.002, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2587,0.914403,0.5491,0.572787,0.549251,0.534566
2,0.7968,0.671634,0.6917,0.694139,0.691708,0.689251
3,0.6218,0.577091,0.745,0.745956,0.74467,0.741311
4,0.5044,0.479,0.7941,0.798637,0.794207,0.792856
5,0.4174,0.447077,0.8112,0.814708,0.811005,0.810873
6,0.3411,0.4457,0.8112,0.825299,0.811091,0.811768


[I 2025-03-25 19:30:50,414] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0009554051748895604, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1848,0.907507,0.5491,0.573941,0.549248,0.53085
2,0.7952,0.677252,0.6824,0.683336,0.682207,0.678829
3,0.6476,0.598865,0.7314,0.734756,0.730467,0.728665
4,0.5449,0.512988,0.7767,0.785536,0.77661,0.775254
5,0.4629,0.459821,0.8051,0.812857,0.80489,0.80446
6,0.3969,0.444334,0.8072,0.820319,0.807249,0.807797
7,0.3371,0.412496,0.832,0.835137,0.832549,0.82959
8,0.2856,0.37704,0.8446,0.848863,0.844901,0.844672
9,0.2409,0.426287,0.8327,0.844498,0.832317,0.832983
10,0.2043,0.392436,0.8445,0.84891,0.844742,0.843017


[I 2025-03-25 19:43:23,041] Trial 42 finished with value: 0.8430166345134479 and parameters: {'learning_rate': 0.0009554051748895604, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 43 with params: {'learning_rate': 0.0007014380022630181, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1466,0.848267,0.5925,0.611167,0.591496,0.587701
2,0.7575,0.622416,0.7161,0.719737,0.715705,0.714051
3,0.601,0.557137,0.7539,0.756953,0.753081,0.752018
4,0.5067,0.46941,0.8063,0.808825,0.80621,0.806535
5,0.4322,0.417711,0.8272,0.832012,0.827234,0.827201
6,0.367,0.421404,0.8252,0.83601,0.825304,0.825755
7,0.3108,0.405846,0.8336,0.838175,0.834172,0.831309
8,0.2592,0.356724,0.8545,0.857572,0.854603,0.854486
9,0.2185,0.432186,0.8322,0.847752,0.831746,0.832766
10,0.1877,0.3677,0.8503,0.855779,0.850812,0.848842


[I 2025-03-25 19:55:58,177] Trial 43 finished with value: 0.8488416308816875 and parameters: {'learning_rate': 0.0007014380022630181, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 44 with params: {'learning_rate': 0.0007975181390029798, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1169,0.835625,0.5918,0.609756,0.591525,0.588746
2,0.7585,0.646626,0.7018,0.707072,0.70176,0.700288
3,0.6124,0.564933,0.7587,0.7637,0.758043,0.756303
4,0.5192,0.472674,0.7981,0.803569,0.798136,0.796823
5,0.442,0.425823,0.823,0.829876,0.823025,0.823692
6,0.3778,0.424259,0.8215,0.833174,0.821381,0.822769
7,0.3199,0.406784,0.8341,0.837045,0.834731,0.831751
8,0.2674,0.375421,0.8498,0.853983,0.850104,0.849282
9,0.224,0.42665,0.8337,0.846617,0.833365,0.833848
10,0.1907,0.370883,0.8486,0.853161,0.84906,0.847199


[I 2025-03-25 20:08:32,294] Trial 44 finished with value: 0.8471986610912033 and parameters: {'learning_rate': 0.0007975181390029798, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 45 with params: {'learning_rate': 0.0008374131635821863, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.153,0.854587,0.5878,0.596914,0.587788,0.580024
2,0.7619,0.639257,0.7079,0.712846,0.707998,0.705699
3,0.6097,0.602658,0.7332,0.737704,0.732223,0.728412


[I 2025-03-25 20:12:17,579] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0005588070486516951, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1244,0.85016,0.5952,0.612022,0.595692,0.589879
2,0.7279,0.611635,0.7221,0.72959,0.722075,0.721186
3,0.5704,0.550625,0.7635,0.770118,0.762913,0.76164
4,0.475,0.493158,0.8154,0.821888,0.81511,0.816162
5,0.4038,0.421886,0.8217,0.827955,0.821791,0.822353
6,0.3411,0.41709,0.8221,0.836419,0.821949,0.823996
7,0.2837,0.408456,0.8346,0.839722,0.835005,0.832165
8,0.2339,0.359117,0.856,0.858073,0.856258,0.855743
9,0.1972,0.413185,0.839,0.851561,0.838574,0.839559
10,0.1722,0.365332,0.8534,0.8587,0.853632,0.852926


[I 2025-03-25 20:24:53,562] Trial 46 finished with value: 0.8529263479060966 and parameters: {'learning_rate': 0.0005588070486516951, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 47 with params: {'learning_rate': 0.00039294592429744307, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1572,0.854938,0.5925,0.607687,0.592265,0.581521
2,0.7198,0.598244,0.7311,0.734217,0.731098,0.729174
3,0.5655,0.546369,0.7612,0.767843,0.760553,0.759956
4,0.4712,0.446275,0.8142,0.816293,0.814284,0.81291
5,0.3963,0.429966,0.8159,0.822828,0.815881,0.816351
6,0.329,0.420305,0.8201,0.83249,0.819802,0.821742


[I 2025-03-25 20:32:23,320] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0007332494229062027, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1264,0.860764,0.5936,0.609607,0.593079,0.584833
2,0.7467,0.635153,0.7109,0.72799,0.71026,0.710828
3,0.6008,0.582266,0.7487,0.754779,0.747843,0.745577
4,0.5048,0.47088,0.8064,0.811665,0.806193,0.805794
5,0.429,0.428149,0.822,0.828695,0.821635,0.822815
6,0.3675,0.412669,0.8282,0.839759,0.827812,0.829373
7,0.3096,0.42118,0.8312,0.838427,0.831978,0.828444
8,0.2583,0.371695,0.848,0.855018,0.848414,0.848036
9,0.2177,0.411102,0.8378,0.848303,0.837249,0.838105
10,0.1866,0.366761,0.8537,0.856993,0.854147,0.85252


[I 2025-03-25 20:45:57,321] Trial 48 finished with value: 0.852519706842279 and parameters: {'learning_rate': 0.0007332494229062027, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 49 with params: {'learning_rate': 0.001701446166450008, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2316,0.957207,0.5189,0.514887,0.518619,0.503575
2,0.889,0.748656,0.6447,0.641888,0.644245,0.633499
3,0.7156,0.64643,0.7007,0.702522,0.700112,0.694883
4,0.6049,0.537999,0.7673,0.769551,0.767144,0.765659
5,0.517,0.506047,0.7823,0.791302,0.782494,0.783057
6,0.4555,0.472199,0.7972,0.807986,0.79731,0.797408


[I 2025-03-25 20:55:22,672] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.00046790669842435505, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1357,0.833188,0.5955,0.603109,0.595638,0.586458
2,0.7252,0.61861,0.7189,0.726238,0.718872,0.719128
3,0.5723,0.537961,0.7683,0.770414,0.767806,0.766303
4,0.4825,0.462329,0.8025,0.806626,0.802464,0.801198
5,0.409,0.429976,0.8169,0.825093,0.816703,0.818263
6,0.3426,0.430336,0.8175,0.831248,0.817388,0.818519
7,0.2825,0.423218,0.8278,0.833607,0.828397,0.824835
8,0.2339,0.370846,0.8506,0.85383,0.850845,0.850247
9,0.1969,0.419543,0.8306,0.841963,0.830252,0.83028
10,0.1722,0.36896,0.8507,0.854765,0.851007,0.849696


[I 2025-03-25 21:10:29,373] Trial 50 finished with value: 0.8496964590628429 and parameters: {'learning_rate': 0.00046790669842435505, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 51 with params: {'learning_rate': 0.00046157708582858856, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1342,0.84557,0.5879,0.601607,0.587293,0.584247
2,0.7297,0.612018,0.7211,0.726612,0.720836,0.720026
3,0.5709,0.581439,0.7464,0.753898,0.745711,0.744249
4,0.4774,0.46213,0.8031,0.808301,0.803166,0.802896
5,0.3993,0.433959,0.8143,0.821696,0.814166,0.814998
6,0.3362,0.423507,0.8222,0.833839,0.821999,0.822897


[I 2025-03-25 21:18:02,801] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0003401586621745951, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1923,0.830447,0.5909,0.603507,0.591118,0.578541
2,0.7568,0.639146,0.7087,0.714628,0.708636,0.707257
3,0.5964,0.580069,0.7463,0.749227,0.746134,0.744161
4,0.4875,0.473266,0.8035,0.810017,0.80345,0.802691
5,0.409,0.416097,0.8234,0.828089,0.823712,0.823254
6,0.3349,0.429241,0.8191,0.83374,0.818972,0.820223
7,0.2726,0.431017,0.8248,0.829994,0.825342,0.821788
8,0.2252,0.380078,0.8488,0.854234,0.849111,0.849135
9,0.1921,0.448932,0.8237,0.840512,0.823473,0.824353
10,0.1705,0.385487,0.8445,0.847849,0.844712,0.843231


[I 2025-03-25 21:30:30,177] Trial 52 finished with value: 0.8432311396646284 and parameters: {'learning_rate': 0.0003401586621745951, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 53 with params: {'learning_rate': 0.00033575052416799064, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1633,0.847632,0.5847,0.610202,0.584011,0.57451
2,0.7376,0.599185,0.736,0.741462,0.735756,0.736125
3,0.5778,0.548017,0.7622,0.768036,0.761433,0.759962
4,0.4715,0.452627,0.8107,0.812452,0.810494,0.809446
5,0.3964,0.437163,0.8182,0.823813,0.818026,0.818191
6,0.3254,0.434143,0.8153,0.829365,0.815386,0.816119


[I 2025-03-25 21:38:00,300] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0015649717980820343, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2347,1.032281,0.4697,0.482619,0.469674,0.438084
2,0.9004,0.787394,0.6211,0.619337,0.621005,0.613923
3,0.7483,0.689457,0.6801,0.677932,0.679345,0.674069
4,0.6387,0.564246,0.7464,0.755127,0.745813,0.746776
5,0.5471,0.50938,0.7787,0.788104,0.778179,0.779653
6,0.4733,0.475109,0.7998,0.814503,0.799789,0.801208


[I 2025-03-25 21:45:26,231] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 7.242888062473813e-05, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4447,1.120147,0.4203,0.434819,0.420225,0.402826
2,1.1189,0.944873,0.537,0.536054,0.536546,0.532277
3,0.9304,0.878364,0.5749,0.576333,0.573645,0.565792


[I 2025-03-25 21:49:09,284] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0003722351928527629, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1857,0.857089,0.583,0.601872,0.58357,0.575736
2,0.7342,0.604681,0.7287,0.733084,0.72874,0.727294
3,0.5705,0.567461,0.7489,0.756721,0.748859,0.746717
4,0.4714,0.442517,0.8146,0.818087,0.814476,0.813162
5,0.3963,0.420415,0.8273,0.83411,0.827414,0.827631
6,0.3291,0.41475,0.8281,0.840255,0.827921,0.829438
7,0.2695,0.419062,0.8316,0.835615,0.831969,0.82875
8,0.2218,0.374853,0.8449,0.848976,0.845175,0.845327
9,0.1901,0.435651,0.8275,0.842577,0.827258,0.827936
10,0.1696,0.376716,0.8472,0.850307,0.847489,0.845918


[I 2025-03-25 22:01:32,906] Trial 56 finished with value: 0.8459183543210516 and parameters: {'learning_rate': 0.0003722351928527629, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 57 with params: {'learning_rate': 0.0005648053089961236, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.13,0.852961,0.5866,0.617796,0.585912,0.576916
2,0.7237,0.61221,0.7218,0.731452,0.721236,0.722598
3,0.5752,0.545306,0.7668,0.770259,0.766452,0.764588
4,0.4798,0.445711,0.8124,0.817582,0.812104,0.812573
5,0.4054,0.413278,0.829,0.8333,0.82902,0.829626
6,0.3429,0.409638,0.8285,0.838396,0.828444,0.829213
7,0.2837,0.406668,0.8372,0.842258,0.837757,0.834646
8,0.234,0.359746,0.8554,0.858329,0.855617,0.85558
9,0.197,0.418157,0.8332,0.846906,0.832888,0.834642
10,0.1718,0.363197,0.856,0.858433,0.856429,0.854447


[I 2025-03-25 22:13:57,092] Trial 57 finished with value: 0.854447142879787 and parameters: {'learning_rate': 0.0005648053089961236, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 58 with params: {'learning_rate': 0.0008055995159722087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1424,0.868463,0.5747,0.588624,0.574483,0.566412
2,0.773,0.630579,0.7143,0.721605,0.714096,0.712908
3,0.6058,0.571579,0.7563,0.763491,0.756108,0.754123
4,0.5123,0.466862,0.8035,0.805588,0.803258,0.802256
5,0.4409,0.43924,0.8145,0.819617,0.814621,0.81436
6,0.3773,0.433309,0.8189,0.835734,0.818658,0.820467


[I 2025-03-25 22:21:23,415] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00036222385628615673, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1742,0.833354,0.6033,0.610535,0.603139,0.596771
2,0.7294,0.613462,0.72,0.730082,0.719794,0.720948
3,0.5683,0.553034,0.7586,0.765265,0.758306,0.756413
4,0.47,0.438881,0.8166,0.820425,0.816452,0.816475
5,0.3906,0.409831,0.8263,0.832113,0.826552,0.826599
6,0.3212,0.412938,0.8299,0.841813,0.829629,0.831711
7,0.2612,0.417849,0.8305,0.836243,0.831172,0.827586
8,0.2171,0.379548,0.8469,0.85316,0.847034,0.847326
9,0.1861,0.440662,0.8231,0.838938,0.822825,0.823214
10,0.1668,0.399705,0.8495,0.852123,0.849712,0.848481


[I 2025-03-25 22:33:53,133] Trial 59 finished with value: 0.8484809246343767 and parameters: {'learning_rate': 0.00036222385628615673, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 3.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 60 with params: {'learning_rate': 0.0008661046647060895, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1503,0.906948,0.5526,0.585728,0.552427,0.538826
2,0.7644,0.656089,0.7041,0.709858,0.704226,0.703505
3,0.6089,0.590357,0.7395,0.745045,0.738822,0.736543


[I 2025-03-25 22:37:36,007] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0004845904872054229, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1272,0.824617,0.6055,0.614753,0.604825,0.603616
2,0.7292,0.609846,0.7202,0.727306,0.720128,0.717604
3,0.575,0.535116,0.7687,0.773106,0.768236,0.767527
4,0.4822,0.45705,0.8068,0.812206,0.807047,0.804888
5,0.4057,0.409379,0.8331,0.838996,0.83299,0.834082
6,0.3389,0.418739,0.8279,0.839288,0.827875,0.828896
7,0.2799,0.424469,0.8271,0.83474,0.82789,0.823939
8,0.2313,0.375231,0.845,0.851861,0.845299,0.845352
9,0.1951,0.434855,0.8306,0.844125,0.830151,0.831863
10,0.1723,0.366143,0.8518,0.854596,0.851885,0.850965


[I 2025-03-25 22:49:58,216] Trial 61 finished with value: 0.8509645384393909 and parameters: {'learning_rate': 0.0004845904872054229, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 62 with params: {'learning_rate': 0.0004250018047081489, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1421,0.884388,0.5846,0.60539,0.58385,0.577558
2,0.7235,0.609823,0.7233,0.72783,0.722865,0.719695
3,0.5595,0.556513,0.7616,0.767524,0.760774,0.759326
4,0.4633,0.462583,0.8105,0.814747,0.810249,0.809085
5,0.3873,0.403704,0.8318,0.835272,0.831871,0.831841
6,0.3199,0.417483,0.8246,0.836659,0.824398,0.825695
7,0.2637,0.411808,0.8336,0.838425,0.834313,0.831061
8,0.2185,0.381224,0.8457,0.85227,0.846139,0.845363
9,0.1875,0.424195,0.8365,0.846481,0.836384,0.836793
10,0.1664,0.370193,0.8532,0.856642,0.853567,0.852359


[I 2025-03-25 23:02:25,061] Trial 62 finished with value: 0.8523588980028727 and parameters: {'learning_rate': 0.0004250018047081489, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 63 with params: {'learning_rate': 0.0003234268385824373, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2024,0.864367,0.5704,0.58929,0.570491,0.564022
2,0.756,0.636769,0.7091,0.716696,0.708605,0.708353
3,0.5909,0.570879,0.7543,0.75906,0.753843,0.751381
4,0.4816,0.457032,0.8073,0.811961,0.807322,0.807261
5,0.3985,0.42384,0.8233,0.828352,0.823302,0.824009
6,0.3281,0.429549,0.8183,0.829519,0.818339,0.819261
7,0.2663,0.415641,0.8317,0.836299,0.832365,0.829302
8,0.2209,0.39044,0.8397,0.845729,0.840066,0.839414
9,0.1895,0.449859,0.8208,0.83639,0.82037,0.821392
10,0.1699,0.389218,0.8387,0.842176,0.838985,0.837412


[I 2025-03-25 23:14:50,606] Trial 63 finished with value: 0.8374117456783647 and parameters: {'learning_rate': 0.0003234268385824373, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 64 with params: {'learning_rate': 0.0009368198099326022, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1604,0.894199,0.5836,0.585092,0.583162,0.572542
2,0.7606,0.638409,0.7129,0.721796,0.712709,0.709656
3,0.6064,0.568268,0.7537,0.757513,0.753157,0.751454
4,0.5165,0.479758,0.7943,0.800151,0.794236,0.79354
5,0.4486,0.451119,0.8054,0.81691,0.80528,0.806188
6,0.3875,0.439967,0.8157,0.830436,0.815397,0.816793


[I 2025-03-25 23:22:14,286] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.001214359173122457, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1841,0.911964,0.553,0.566519,0.552669,0.540588
2,0.8249,0.691255,0.6753,0.683832,0.675547,0.670858
3,0.6527,0.578371,0.7466,0.746944,0.746262,0.744749
4,0.5574,0.502917,0.7846,0.79746,0.784003,0.786082
5,0.4789,0.462093,0.8032,0.814291,0.803192,0.804486
6,0.4165,0.452211,0.8086,0.820173,0.808825,0.80889
7,0.3582,0.444172,0.8143,0.821368,0.81512,0.810748
8,0.3071,0.380803,0.8452,0.848101,0.845389,0.84466
9,0.2623,0.43182,0.8325,0.843683,0.832117,0.832683
10,0.2245,0.383281,0.8426,0.847943,0.843257,0.841136


[I 2025-03-25 23:34:38,892] Trial 65 finished with value: 0.8411355623485122 and parameters: {'learning_rate': 0.001214359173122457, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 66 with params: {'learning_rate': 0.0003745288491711714, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1615,0.843809,0.5897,0.610284,0.589682,0.581187
2,0.728,0.592037,0.7363,0.743594,0.735668,0.73644
3,0.5626,0.569553,0.7537,0.760028,0.753216,0.750464
4,0.4641,0.448633,0.8134,0.819559,0.813359,0.81302
5,0.3875,0.404251,0.834,0.836649,0.833956,0.833979
6,0.3192,0.416051,0.8262,0.836155,0.826102,0.827092
7,0.2615,0.428962,0.8253,0.830632,0.82606,0.821259
8,0.2162,0.374568,0.8471,0.851221,0.847252,0.846894
9,0.1854,0.421656,0.8309,0.842531,0.83059,0.831953
10,0.1666,0.375867,0.848,0.850593,0.848228,0.846614


[I 2025-03-25 23:47:01,252] Trial 66 finished with value: 0.8466141741441264 and parameters: {'learning_rate': 0.0003745288491711714, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 67 with params: {'learning_rate': 0.000780319876521078, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1199,0.826684,0.6032,0.621216,0.602602,0.599251
2,0.7275,0.611271,0.7204,0.727266,0.720101,0.717967
3,0.5838,0.556014,0.7587,0.762245,0.757788,0.755812
4,0.4979,0.470169,0.8059,0.813763,0.805689,0.805876
5,0.4282,0.434911,0.8196,0.824313,0.819534,0.819665
6,0.3669,0.427632,0.8225,0.836048,0.822406,0.823597
7,0.3076,0.407566,0.8357,0.840543,0.836296,0.833301
8,0.2591,0.369006,0.8509,0.855696,0.851042,0.850541
9,0.2188,0.429365,0.8294,0.843937,0.828863,0.829883
10,0.1879,0.371843,0.8519,0.856535,0.852327,0.850585


[I 2025-03-25 23:59:29,709] Trial 67 finished with value: 0.8505850283377626 and parameters: {'learning_rate': 0.000780319876521078, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 68 with params: {'learning_rate': 0.00012524514416440818, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3528,1.044974,0.4698,0.487629,0.470072,0.448558
2,0.9204,0.765562,0.6367,0.636822,0.636534,0.634929
3,0.7301,0.69496,0.6842,0.688713,0.683757,0.680034
4,0.606,0.560851,0.7511,0.749859,0.750909,0.748184
5,0.5063,0.530337,0.7725,0.773613,0.772637,0.771373
6,0.4234,0.512525,0.7764,0.788484,0.776047,0.77719


[I 2025-03-26 00:06:58,876] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4307,1.102405,0.4349,0.440243,0.435206,0.418669
2,1.0787,0.897689,0.5578,0.564127,0.55678,0.554522
3,0.8799,0.839488,0.599,0.598959,0.597979,0.589483


[I 2025-03-26 00:10:43,931] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0006402460206682689, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1197,0.871036,0.575,0.594483,0.575082,0.5599
2,0.7545,0.628397,0.7139,0.71812,0.713245,0.712063
3,0.5927,0.566585,0.7564,0.762193,0.755735,0.754034
4,0.4964,0.45218,0.8071,0.812259,0.806831,0.806112
5,0.4183,0.435279,0.8184,0.822887,0.81833,0.818172
6,0.3562,0.426302,0.8278,0.836628,0.827668,0.828309
7,0.2965,0.4102,0.8342,0.838628,0.834887,0.831645
8,0.247,0.366429,0.8528,0.858178,0.852992,0.852805
9,0.2077,0.412372,0.8404,0.852932,0.840041,0.840463
10,0.1792,0.366972,0.8523,0.854594,0.852815,0.850686


[I 2025-03-26 00:23:10,008] Trial 70 finished with value: 0.8506860066074591 and parameters: {'learning_rate': 0.0006402460206682689, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 71 with params: {'learning_rate': 0.0006107982886895342, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1622,0.84251,0.5907,0.593622,0.590181,0.585139
2,0.753,0.648252,0.7063,0.711795,0.70594,0.702879
3,0.5962,0.571502,0.7507,0.753667,0.750237,0.74777
4,0.4996,0.470294,0.8015,0.805114,0.801368,0.799291
5,0.4281,0.439679,0.8198,0.826313,0.819768,0.820159
6,0.3641,0.417193,0.8291,0.838189,0.828823,0.829335
7,0.304,0.418733,0.8324,0.836777,0.832916,0.829574
8,0.2538,0.367615,0.8515,0.855555,0.851766,0.851645
9,0.2121,0.424993,0.8356,0.849139,0.835249,0.836309
10,0.1834,0.373526,0.8542,0.856096,0.854524,0.852979


[I 2025-03-26 00:35:42,168] Trial 71 finished with value: 0.8529790769935403 and parameters: {'learning_rate': 0.0006107982886895342, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 72 with params: {'learning_rate': 0.0003746797880209365, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1884,0.812276,0.6067,0.622011,0.606052,0.603743
2,0.7236,0.595625,0.7321,0.735569,0.732098,0.730827
3,0.5622,0.532944,0.7733,0.781074,0.77282,0.77266
4,0.4672,0.452084,0.8104,0.814403,0.810118,0.809283
5,0.3881,0.413382,0.83,0.835173,0.829942,0.830189
6,0.319,0.416031,0.8282,0.839866,0.827988,0.829082
7,0.2632,0.415906,0.8354,0.83974,0.835936,0.832934
8,0.2185,0.370145,0.8519,0.856754,0.851952,0.851943
9,0.1873,0.438187,0.8268,0.842542,0.826551,0.827533
10,0.1677,0.368718,0.8503,0.852562,0.850486,0.848679


[I 2025-03-26 00:48:02,180] Trial 72 finished with value: 0.8486786536625065 and parameters: {'learning_rate': 0.0003746797880209365, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 73 with params: {'learning_rate': 0.0010622327747034335, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1728,0.929442,0.5299,0.560005,0.530141,0.52111
2,0.8028,0.671318,0.6847,0.686468,0.685038,0.680002
3,0.6418,0.625409,0.724,0.73031,0.723042,0.718659
4,0.5449,0.494275,0.7854,0.787754,0.785313,0.78362
5,0.4732,0.44955,0.8127,0.818661,0.812807,0.812862
6,0.4098,0.443029,0.8124,0.822195,0.812393,0.812804
7,0.3505,0.430507,0.8235,0.828058,0.823982,0.82081
8,0.2991,0.38334,0.845,0.847402,0.845256,0.844135
9,0.2542,0.444439,0.8185,0.835397,0.817883,0.81903
10,0.2162,0.422521,0.8398,0.844696,0.84034,0.837792


[I 2025-03-26 01:00:25,977] Trial 73 finished with value: 0.8377923628713484 and parameters: {'learning_rate': 0.0010622327747034335, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 74 with params: {'learning_rate': 0.0004588730983384415, 'weight_decay': 0.009000000000000001, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1675,0.865472,0.5747,0.593466,0.574549,0.566332
2,0.7503,0.648431,0.7082,0.715952,0.708925,0.707768
3,0.5926,0.584221,0.7436,0.749108,0.742835,0.739822


[I 2025-03-26 01:04:08,977] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0010625117542200314, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1683,0.875661,0.5724,0.590092,0.572104,0.555992
2,0.7848,0.675224,0.686,0.687151,0.685662,0.68229
3,0.63,0.609408,0.7307,0.73687,0.730196,0.727213
4,0.5388,0.488602,0.7858,0.790263,0.785637,0.784429
5,0.4642,0.440304,0.8165,0.819357,0.816698,0.816237
6,0.4022,0.436232,0.8162,0.823999,0.816057,0.816719
7,0.3453,0.423406,0.8266,0.830684,0.827259,0.823553
8,0.2934,0.383819,0.8414,0.846695,0.84184,0.840848
9,0.2454,0.413067,0.8345,0.84449,0.834026,0.834703
10,0.2102,0.373692,0.8499,0.851109,0.850313,0.848377


[I 2025-03-26 01:16:41,140] Trial 75 finished with value: 0.8483768376385218 and parameters: {'learning_rate': 0.0010625117542200314, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 76 with params: {'learning_rate': 0.0008794603477247717, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1183,0.864367,0.5771,0.596355,0.576249,0.57293
2,0.7598,0.649868,0.7042,0.711639,0.704513,0.698856
3,0.612,0.595611,0.7403,0.743924,0.739599,0.736958


[I 2025-03-26 01:20:24,848] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0011323605637202055, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1658,0.87956,0.5659,0.591381,0.566101,0.557868
2,0.8145,0.679953,0.6794,0.678764,0.679264,0.675494
3,0.6593,0.628471,0.7196,0.721858,0.719221,0.716216
4,0.558,0.503114,0.7796,0.784987,0.779226,0.778746
5,0.4808,0.475465,0.7915,0.802855,0.791564,0.792222
6,0.415,0.45297,0.8124,0.824634,0.812437,0.812801
7,0.3595,0.433066,0.8255,0.828319,0.826048,0.822437
8,0.305,0.385067,0.8403,0.845363,0.840533,0.840466
9,0.2602,0.425471,0.8252,0.838894,0.824773,0.825078
10,0.2201,0.390609,0.8425,0.847109,0.842996,0.84117


[I 2025-03-26 01:32:54,861] Trial 77 finished with value: 0.8411703161842043 and parameters: {'learning_rate': 0.0011323605637202055, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 78 with params: {'learning_rate': 0.000845271193799776, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1393,0.849515,0.5971,0.600208,0.597368,0.588885
2,0.748,0.624352,0.7159,0.721421,0.716014,0.713999
3,0.5925,0.551803,0.7629,0.76804,0.76194,0.76058
4,0.5004,0.454863,0.8104,0.815746,0.810341,0.810156
5,0.4331,0.434387,0.8207,0.827526,0.820486,0.820976
6,0.3699,0.423452,0.8255,0.838045,0.82526,0.826486
7,0.3142,0.404515,0.8382,0.842633,0.838865,0.834971
8,0.2633,0.359039,0.8575,0.860152,0.857818,0.857305
9,0.2221,0.429621,0.8333,0.849014,0.832974,0.833872
10,0.1901,0.357557,0.8594,0.863304,0.859792,0.858105


[I 2025-03-26 01:45:16,876] Trial 78 finished with value: 0.8581054561667635 and parameters: {'learning_rate': 0.000845271193799776, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 79 with params: {'learning_rate': 0.00040533826879451425, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1343,0.831902,0.6029,0.623514,0.602686,0.59734
2,0.7289,0.614191,0.7226,0.725904,0.723049,0.720642
3,0.5663,0.569538,0.7529,0.759747,0.751897,0.749651
4,0.4723,0.453052,0.8078,0.811726,0.80785,0.80669
5,0.3944,0.415088,0.8262,0.829831,0.826389,0.826719
6,0.3267,0.429608,0.8185,0.834433,0.818512,0.820392


[I 2025-03-26 01:52:43,752] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0023343742858331967, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2763,1.056981,0.456,0.477406,0.456705,0.435854
2,0.9506,0.81009,0.6143,0.608566,0.614,0.608992
3,0.7798,0.771608,0.6401,0.647031,0.639232,0.633616
4,0.6681,0.712789,0.7315,0.748206,0.730864,0.733512
5,0.5811,0.562535,0.7514,0.760926,0.751102,0.750805
6,0.5103,0.510157,0.7841,0.796222,0.783954,0.785327


[I 2025-03-26 02:00:13,298] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0002135956197175552, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2617,0.944129,0.5161,0.533276,0.517019,0.496439
2,0.8124,0.680903,0.6862,0.69144,0.685578,0.6849
3,0.6359,0.599204,0.7394,0.743566,0.738801,0.736519


[I 2025-03-26 02:03:58,509] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.001089246525430585, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1406,0.851096,0.584,0.593986,0.583729,0.57675
2,0.7925,0.657978,0.6972,0.696633,0.697623,0.691355
3,0.6367,0.58614,0.744,0.749898,0.743091,0.741421
4,0.5398,0.478805,0.7969,0.800293,0.79671,0.795722
5,0.4636,0.455234,0.8045,0.814461,0.804555,0.805709
6,0.3995,0.465143,0.8065,0.820893,0.806733,0.806959


[I 2025-03-26 02:11:27,959] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0016155693728514412, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2022,0.97714,0.5027,0.497163,0.50252,0.473598
2,0.8603,0.751941,0.6419,0.644865,0.641821,0.637486
3,0.7079,0.684742,0.693,0.700585,0.69173,0.687527


[I 2025-03-26 02:15:11,824] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0006178583487547131, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1145,0.80893,0.6064,0.620434,0.605928,0.597285
2,0.7397,0.620457,0.7176,0.724959,0.71729,0.715216
3,0.5804,0.547046,0.7655,0.771733,0.764771,0.76362
4,0.488,0.452087,0.8149,0.818291,0.81479,0.8134
5,0.4139,0.422133,0.8261,0.831547,0.826144,0.826371
6,0.3508,0.424587,0.8205,0.837049,0.820544,0.821191
7,0.2948,0.413184,0.835,0.840422,0.835389,0.832789
8,0.2457,0.359861,0.8555,0.861366,0.855658,0.855667
9,0.2065,0.436883,0.8268,0.844637,0.826289,0.827009
10,0.1787,0.361066,0.8574,0.859693,0.857784,0.856338


[I 2025-03-26 02:27:38,404] Trial 84 finished with value: 0.8563378487895836 and parameters: {'learning_rate': 0.0006178583487547131, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 85 with params: {'learning_rate': 0.0010953832680257783, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1229,0.8777,0.5687,0.590745,0.568677,0.562729
2,0.7711,0.64166,0.7131,0.712635,0.713325,0.709388
3,0.6227,0.569942,0.7497,0.752163,0.748646,0.746769
4,0.533,0.488802,0.7854,0.790729,0.784633,0.784416
5,0.465,0.451989,0.8074,0.815736,0.807507,0.807962
6,0.404,0.436797,0.8171,0.82649,0.816687,0.817517


[I 2025-03-26 02:35:02,211] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.000383322914589237, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1337,0.867861,0.5907,0.607061,0.590706,0.586283
2,0.7466,0.623897,0.7151,0.716836,0.715066,0.712953
3,0.5887,0.552245,0.7625,0.765453,0.761789,0.759169
4,0.4857,0.45563,0.808,0.813025,0.807788,0.807188
5,0.4029,0.436289,0.8208,0.82776,0.820552,0.821856
6,0.3336,0.424035,0.8222,0.837556,0.821816,0.823464
7,0.2721,0.427478,0.8265,0.831705,0.827136,0.823453
8,0.2237,0.387107,0.8413,0.848972,0.841698,0.841645
9,0.1895,0.476669,0.8141,0.835613,0.81382,0.816068
10,0.1692,0.375584,0.8479,0.851006,0.848137,0.846769


[I 2025-03-26 02:47:31,938] Trial 86 finished with value: 0.8467694565439599 and parameters: {'learning_rate': 0.000383322914589237, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 87 with params: {'learning_rate': 0.0015298199467251657, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1988,0.925464,0.5337,0.54455,0.533286,0.520585
2,0.8539,0.731408,0.6527,0.65306,0.653063,0.647936
3,0.689,0.636457,0.7139,0.719912,0.712984,0.710261


[I 2025-03-26 02:51:15,408] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0007191357337205345, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1123,0.844034,0.5888,0.610462,0.587919,0.577798
2,0.751,0.624641,0.7179,0.721889,0.717517,0.714033
3,0.5982,0.556553,0.7631,0.768503,0.762286,0.761266
4,0.5037,0.4756,0.7953,0.8007,0.794953,0.794232
5,0.4302,0.429563,0.8177,0.8252,0.817428,0.81783
6,0.3691,0.412615,0.829,0.841858,0.828999,0.8302
7,0.3084,0.394826,0.8385,0.842968,0.839039,0.836217
8,0.2599,0.355388,0.8589,0.861854,0.859251,0.858739
9,0.2184,0.400979,0.8409,0.853377,0.840458,0.841291
10,0.1872,0.348831,0.8616,0.864111,0.861916,0.860667


[I 2025-03-26 03:03:42,127] Trial 88 finished with value: 0.8606672567313044 and parameters: {'learning_rate': 0.0007191357337205345, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 89 with params: {'learning_rate': 0.0005900236525107107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1384,0.863224,0.5823,0.601333,0.582009,0.575626
2,0.7471,0.623364,0.7146,0.718603,0.714429,0.712075
3,0.5927,0.565043,0.7577,0.762433,0.757224,0.755895
4,0.4978,0.447156,0.8095,0.816045,0.809252,0.809607
5,0.4205,0.431244,0.821,0.826787,0.821076,0.821312
6,0.3569,0.420798,0.8223,0.837341,0.822247,0.82332
7,0.2994,0.405851,0.8392,0.842962,0.839755,0.836889
8,0.2473,0.367613,0.8514,0.858264,0.85159,0.851649
9,0.2084,0.409452,0.8368,0.848298,0.836406,0.836954
10,0.18,0.359082,0.8562,0.858394,0.856582,0.85509


[I 2025-03-26 03:16:08,667] Trial 89 finished with value: 0.8550895550440284 and parameters: {'learning_rate': 0.0005900236525107107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 90 with params: {'learning_rate': 0.0003638570292075322, 'weight_decay': 0.008, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.176,0.839193,0.5942,0.595219,0.594114,0.586724
2,0.7468,0.614861,0.7227,0.729602,0.722984,0.723641
3,0.5814,0.544077,0.7674,0.769189,0.766795,0.764819
4,0.478,0.449451,0.8125,0.817689,0.812537,0.811396
5,0.3978,0.411615,0.8268,0.828508,0.827032,0.826615
6,0.3292,0.417292,0.8271,0.837094,0.826664,0.828184
7,0.2685,0.422281,0.8275,0.831281,0.82808,0.824578
8,0.2225,0.382877,0.8456,0.852148,0.845591,0.846561
9,0.1913,0.436235,0.8288,0.841688,0.828442,0.829009
10,0.17,0.390657,0.8441,0.848589,0.844365,0.843117


[I 2025-03-26 03:28:36,750] Trial 90 finished with value: 0.8431169032610153 and parameters: {'learning_rate': 0.0003638570292075322, 'weight_decay': 0.008, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 91 with params: {'learning_rate': 0.0005553109415416178, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1226,0.835386,0.6001,0.604851,0.599626,0.591342
2,0.7333,0.626032,0.7124,0.716818,0.712133,0.710488
3,0.577,0.550207,0.762,0.765867,0.761318,0.759731
4,0.4842,0.447614,0.8164,0.818664,0.81628,0.815991
5,0.412,0.415807,0.8266,0.831287,0.826395,0.827378
6,0.3454,0.41754,0.8245,0.835181,0.824233,0.825329
7,0.2862,0.41445,0.831,0.835949,0.831629,0.828226
8,0.2376,0.367524,0.8506,0.85359,0.850991,0.85039
9,0.2009,0.428187,0.8307,0.843364,0.830414,0.831128
10,0.1747,0.371047,0.8514,0.852923,0.851761,0.849942


[I 2025-03-26 03:41:07,115] Trial 91 finished with value: 0.8499424986474959 and parameters: {'learning_rate': 0.0005553109415416178, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 92 with params: {'learning_rate': 0.0004721988057562174, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.135,0.840409,0.5905,0.615911,0.58995,0.582635
2,0.7339,0.60995,0.7238,0.724854,0.723659,0.720316
3,0.5807,0.547611,0.7639,0.774322,0.763445,0.762744
4,0.4824,0.460542,0.8041,0.808652,0.803942,0.802168
5,0.4072,0.428633,0.8209,0.826418,0.820587,0.821389
6,0.3382,0.430585,0.8174,0.836603,0.817104,0.818887
7,0.2785,0.412635,0.8355,0.842488,0.836243,0.833529
8,0.2307,0.370249,0.8508,0.856671,0.85111,0.851352
9,0.1949,0.424205,0.8317,0.843667,0.831563,0.832072
10,0.1714,0.36673,0.8486,0.851633,0.848913,0.847619


[I 2025-03-26 03:53:31,742] Trial 92 finished with value: 0.8476189423648636 and parameters: {'learning_rate': 0.0004721988057562174, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 93 with params: {'learning_rate': 0.00035835455524527393, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.148,0.834052,0.5964,0.60701,0.596804,0.587772
2,0.7388,0.613831,0.721,0.722759,0.720907,0.718302
3,0.5706,0.552156,0.7638,0.767535,0.763462,0.761176
4,0.4723,0.467132,0.8152,0.819104,0.815269,0.814492
5,0.3943,0.419949,0.8268,0.832545,0.826756,0.827636
6,0.3241,0.416722,0.8243,0.834432,0.82393,0.825212
7,0.264,0.422945,0.8322,0.838914,0.832914,0.829632
8,0.2191,0.378403,0.8473,0.854467,0.847713,0.847247
9,0.1876,0.437642,0.8265,0.841347,0.826307,0.826175
10,0.1672,0.372269,0.8507,0.85279,0.850925,0.8496


[I 2025-03-26 04:05:57,002] Trial 93 finished with value: 0.8495996550765819 and parameters: {'learning_rate': 0.00035835455524527393, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 94 with params: {'learning_rate': 0.0007439381202605794, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1117,0.895075,0.5603,0.580214,0.560643,0.548888
2,0.7568,0.635775,0.7049,0.707687,0.704438,0.702244
3,0.6013,0.573689,0.7511,0.756013,0.75032,0.747546
4,0.5039,0.478842,0.7987,0.805631,0.798481,0.796819
5,0.4301,0.432018,0.8197,0.82697,0.819282,0.820694
6,0.3655,0.413911,0.8309,0.842704,0.830704,0.832026
7,0.3071,0.411343,0.8334,0.838005,0.833914,0.831297
8,0.2567,0.365176,0.8535,0.856867,0.853733,0.853099
9,0.2155,0.421168,0.8352,0.850299,0.834742,0.836017
10,0.1861,0.35846,0.856,0.856852,0.85636,0.854887


[I 2025-03-26 04:18:17,920] Trial 94 finished with value: 0.8548873903376014 and parameters: {'learning_rate': 0.0007439381202605794, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 95 with params: {'learning_rate': 0.000848195707408049, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.135,0.846796,0.5853,0.600502,0.585129,0.578343
2,0.7629,0.652139,0.7071,0.709337,0.706874,0.705683
3,0.6136,0.557963,0.7587,0.759718,0.757964,0.756515
4,0.5159,0.473179,0.7979,0.799207,0.797566,0.796348
5,0.4439,0.435104,0.8183,0.822874,0.81847,0.818138
6,0.3814,0.446152,0.813,0.826147,0.8129,0.813678


[I 2025-03-26 04:25:40,258] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0019337764214992198, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2334,0.98656,0.4805,0.5008,0.480614,0.451276
2,0.9228,0.77491,0.6274,0.629193,0.627177,0.621661
3,0.7458,0.699158,0.6837,0.68644,0.683156,0.678222
4,0.6341,0.545863,0.7617,0.766737,0.761279,0.761503
5,0.5471,0.526518,0.7742,0.78549,0.774093,0.774668
6,0.4752,0.491225,0.7909,0.801937,0.79077,0.791362


[I 2025-03-26 04:33:05,726] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0011113500309594335, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1445,0.87712,0.5682,0.59226,0.568296,0.562475
2,0.7938,0.64369,0.7031,0.704326,0.703226,0.700985
3,0.6455,0.594895,0.7347,0.738579,0.734306,0.730777


[I 2025-03-26 04:36:46,958] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0006839321132490469, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1286,0.860769,0.5793,0.607676,0.579642,0.565755
2,0.7453,0.655929,0.7026,0.711165,0.702662,0.700984
3,0.5979,0.577037,0.7497,0.754489,0.749183,0.74655
4,0.5036,0.469366,0.7972,0.806284,0.797136,0.796885
5,0.4255,0.439159,0.8183,0.824679,0.817985,0.818661
6,0.3626,0.438726,0.8149,0.830257,0.814671,0.816062


[I 2025-03-26 04:44:12,943] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00034127577375611115, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1489,0.860471,0.5957,0.605049,0.595605,0.588075
2,0.7225,0.600204,0.7319,0.733231,0.732164,0.727174
3,0.5641,0.554406,0.7645,0.769365,0.763741,0.762271
4,0.4656,0.454875,0.8076,0.8116,0.807628,0.806415
5,0.3884,0.440512,0.8124,0.816846,0.81238,0.812068
6,0.3193,0.433244,0.8193,0.832071,0.819174,0.820173
7,0.2611,0.43343,0.8232,0.828499,0.823797,0.819758
8,0.2147,0.38702,0.839,0.844264,0.839239,0.839524
9,0.1861,0.453477,0.8213,0.83671,0.820766,0.822072
10,0.1667,0.388217,0.8451,0.847109,0.845307,0.843536


[I 2025-03-26 04:56:37,517] Trial 99 finished with value: 0.8435362128882092 and parameters: {'learning_rate': 0.00034127577375611115, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2272,0.881517,0.5686,0.583674,0.56845,0.560529
2,0.7712,0.649114,0.702,0.705493,0.701922,0.700838
3,0.6054,0.574975,0.7494,0.748353,0.748974,0.745824
4,0.4903,0.463188,0.8022,0.805943,0.802043,0.800998
5,0.4044,0.446169,0.8079,0.816442,0.807957,0.809045
6,0.3329,0.43892,0.8181,0.831217,0.817939,0.819585
7,0.2674,0.430487,0.8217,0.825064,0.822443,0.818752
8,0.221,0.398211,0.835,0.838972,0.835322,0.834975
9,0.1911,0.463346,0.8121,0.826516,0.812005,0.811742
10,0.1712,0.410464,0.8276,0.832049,0.828121,0.825672


[I 2025-03-26 05:08:56,470] Trial 100 finished with value: 0.825671746532959 and parameters: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 101 with params: {'learning_rate': 0.0012297775259970582, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1474,0.907496,0.5534,0.564823,0.553079,0.541262
2,0.8003,0.673406,0.6866,0.685742,0.686493,0.68191
3,0.6506,0.592982,0.7361,0.739036,0.735097,0.731441


[I 2025-03-26 05:12:39,445] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0006538323945519396, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1187,0.830109,0.6056,0.613304,0.605213,0.597716
2,0.7504,0.621419,0.7122,0.712399,0.712131,0.709147
3,0.6027,0.565144,0.7556,0.761082,0.754614,0.753289
4,0.5048,0.508138,0.7977,0.808505,0.797648,0.797517
5,0.4308,0.43736,0.8158,0.822116,0.815671,0.815745
6,0.3659,0.42236,0.823,0.838903,0.822601,0.824958
7,0.3058,0.403849,0.8375,0.84152,0.83816,0.835233
8,0.2543,0.366082,0.8517,0.858718,0.852045,0.851945
9,0.2116,0.403246,0.8403,0.851859,0.839902,0.841283
10,0.1824,0.362029,0.8574,0.860572,0.857754,0.856198


[I 2025-03-26 05:25:05,530] Trial 102 finished with value: 0.8561975174732035 and parameters: {'learning_rate': 0.0006538323945519396, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 103 with params: {'learning_rate': 0.0009080550091377366, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1046,0.865555,0.5793,0.597963,0.57942,0.56601
2,0.763,0.643394,0.7039,0.707543,0.703503,0.700498
3,0.617,0.601653,0.7345,0.745455,0.733474,0.732292


[I 2025-03-26 05:28:49,456] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0007195399675094845, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1093,0.869156,0.5706,0.609694,0.570929,0.561088
2,0.7553,0.639679,0.7108,0.715215,0.710786,0.708455
3,0.603,0.561383,0.7533,0.754358,0.752751,0.749867
4,0.5041,0.476995,0.798,0.805052,0.797938,0.797381
5,0.4321,0.433981,0.8181,0.82772,0.817953,0.81906
6,0.3658,0.420298,0.8221,0.835016,0.821991,0.82366
7,0.3075,0.424984,0.8284,0.834169,0.829127,0.824987
8,0.2574,0.36281,0.8533,0.856321,0.853451,0.85322
9,0.2152,0.408747,0.8373,0.847822,0.837086,0.83766
10,0.1843,0.367929,0.853,0.856094,0.853419,0.851522


[I 2025-03-26 05:41:08,468] Trial 104 finished with value: 0.8515218767352074 and parameters: {'learning_rate': 0.0007195399675094845, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 105 with params: {'learning_rate': 0.0004883962208131707, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1464,0.822557,0.6111,0.621996,0.609917,0.60545
2,0.7243,0.606025,0.728,0.736244,0.728203,0.727706
3,0.5629,0.566159,0.7558,0.76446,0.754864,0.753005
4,0.4705,0.448575,0.812,0.81706,0.812006,0.81037
5,0.3965,0.42002,0.8268,0.831158,0.826978,0.825969
6,0.3311,0.419007,0.8252,0.835227,0.825118,0.825377
7,0.2741,0.407148,0.8335,0.838465,0.833969,0.831188
8,0.2277,0.361044,0.8549,0.858572,0.855121,0.855285
9,0.1927,0.433173,0.8325,0.846328,0.832159,0.832729
10,0.1705,0.3643,0.8547,0.85673,0.855073,0.853254


[I 2025-03-26 05:53:39,479] Trial 105 finished with value: 0.8532544080461386 and parameters: {'learning_rate': 0.0004883962208131707, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 106 with params: {'learning_rate': 0.0008144517630719949, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1093,0.881666,0.5774,0.604997,0.576837,0.574587
2,0.7598,0.636796,0.7075,0.71425,0.707536,0.705644
3,0.6096,0.594355,0.7401,0.744044,0.739419,0.737017
4,0.5145,0.46581,0.7986,0.802852,0.798124,0.798007
5,0.4387,0.437079,0.814,0.820576,0.813871,0.814537
6,0.3761,0.442031,0.81,0.828431,0.809645,0.811362


[I 2025-03-26 06:01:16,743] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.00032116375014000037, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1614,0.8671,0.5801,0.594329,0.58052,0.572286
2,0.7462,0.617949,0.7161,0.725937,0.715733,0.718451
3,0.589,0.539659,0.7652,0.769336,0.76483,0.763928
4,0.4834,0.448542,0.8083,0.80773,0.808188,0.80701
5,0.4055,0.430444,0.8178,0.823407,0.817656,0.818864
6,0.3339,0.451976,0.8123,0.829535,0.812096,0.813497


[I 2025-03-26 06:08:43,154] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0005326122719113937, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.116,0.80777,0.6087,0.620751,0.608499,0.603024
2,0.7357,0.616033,0.722,0.730459,0.721872,0.721896
3,0.5827,0.563676,0.7566,0.763597,0.755833,0.754895
4,0.4829,0.45154,0.8085,0.813332,0.808374,0.807222
5,0.4094,0.425678,0.8247,0.828867,0.824632,0.825126
6,0.3422,0.424488,0.8184,0.832955,0.818266,0.819266


[I 2025-03-26 06:16:11,560] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.00039571163346581694, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1374,0.84788,0.5975,0.623755,0.597502,0.592924
2,0.7272,0.621132,0.7241,0.733281,0.72412,0.724998
3,0.5655,0.593731,0.742,0.753756,0.741966,0.740116
4,0.4671,0.449183,0.8148,0.817444,0.814477,0.81275
5,0.3895,0.402992,0.8336,0.835788,0.833604,0.833634
6,0.3228,0.414495,0.8253,0.838247,0.825079,0.826751
7,0.2653,0.438613,0.8148,0.823945,0.815671,0.811851
8,0.2187,0.366404,0.8478,0.853577,0.848066,0.848407
9,0.1877,0.426882,0.8312,0.843152,0.830889,0.831724
10,0.1676,0.379113,0.8398,0.842603,0.840085,0.8382


[I 2025-03-26 06:28:38,468] Trial 109 finished with value: 0.8381998955283579 and parameters: {'learning_rate': 0.00039571163346581694, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 110 with params: {'learning_rate': 0.003268025584603064, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2726,1.04694,0.4565,0.462482,0.456984,0.431387
2,0.9404,0.811948,0.6203,0.618815,0.620249,0.61378
3,0.7566,0.68167,0.6893,0.695617,0.688325,0.686093
4,0.6445,0.568276,0.7437,0.747259,0.743487,0.743365
5,0.5605,0.546237,0.7601,0.774429,0.760279,0.760049
6,0.4957,0.499528,0.7855,0.796536,0.785361,0.786133


[I 2025-03-26 06:36:06,002] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0007680432787446194, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1283,0.863259,0.5886,0.602617,0.587723,0.584059
2,0.7462,0.621503,0.7194,0.720055,0.719266,0.715741
3,0.5981,0.582042,0.7423,0.748166,0.741624,0.738853


[I 2025-03-26 06:39:51,397] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 8.37052256414999e-05, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4167,1.089135,0.4408,0.456244,0.440759,0.421536
2,1.0685,0.897248,0.5583,0.563205,0.558133,0.549759
3,0.8691,0.815412,0.6102,0.609851,0.609539,0.603343
4,0.7416,0.681257,0.6826,0.683749,0.682523,0.679447
5,0.6497,0.649116,0.7004,0.707738,0.700106,0.701317
6,0.5788,0.630887,0.7144,0.727138,0.714089,0.714931


[I 2025-03-26 06:47:21,544] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0011469002699219671, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1364,0.889019,0.5648,0.564522,0.564706,0.557382
2,0.7958,0.671096,0.6914,0.691434,0.690746,0.688754
3,0.6342,0.613926,0.7335,0.741485,0.732727,0.731235


[I 2025-03-26 06:51:06,869] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0002209827657500777, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2347,0.923471,0.544,0.550703,0.544313,0.524494
2,0.7929,0.659232,0.6965,0.701599,0.696525,0.695928
3,0.6103,0.56253,0.76,0.761054,0.759672,0.757408
4,0.4964,0.470012,0.7994,0.803059,0.799322,0.798237
5,0.4074,0.457791,0.8094,0.814578,0.809392,0.809735
6,0.3334,0.454018,0.8068,0.820106,0.806679,0.807747


[I 2025-03-26 06:58:35,723] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00042700571067181653, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1401,0.834952,0.6066,0.625255,0.60597,0.604988
2,0.7256,0.621498,0.7231,0.732973,0.72306,0.721871
3,0.5633,0.550934,0.7667,0.770405,0.766259,0.764995
4,0.4677,0.460752,0.8054,0.81394,0.805416,0.803998
5,0.3949,0.40806,0.8285,0.833218,0.828345,0.828958
6,0.3294,0.417406,0.8257,0.83491,0.825635,0.825972
7,0.2706,0.425953,0.8269,0.834012,0.82753,0.823983
8,0.224,0.379111,0.8476,0.852587,0.847981,0.846933
9,0.1907,0.436036,0.8271,0.841531,0.826664,0.827758
10,0.1691,0.384332,0.8416,0.845569,0.84202,0.839894


[I 2025-03-26 07:11:08,375] Trial 115 finished with value: 0.8398936990409516 and parameters: {'learning_rate': 0.00042700571067181653, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 116 with params: {'learning_rate': 0.0004995132844460749, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1398,0.859362,0.5811,0.609186,0.581099,0.573946
2,0.7248,0.607707,0.7257,0.73139,0.725698,0.722516
3,0.5703,0.563629,0.7631,0.772153,0.762392,0.761046
4,0.4739,0.457982,0.8139,0.816321,0.813829,0.812764
5,0.3996,0.411286,0.8285,0.832501,0.828298,0.82875
6,0.3356,0.440564,0.8138,0.828843,0.813615,0.814059
7,0.2777,0.404096,0.8367,0.841436,0.837288,0.834523
8,0.2306,0.367049,0.8502,0.85663,0.850473,0.850875
9,0.1964,0.414687,0.8355,0.847209,0.835147,0.836292
10,0.1724,0.369231,0.8547,0.857003,0.855088,0.853568


[I 2025-03-26 07:23:38,904] Trial 116 finished with value: 0.8535682414939336 and parameters: {'learning_rate': 0.0004995132844460749, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 117 with params: {'learning_rate': 0.000256345652662891, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2201,0.896703,0.5538,0.567374,0.554447,0.540513
2,0.7655,0.636382,0.7139,0.718683,0.713715,0.711298
3,0.5897,0.548988,0.7618,0.764368,0.761338,0.759138
4,0.4775,0.458708,0.8076,0.812097,0.807606,0.806639
5,0.3908,0.429686,0.8275,0.829326,0.827601,0.827166
6,0.3166,0.448653,0.8106,0.823956,0.810293,0.811358


[I 2025-03-26 07:31:07,273] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0007003900325427294, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1223,0.83471,0.6029,0.617805,0.602126,0.593987
2,0.7482,0.638688,0.7076,0.71974,0.707137,0.706894
3,0.5965,0.551583,0.7608,0.765025,0.759976,0.758351
4,0.4993,0.476676,0.7946,0.7993,0.794547,0.793054
5,0.4286,0.42204,0.8236,0.830104,0.823423,0.823505
6,0.3644,0.423887,0.8204,0.835181,0.820321,0.821846


[I 2025-03-26 07:38:39,397] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00017284643466884748, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2903,0.969129,0.5074,0.518565,0.507917,0.489431
2,0.8476,0.699343,0.6698,0.675089,0.669311,0.67088
3,0.6667,0.641056,0.7142,0.718571,0.713414,0.70976
4,0.5492,0.529375,0.771,0.774237,0.771066,0.769425
5,0.4546,0.498001,0.793,0.798082,0.793148,0.793634
6,0.3737,0.476508,0.7945,0.804678,0.7942,0.795185


[I 2025-03-26 07:46:07,955] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0008416266901328355, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1271,0.880963,0.5757,0.587261,0.575613,0.56447
2,0.7514,0.617802,0.7176,0.720846,0.717629,0.71219
3,0.604,0.575695,0.7467,0.749641,0.746001,0.742846


[I 2025-03-26 07:49:55,420] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4075,1.064365,0.4528,0.461427,0.453503,0.434075
2,1.0322,0.871164,0.5711,0.580229,0.570579,0.570265
3,0.8381,0.794875,0.6187,0.614603,0.617735,0.60997
4,0.7177,0.663205,0.6965,0.698072,0.696227,0.693359
5,0.6282,0.628869,0.7137,0.719527,0.713441,0.714155
6,0.556,0.598682,0.7273,0.735754,0.726938,0.727264


[I 2025-03-26 07:57:26,106] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0005539433333566457, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1338,0.844566,0.5924,0.605101,0.591977,0.589022
2,0.7283,0.597278,0.7316,0.738648,0.731609,0.731276
3,0.5759,0.562053,0.7611,0.76746,0.760194,0.759135
4,0.4856,0.449368,0.8103,0.814569,0.810067,0.809075
5,0.4097,0.411569,0.8273,0.832585,0.827263,0.827841
6,0.3461,0.411958,0.8321,0.840183,0.832087,0.832465
7,0.2874,0.39144,0.8437,0.848724,0.844267,0.842172
8,0.2375,0.361557,0.8545,0.858618,0.854763,0.853874
9,0.2012,0.40644,0.8418,0.851674,0.841331,0.841826
10,0.1753,0.362516,0.857,0.859944,0.857301,0.85585


[I 2025-03-26 08:09:54,955] Trial 122 finished with value: 0.8558501397951461 and parameters: {'learning_rate': 0.0005539433333566457, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 123 with params: {'learning_rate': 0.0007661522892286894, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1088,0.868746,0.5816,0.600774,0.581685,0.572571
2,0.7397,0.626469,0.7161,0.719393,0.71611,0.71306
3,0.5909,0.573348,0.7487,0.753885,0.747939,0.745108
4,0.499,0.467089,0.8026,0.808779,0.802399,0.802015
5,0.4294,0.454897,0.8071,0.816696,0.807084,0.80787
6,0.3657,0.425642,0.8182,0.830285,0.818281,0.818869
7,0.3071,0.406739,0.8333,0.838091,0.833865,0.831206
8,0.2562,0.37488,0.8479,0.854098,0.848035,0.847825
9,0.2172,0.438669,0.8248,0.843598,0.82418,0.826275
10,0.186,0.37002,0.8518,0.855583,0.852176,0.850248


[I 2025-03-26 08:22:25,715] Trial 123 finished with value: 0.8502478102978015 and parameters: {'learning_rate': 0.0007661522892286894, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 124 with params: {'learning_rate': 0.0004346202747067784, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1469,0.828497,0.6085,0.614111,0.608191,0.602664
2,0.7346,0.610443,0.7242,0.730215,0.723888,0.722824
3,0.5734,0.540731,0.7609,0.768903,0.760196,0.759362
4,0.4769,0.452985,0.8137,0.816367,0.813906,0.812283
5,0.4006,0.414254,0.8299,0.835608,0.829514,0.830573
6,0.3362,0.420046,0.8246,0.83604,0.824591,0.824935
7,0.2763,0.414606,0.8295,0.834099,0.830279,0.826511
8,0.2289,0.377798,0.8425,0.849321,0.84291,0.842902
9,0.1942,0.421969,0.8349,0.846813,0.834727,0.835482
10,0.1705,0.367801,0.8533,0.856557,0.853617,0.852399


[I 2025-03-26 08:34:50,882] Trial 124 finished with value: 0.8523990537975182 and parameters: {'learning_rate': 0.0004346202747067784, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 125 with params: {'learning_rate': 0.000568299561218188, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1513,0.868359,0.5846,0.598182,0.584924,0.574124
2,0.7484,0.632369,0.7089,0.712674,0.709026,0.708132
3,0.5958,0.560051,0.7582,0.764539,0.757205,0.755686
4,0.4974,0.468112,0.8028,0.808856,0.802644,0.802559
5,0.4204,0.420444,0.8287,0.833991,0.828697,0.829125
6,0.3527,0.418919,0.8241,0.834385,0.824133,0.824665
7,0.2913,0.415827,0.8309,0.83657,0.831575,0.827736
8,0.2399,0.377217,0.8441,0.853531,0.844299,0.844346
9,0.2015,0.447908,0.8214,0.838476,0.820907,0.821912
10,0.1752,0.374987,0.8475,0.850568,0.847914,0.84598


[I 2025-03-26 08:47:19,493] Trial 125 finished with value: 0.8459801205368516 and parameters: {'learning_rate': 0.000568299561218188, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1614,0.922456,0.534,0.553763,0.534241,0.519674
2,0.7963,0.667668,0.6938,0.695091,0.693591,0.690476
3,0.6336,0.580992,0.7451,0.74667,0.744327,0.741546


[I 2025-03-26 08:51:03,359] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0003016893635025172, 'weight_decay': 0.01, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1974,0.87068,0.5792,0.590018,0.579511,0.566627
2,0.7566,0.629955,0.7132,0.715318,0.713229,0.710831
3,0.5821,0.56106,0.7597,0.763726,0.759185,0.757468
4,0.4745,0.463771,0.8032,0.809465,0.803092,0.802438
5,0.3902,0.431368,0.8266,0.83231,0.826822,0.827204
6,0.3193,0.43946,0.8129,0.825633,0.812939,0.813946
7,0.2594,0.43365,0.8212,0.827009,0.8219,0.817646
8,0.2154,0.395131,0.8372,0.843385,0.837465,0.837497
9,0.186,0.441157,0.8216,0.834991,0.82122,0.822031
10,0.1673,0.398875,0.838,0.841385,0.838233,0.836413


[I 2025-03-26 09:03:29,658] Trial 127 finished with value: 0.836412631062541 and parameters: {'learning_rate': 0.0003016893635025172, 'weight_decay': 0.01, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 128 with params: {'learning_rate': 0.0008282486463958535, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1395,0.895655,0.5704,0.589214,0.570272,0.563577
2,0.7647,0.645704,0.7018,0.705814,0.701626,0.695884
3,0.6116,0.555516,0.762,0.769511,0.760962,0.76039
4,0.5184,0.527045,0.7812,0.793564,0.780956,0.778211
5,0.4441,0.445603,0.8154,0.822882,0.815473,0.815927
6,0.3805,0.420319,0.8264,0.836383,0.826061,0.827092
7,0.3218,0.419411,0.8284,0.834377,0.828944,0.825783
8,0.2705,0.3648,0.8537,0.859053,0.853928,0.85412
9,0.2273,0.424113,0.8388,0.851575,0.838478,0.839411
10,0.1947,0.365395,0.8531,0.855396,0.853519,0.851753


[I 2025-03-26 09:15:59,211] Trial 128 finished with value: 0.8517532819981367 and parameters: {'learning_rate': 0.0008282486463958535, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 129 with params: {'learning_rate': 0.0005167325349125508, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1506,0.84345,0.5899,0.609059,0.589265,0.582973
2,0.7467,0.618079,0.7198,0.725377,0.719676,0.718427
3,0.5804,0.572862,0.7508,0.75931,0.750594,0.748615
4,0.4843,0.456059,0.8082,0.813367,0.808273,0.807148
5,0.4091,0.425235,0.8184,0.824456,0.818555,0.819006
6,0.3448,0.422329,0.8212,0.833978,0.821229,0.822537


[I 2025-03-26 09:23:27,296] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0006983958756474961, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1564,0.883195,0.5568,0.586459,0.556769,0.547991
2,0.7614,0.633734,0.7145,0.714328,0.714699,0.710139
3,0.601,0.546879,0.7676,0.77176,0.766958,0.765543
4,0.5009,0.446965,0.8103,0.812091,0.810312,0.809282
5,0.4277,0.427113,0.8211,0.830574,0.820975,0.821373
6,0.3633,0.410497,0.8313,0.842152,0.831044,0.83218
7,0.307,0.412941,0.8339,0.838943,0.834849,0.830382
8,0.2562,0.363544,0.8513,0.855346,0.851515,0.851504
9,0.2165,0.411555,0.8375,0.850846,0.837301,0.838241
10,0.1861,0.361988,0.8555,0.860601,0.856049,0.853968


[I 2025-03-26 09:35:46,739] Trial 130 finished with value: 0.8539675708836988 and parameters: {'learning_rate': 0.0006983958756474961, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 131 with params: {'learning_rate': 0.00036745224272528257, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1684,0.859511,0.5898,0.607833,0.590017,0.579802
2,0.7414,0.623671,0.7126,0.72015,0.712344,0.711828
3,0.5841,0.560335,0.7587,0.764515,0.758073,0.756606
4,0.483,0.450421,0.8075,0.812941,0.807301,0.807239
5,0.4006,0.412816,0.8318,0.838717,0.83157,0.832647
6,0.3294,0.417085,0.8266,0.839084,0.826365,0.827799
7,0.2696,0.438205,0.8234,0.831772,0.824065,0.820762
8,0.2225,0.381853,0.8468,0.855822,0.847194,0.846776
9,0.19,0.427727,0.8321,0.84532,0.831816,0.832253
10,0.1696,0.377156,0.8478,0.850786,0.848154,0.846538


[I 2025-03-26 09:48:12,334] Trial 131 finished with value: 0.8465379034325912 and parameters: {'learning_rate': 0.00036745224272528257, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 132 with params: {'learning_rate': 0.0013526774812612856, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1601,0.955051,0.518,0.539854,0.518139,0.499007
2,0.8306,0.727616,0.6592,0.665364,0.659106,0.654392
3,0.6804,0.637467,0.7154,0.720041,0.714544,0.710813


[I 2025-03-26 09:51:55,528] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0005870587024591358, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1472,0.885636,0.5728,0.610913,0.572273,0.567263
2,0.7234,0.605424,0.7288,0.733105,0.729014,0.726999
3,0.5727,0.541841,0.7669,0.772724,0.7661,0.765099
4,0.4869,0.446362,0.8123,0.816626,0.812199,0.811973
5,0.4131,0.426405,0.8213,0.82958,0.821183,0.821506
6,0.3499,0.416978,0.8262,0.839759,0.826076,0.827173
7,0.2927,0.430966,0.8277,0.836046,0.828286,0.825019
8,0.2442,0.365518,0.8547,0.859808,0.854915,0.854553
9,0.2061,0.424265,0.8317,0.845394,0.831323,0.832395
10,0.1784,0.361691,0.8552,0.8573,0.855413,0.854033


[I 2025-03-26 10:04:24,829] Trial 133 finished with value: 0.8540330091357353 and parameters: {'learning_rate': 0.0005870587024591358, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4313,1.10353,0.4293,0.439298,0.429184,0.41588
2,1.0963,0.925811,0.5413,0.543512,0.540653,0.534567
3,0.9083,0.8894,0.5722,0.581405,0.571725,0.565327


[I 2025-03-26 10:08:08,041] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0009341701211843292, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.161,0.883587,0.5504,0.587867,0.550226,0.534905
2,0.7803,0.669294,0.696,0.701914,0.696077,0.694547
3,0.6197,0.59921,0.7412,0.743947,0.740486,0.737581
4,0.5229,0.479494,0.7939,0.799909,0.793871,0.793118
5,0.4544,0.434618,0.8187,0.823723,0.818921,0.81863
6,0.3916,0.435202,0.8154,0.826,0.815213,0.816448
7,0.3363,0.430448,0.8215,0.827244,0.822159,0.818377
8,0.2858,0.371094,0.8471,0.852242,0.847327,0.847292
9,0.2419,0.412874,0.8385,0.848868,0.838153,0.838866
10,0.2057,0.379692,0.849,0.853273,0.849402,0.848004


[I 2025-03-26 10:20:34,432] Trial 135 finished with value: 0.8480036507700646 and parameters: {'learning_rate': 0.0009341701211843292, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 136 with params: {'learning_rate': 0.0006994448361275141, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1194,0.825067,0.6035,0.621872,0.602915,0.595815
2,0.7363,0.603153,0.7299,0.732475,0.729348,0.727026
3,0.5783,0.561989,0.7559,0.761668,0.755184,0.753292
4,0.4905,0.463814,0.8024,0.807086,0.802493,0.801833
5,0.4201,0.432544,0.8193,0.827916,0.819183,0.820419
6,0.3588,0.430868,0.8247,0.836575,0.824329,0.825647
7,0.3003,0.418312,0.8303,0.835951,0.830883,0.827622
8,0.2514,0.363596,0.853,0.855861,0.853229,0.853003
9,0.2108,0.422785,0.8314,0.845777,0.830928,0.831281
10,0.182,0.364817,0.8538,0.854714,0.854176,0.852562


[I 2025-03-26 10:32:57,964] Trial 136 finished with value: 0.8525617052541499 and parameters: {'learning_rate': 0.0006994448361275141, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 137 with params: {'learning_rate': 0.0008066631648617294, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1409,0.868953,0.567,0.582043,0.566713,0.558056
2,0.7663,0.631425,0.7116,0.720672,0.710708,0.710164
3,0.61,0.579813,0.7472,0.751187,0.746698,0.743937
4,0.5119,0.565337,0.7958,0.804862,0.795855,0.796556
5,0.4409,0.432666,0.8205,0.828858,0.820491,0.821302
6,0.3805,0.429204,0.8178,0.831395,0.817722,0.819549
7,0.3213,0.43198,0.8258,0.833336,0.826376,0.822918
8,0.27,0.373949,0.8507,0.853182,0.850925,0.850607
9,0.2259,0.422315,0.834,0.845947,0.833647,0.834667
10,0.1934,0.365934,0.8526,0.854955,0.852838,0.851416


[I 2025-03-26 10:45:18,570] Trial 137 finished with value: 0.8514155259594073 and parameters: {'learning_rate': 0.0008066631648617294, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 138 with params: {'learning_rate': 0.0006354569614363497, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1447,0.844285,0.5977,0.603705,0.597756,0.590194
2,0.7467,0.607956,0.7268,0.729782,0.726774,0.724364
3,0.5886,0.569774,0.7532,0.758414,0.752595,0.751083
4,0.4937,0.448704,0.8124,0.815608,0.812289,0.811347
5,0.4211,0.426289,0.8257,0.831334,0.82559,0.826157
6,0.3553,0.423783,0.8233,0.838847,0.823073,0.824634
7,0.2946,0.406261,0.8391,0.843567,0.839619,0.836548
8,0.2443,0.368849,0.8509,0.854922,0.851233,0.850422
9,0.2052,0.419483,0.8328,0.846609,0.832416,0.833565
10,0.1776,0.355957,0.8584,0.861162,0.858716,0.857618


[I 2025-03-26 10:57:40,159] Trial 138 finished with value: 0.857617937642259 and parameters: {'learning_rate': 0.0006354569614363497, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 139 with params: {'learning_rate': 0.000412476668386486, 'weight_decay': 0.01, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1557,0.852182,0.5854,0.602905,0.585866,0.578814
2,0.7365,0.619283,0.721,0.72679,0.720958,0.718514
3,0.5753,0.538696,0.7691,0.773678,0.768753,0.767667
4,0.4758,0.458502,0.8033,0.809591,0.80311,0.802228
5,0.3992,0.420409,0.8238,0.830416,0.823746,0.824458
6,0.3308,0.426046,0.8248,0.836263,0.82476,0.825749
7,0.2712,0.416432,0.8303,0.835614,0.830905,0.828039
8,0.2251,0.370879,0.8483,0.85496,0.848669,0.848629
9,0.1902,0.424725,0.8315,0.843561,0.831332,0.83263
10,0.1683,0.382651,0.8472,0.850723,0.847593,0.845752


[I 2025-03-26 11:10:06,322] Trial 139 finished with value: 0.8457521565557095 and parameters: {'learning_rate': 0.000412476668386486, 'weight_decay': 0.01, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 5.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 140 with params: {'learning_rate': 0.0003078372337018397, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.208,0.877036,0.5671,0.58799,0.567588,0.553377
2,0.7683,0.653308,0.7011,0.712298,0.701106,0.70262
3,0.5995,0.58308,0.7404,0.745237,0.740005,0.737307
4,0.4873,0.476246,0.7957,0.800155,0.795711,0.794807
5,0.4085,0.42399,0.8207,0.822588,0.820774,0.820466
6,0.3331,0.456431,0.8043,0.823866,0.804233,0.805619


[I 2025-03-26 11:17:30,583] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.004819634368582602, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3353,1.190413,0.3571,0.342471,0.35625,0.3265
2,1.094,0.956472,0.5084,0.508598,0.507786,0.495194
3,0.8919,0.824169,0.6094,0.607757,0.608071,0.599785


[I 2025-03-26 11:21:14,386] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0005786036208285414, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1342,0.822184,0.6037,0.60981,0.603319,0.595401
2,0.7354,0.621211,0.7205,0.729214,0.720453,0.719169
3,0.5802,0.552369,0.7576,0.763573,0.757097,0.755225
4,0.4875,0.456697,0.8045,0.808037,0.804488,0.803276
5,0.415,0.418863,0.8255,0.83211,0.825391,0.826081
6,0.3501,0.413072,0.8276,0.839381,0.827551,0.828562
7,0.2918,0.402466,0.8369,0.839644,0.83745,0.834093
8,0.2428,0.363738,0.8548,0.858949,0.855079,0.854717
9,0.2043,0.422613,0.8317,0.843958,0.831382,0.831786
10,0.178,0.36563,0.856,0.857341,0.85626,0.854805


[I 2025-03-26 11:33:32,618] Trial 142 finished with value: 0.8548045479648717 and parameters: {'learning_rate': 0.0005786036208285414, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 143 with params: {'learning_rate': 0.000962444195538357, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1353,0.884974,0.5574,0.578622,0.557474,0.547795
2,0.7848,0.672415,0.6928,0.69465,0.692819,0.689055
3,0.6229,0.584165,0.7448,0.749044,0.743969,0.742392


[I 2025-03-26 11:37:15,929] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0008904735735592472, 'weight_decay': 0.007, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1382,0.902736,0.5618,0.568635,0.561619,0.553371
2,0.7716,0.651334,0.7058,0.712298,0.705383,0.704124
3,0.6115,0.566052,0.7535,0.755801,0.752791,0.750269
4,0.517,0.463952,0.8006,0.805163,0.800496,0.799411
5,0.4401,0.440641,0.8161,0.824906,0.81612,0.817191
6,0.3769,0.424977,0.8208,0.832803,0.82105,0.821809
7,0.3207,0.422007,0.8264,0.833593,0.827074,0.823814
8,0.2716,0.361684,0.8536,0.860153,0.853841,0.853759
9,0.2271,0.423331,0.8331,0.846691,0.832777,0.833808
10,0.1942,0.376899,0.8488,0.852794,0.849222,0.847242


[I 2025-03-26 11:49:41,176] Trial 144 finished with value: 0.8472416650266066 and parameters: {'learning_rate': 0.0008904735735592472, 'weight_decay': 0.007, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


Trial 145 with params: {'learning_rate': 0.0005128998165901848, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1475,0.849911,0.6073,0.618475,0.606622,0.603097
2,0.7335,0.615517,0.7245,0.734596,0.724011,0.72503
3,0.5762,0.561276,0.7516,0.758254,0.750853,0.749396
4,0.4808,0.450768,0.8136,0.816544,0.813421,0.812438
5,0.4063,0.410903,0.8305,0.835998,0.83038,0.830928
6,0.3383,0.417422,0.8255,0.836379,0.825246,0.826635
7,0.2816,0.423192,0.8305,0.8381,0.83115,0.828525
8,0.2346,0.375472,0.8499,0.854924,0.850207,0.849348
9,0.1977,0.424511,0.8336,0.846134,0.833217,0.833689
10,0.1737,0.365147,0.8527,0.854953,0.853108,0.851345


[I 2025-03-26 12:02:05,328] Trial 145 finished with value: 0.851345393789359 and parameters: {'learning_rate': 0.0005128998165901848, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 146 with params: {'learning_rate': 0.0008180227221315464, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1176,0.869544,0.5802,0.592849,0.579355,0.573975
2,0.7665,0.643296,0.7028,0.705578,0.703006,0.699036
3,0.6115,0.570819,0.7488,0.751188,0.748155,0.745726
4,0.5147,0.476732,0.7925,0.800366,0.792064,0.792275
5,0.4397,0.435947,0.8181,0.824575,0.818033,0.819087
6,0.3742,0.438188,0.8159,0.827786,0.816062,0.816713
7,0.3165,0.42459,0.8259,0.832738,0.826642,0.822588
8,0.2658,0.370614,0.8471,0.850444,0.847278,0.847235
9,0.2227,0.412916,0.8346,0.847043,0.834138,0.834927
10,0.1909,0.373933,0.8486,0.853661,0.849139,0.847422


[I 2025-03-26 12:14:30,848] Trial 146 finished with value: 0.8474215529602314 and parameters: {'learning_rate': 0.0008180227221315464, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 35 with value: 0.8607044045531035.


Trial 147 with params: {'learning_rate': 0.00039272229567661494, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1581,0.834694,0.6049,0.611226,0.604786,0.600323
2,0.7381,0.604208,0.7274,0.733282,0.727038,0.726649
3,0.5728,0.526895,0.7766,0.780139,0.776224,0.774957
4,0.4722,0.457799,0.8083,0.813719,0.808381,0.807238
5,0.394,0.419739,0.8225,0.825384,0.822533,0.822166
6,0.3271,0.41739,0.8234,0.834448,0.823311,0.824566


[I 2025-03-26 12:21:54,797] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0008629993785421451, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1485,0.925519,0.5392,0.565401,0.538422,0.530601
2,0.7832,0.638121,0.7133,0.716016,0.713139,0.710264
3,0.6232,0.592369,0.7346,0.741206,0.733342,0.73047


[I 2025-03-26 12:25:38,335] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0006013711398701938, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1373,0.837469,0.5926,0.613968,0.59244,0.585716
2,0.7449,0.625158,0.7158,0.71928,0.715659,0.712727
3,0.5879,0.572607,0.7437,0.748696,0.743293,0.741153
4,0.4921,0.458909,0.8051,0.809126,0.805052,0.803396
5,0.4199,0.41999,0.8266,0.831498,0.826614,0.826707
6,0.3532,0.432605,0.8188,0.832337,0.818886,0.819065
7,0.2961,0.404011,0.8382,0.841311,0.838673,0.836259
8,0.246,0.37145,0.8489,0.854292,0.849127,0.849125
9,0.2066,0.414665,0.8359,0.847119,0.835799,0.836011
10,0.1797,0.366399,0.8501,0.853568,0.850399,0.848966


[I 2025-03-26 12:38:01,748] Trial 149 finished with value: 0.8489659377203326 and parameters: {'learning_rate': 0.0006013711398701938, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 35 with value: 0.8607044045531035.


In [20]:
print(best_distill_random)

BestRun(run_id='35', objective=0.8607044045531035, hyperparameters={'learning_rate': 0.0006139968240256416, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}, run_summary=None)
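
The two distillation-specific hyperparameters in the best run above, `lambda_param` and `temperature`, are not defined in this part of the notebook. As a rough illustration only, the sketch below shows one common way they enter a distillation loss: `temperature` softens both distributions and `lambda_param` blends the soft-target term with the ordinary cross-entropy. The function names, the direction of the `lambda_param` weighting, and the `temperature ** 2` scaling are assumptions; the notebook's `base` module may implement this differently.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Blend hard-label cross-entropy with a softened teacher/student KL term.

    ASSUMPTION: lambda_param weights the soft (teacher) term and
    (1 - lambda_param) weights the hard-label term.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    hard = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # temperature ** 2 keeps the soft term's gradient scale comparable (assumed convention).
    return (1 - lambda_param) * hard + lambda_param * temperature ** 2 * soft


# Toy three-class example using the best run's lambda_param and temperature.
loss = distillation_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.5], label=2,
                         lambda_param=0.3, temperature=6.0)
print(f"combined loss: {loss:.4f}")
```

Under this assumed convention, `lambda_param = 0.0` reduces the loss to plain cross-entropy, so trials sampled with that value would ignore the teacher entirely.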


In [21]:
base.reset_seed()

## Search with standard training, fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [22]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-head_hp-search", logging_dir=f"~/logs/{DATASET}/pretrained-head_hp-search", epochs=num_epochs, batch_size=batch_size)

Definition of the searched hyperparameters and their ranges.

In [None]:
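def hp_space(trial):
    # Search space for head-only fine-tuning; warm_up is defined outside this cell.
    params = {
        # Learning rate is sampled on a log scale between 5e-5 and 5e-3.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay is restricted to multiples of 1e-3 in [0, 1e-2].
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params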
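
To make the shape of this search space concrete, here is a minimal standard-library sketch of the three kinds of parameters: a log-uniform float, a float on a fixed grid, and an integer range. It draws values independently at random, which is not what the TPE sampler used in this notebook does; it only illustrates the distributions. The helper names and the upper bound `max_warmup = 32` are hypothetical (the real bound is the notebook's `warm_up` variable).

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw a value from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(42)
max_warmup = 32  # hypothetical stand-in for the notebook's warm_up

lr = sample_log_uniform(rng, 5e-5, 5e-3)
wd = sample_stepped(rng, 0.0, 1e-2, 1e-3)
warmup = rng.randint(0, max_warmup)
print({"learning_rate": lr, "weight_decay": wd, "warmup_steps": warmup})

# Grid values are products of binary floats, so they need not print cleanly.
print(3 * 0.1)  # 0.30000000000000004
```

The last line shows why stepped parameters sometimes appear in the trial logs as values such as `0.30000000000000004`: they are ordinary grid points whose binary floating-point representation is not exact, not a sign of a faulty search.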

Optuna configuration.

In [24]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
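
The pruner settings determine at which points a trial can be stopped early. The sketch below computes those checkpoints ("rungs") for each Hyperband bracket, following Optuna's documented rule as I understand it: the number of brackets is floor(log_eta(max_resource / min_resource)) + 1, and bracket `b` checks trials at `min_resource * eta ** (b + rung)`. The function name is hypothetical, and the values 3 and 10 are assumptions chosen only because the logs show trials stopping after epochs 3 and 6 and full runs lasting 10 epochs; the actual `min_r` and `max_r` are defined elsewhere in the notebook and may be expressed in other units, such as optimizer steps.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the resource values at which pruning is considered."""
    # Integer equivalent of floor(log_eta(max_resource / min_resource)) + 1.
    n_brackets = 0
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            checkpoint = min_resource * reduction_factor ** (bracket_id + rung)
            if checkpoint > max_resource:
                break
            rungs.append(checkpoint)
            rung += 1
        brackets.append(rungs)
    return brackets


# Assumed illustrative values, not the notebook's actual min_r / max_r.
print(hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2))
# [[3, 6], [6]]
```

With these assumed values there are two brackets: one that may prune at 3 and again at 6, and a more patient one that first checks at 6. That pattern would match the pruned trials in the output, which end after either 3 or 6 epochs.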



Configuration of the trainer for the individual training runs.

In [27]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # A fresh 10-class MobileNetV2 is built for every trial and passed through base.freeze_model.
    model_init=lambda: base.freeze_model(base.get_mobilenet(10)),
)
  

Search setup.

In [28]:
best_base_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-head",
    n_trials=150
)

[I 2025-03-26 13:17:47,446] A new study created in memory with name: Base-head


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3153,0.907658,0.7127,0.723356,0.712383,0.712754
2,0.8364,0.797999,0.7351,0.743364,0.733999,0.735253
3,0.7624,0.762252,0.7442,0.745174,0.743412,0.74189


[I 2025-03-26 13:20:17,766] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9932,0.766502,0.7364,0.754752,0.735974,0.737793
2,0.7304,0.715032,0.7541,0.759926,0.753558,0.753677
3,0.6904,0.697195,0.7605,0.763086,0.759557,0.759461
4,0.6746,0.673912,0.769,0.771324,0.769037,0.768282
5,0.6596,0.705323,0.7535,0.762131,0.753585,0.754929
6,0.6545,0.663235,0.7718,0.772694,0.771496,0.771462


[I 2025-03-26 13:25:16,028] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8868,1.48872,0.6359,0.642433,0.635637,0.632968
2,1.3089,1.167719,0.6875,0.691065,0.686501,0.68498
3,1.0863,1.026419,0.7033,0.701312,0.702507,0.698107


[I 2025-03-26 13:27:45,373] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9729,0.754201,0.7373,0.763065,0.736927,0.738718
2,0.7143,0.70103,0.7572,0.763281,0.756828,0.756101
3,0.6801,0.715526,0.7523,0.765085,0.751255,0.751074
4,0.6672,0.660954,0.7712,0.775246,0.771202,0.771031
5,0.6537,0.701084,0.7571,0.767987,0.757137,0.758571
6,0.6489,0.657304,0.7717,0.774181,0.771374,0.771904


[I 2025-03-26 13:32:46,015] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8685,0.745054,0.7414,0.760017,0.741344,0.742499
2,0.7155,0.709354,0.7535,0.76054,0.752938,0.75218
3,0.6873,0.739032,0.7451,0.761404,0.744171,0.742385
4,0.6773,0.660528,0.768,0.773527,0.767962,0.768257
5,0.6642,0.684108,0.7596,0.764538,0.75958,0.759909
6,0.657,0.666197,0.7668,0.773231,0.766148,0.767618


[I 2025-03-26 13:37:47,675] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6538,1.195002,0.6801,0.688507,0.679867,0.678714
2,1.0555,0.961004,0.7123,0.718655,0.711183,0.711468
3,0.9064,0.879642,0.7228,0.722476,0.722024,0.718725
4,0.8463,0.82406,0.7352,0.736637,0.735176,0.733672
5,0.8104,0.831929,0.7276,0.734266,0.727394,0.728403
6,0.7903,0.788575,0.7413,0.743153,0.740976,0.741067


[I 2025-03-26 13:43:22,279] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2195,0.856074,0.7202,0.732305,0.719849,0.720658
2,0.7983,0.770039,0.7395,0.748041,0.738485,0.739748
3,0.7367,0.740176,0.7477,0.748825,0.746939,0.745839
4,0.712,0.705846,0.7598,0.760666,0.759875,0.758435
5,0.6933,0.731533,0.7474,0.754444,0.747373,0.748522
6,0.6853,0.691631,0.7647,0.766098,0.764454,0.764402


[I 2025-03-26 13:48:26,831] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7307,1.288317,0.6688,0.676903,0.668503,0.666904
2,1.1324,1.021478,0.7061,0.711122,0.70502,0.70467
3,0.9588,0.921905,0.7177,0.716796,0.716895,0.713373
4,0.887,0.860644,0.7306,0.732467,0.730565,0.729148
5,0.8451,0.862596,0.7251,0.73187,0.724874,0.725896
6,0.8211,0.817627,0.7371,0.738803,0.736771,0.73679


[I 2025-03-26 13:53:32,261] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1555,0.835616,0.7232,0.73576,0.722849,0.723935
2,0.7835,0.759305,0.7408,0.749369,0.739807,0.741071
3,0.727,0.731755,0.75,0.75118,0.749211,0.748294
4,0.7043,0.699655,0.7607,0.761785,0.760765,0.759387
5,0.6865,0.725882,0.7478,0.754932,0.747785,0.748931
6,0.6791,0.686173,0.7662,0.767479,0.765952,0.765889


[I 2025-03-26 13:58:37,579] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0701,0.79957,0.7301,0.744454,0.729669,0.731298
2,0.757,0.737747,0.7457,0.753472,0.744859,0.745926
3,0.7086,0.714192,0.7555,0.756736,0.754704,0.754114
4,0.6893,0.687745,0.7632,0.764763,0.76326,0.76206
5,0.673,0.714652,0.75,0.757359,0.750039,0.75124
6,0.6668,0.675012,0.7688,0.769712,0.768516,0.768403
7,0.6558,0.684471,0.7662,0.765202,0.766127,0.762847
8,0.6494,0.680379,0.7647,0.769715,0.764483,0.764021
9,0.6484,0.690086,0.7595,0.764873,0.75919,0.759692
10,0.6441,0.676936,0.7674,0.769469,0.767005,0.766303


[I 2025-03-26 14:06:55,877] Trial 9 finished with value: 0.7663032658144318 and parameters: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 9 with value: 0.7663032658144318.


Trial 10 with params: {'learning_rate': 0.0026025741521183794, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8798,0.758556,0.7373,0.760058,0.737349,0.738535
2,0.7209,0.717,0.7515,0.760021,0.750832,0.750405
3,0.6931,0.743028,0.7446,0.761438,0.743631,0.742135
4,0.6835,0.665012,0.7677,0.774554,0.767671,0.768109
5,0.6693,0.679601,0.7632,0.766841,0.763104,0.763472
6,0.6612,0.668228,0.7673,0.774027,0.766611,0.768112
7,0.6447,0.675258,0.7693,0.771302,0.769263,0.766338
8,0.6341,0.670903,0.7692,0.773302,0.769138,0.768267
9,0.6302,0.688437,0.7636,0.769933,0.763544,0.763308
10,0.616,0.657228,0.7727,0.773987,0.772449,0.771495


[I 2025-03-26 14:15:23,843] Trial 10 finished with value: 0.7714948222142116 and parameters: {'learning_rate': 0.0026025741521183794, 'weight_decay': 0.007, 'warmup_steps': 14}. Best is trial 10 with value: 0.7714948222142116.


Trial 11 with params: {'learning_rate': 0.004345544743062486, 'weight_decay': 0.006, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8969,0.799114,0.7305,0.750419,0.730902,0.731412
2,0.7614,0.747163,0.7422,0.759531,0.741548,0.744903
3,0.7351,0.718728,0.7519,0.758024,0.75115,0.750862
4,0.7255,0.697013,0.7632,0.775758,0.763496,0.763618
5,0.7066,0.692812,0.7579,0.761175,0.757542,0.757962
6,0.6916,0.68103,0.763,0.769884,0.762334,0.763569


[I 2025-03-26 14:20:21,222] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.00347910804452505, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8645,0.789289,0.7307,0.75424,0.731084,0.73172
2,0.7394,0.736182,0.7456,0.758003,0.744902,0.746231
3,0.7124,0.729123,0.7513,0.762246,0.750386,0.749765
4,0.7034,0.68285,0.7651,0.776525,0.765285,0.765798
5,0.6869,0.680357,0.7616,0.764458,0.761356,0.762119
6,0.6752,0.674719,0.7653,0.772902,0.764583,0.766193


[I 2025-03-26 14:25:20,249] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0006887042034090289, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0414,0.77679,0.7358,0.752699,0.73537,0.737263
2,0.7392,0.721895,0.7509,0.75691,0.750271,0.750661
3,0.696,0.701448,0.7605,0.762419,0.759621,0.759432
4,0.679,0.678576,0.7673,0.769366,0.767354,0.766427
5,0.6636,0.707686,0.7522,0.760496,0.752266,0.753666
6,0.6581,0.666718,0.7709,0.771633,0.77059,0.770519
7,0.6467,0.67679,0.7703,0.769078,0.770199,0.766849
8,0.6405,0.674868,0.7652,0.770162,0.764998,0.764475
9,0.6395,0.684205,0.7609,0.766,0.760577,0.760906
10,0.6344,0.669566,0.7674,0.768848,0.767031,0.766129


[I 2025-03-26 14:33:42,989] Trial 13 finished with value: 0.766128632555344 and parameters: {'learning_rate': 0.0006887042034090289, 'weight_decay': 0.006, 'warmup_steps': 14}. Best is trial 10 with value: 0.7714948222142116.


Trial 14 with params: {'learning_rate': 0.0028927493446863814, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8858,0.772446,0.7332,0.758226,0.733413,0.734293
2,0.7269,0.724157,0.7496,0.75944,0.748855,0.748957
3,0.6994,0.742934,0.7455,0.761122,0.744533,0.743235
4,0.69,0.670539,0.7665,0.775131,0.766512,0.767075
5,0.675,0.67725,0.7637,0.766781,0.763553,0.764108
6,0.6657,0.669917,0.7657,0.772834,0.764983,0.766561
7,0.6484,0.678161,0.7688,0.771159,0.768759,0.76587
8,0.6367,0.673813,0.7682,0.772847,0.76814,0.767402
9,0.6323,0.691804,0.7635,0.770504,0.763477,0.763315
10,0.6166,0.657741,0.7729,0.774247,0.772653,0.771733


[I 2025-03-26 14:42:00,911] Trial 14 finished with value: 0.7717328120181975 and parameters: {'learning_rate': 0.0028927493446863814, 'weight_decay': 0.01, 'warmup_steps': 18}. Best is trial 14 with value: 0.7717328120181975.


Trial 15 with params: {'learning_rate': 0.004136030701154777, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9038,0.796854,0.7305,0.751178,0.7309,0.731449
2,0.7564,0.745457,0.7436,0.760255,0.742961,0.74613
3,0.73,0.717536,0.753,0.759673,0.752237,0.752077
4,0.7204,0.694751,0.7639,0.776661,0.76421,0.764343
5,0.7022,0.68975,0.7593,0.76247,0.758963,0.759474
6,0.6878,0.679831,0.7633,0.770779,0.762609,0.764027


[I 2025-03-26 14:47:03,880] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0016516727692087007, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9153,0.74254,0.7432,0.764182,0.742936,0.744375
2,0.7108,0.701087,0.7558,0.76179,0.755361,0.754274
3,0.6795,0.728889,0.7454,0.76156,0.744414,0.743314
4,0.668,0.658906,0.7706,0.775639,0.770557,0.770766
5,0.6556,0.69657,0.7567,0.766187,0.756722,0.757897
6,0.6503,0.658971,0.7699,0.773404,0.769469,0.770325
7,0.6359,0.665766,0.7719,0.771906,0.771827,0.768925
8,0.6282,0.667907,0.7668,0.770468,0.766672,0.765432
9,0.6264,0.679514,0.7646,0.769773,0.764408,0.764294
10,0.6165,0.657384,0.7726,0.773584,0.772317,0.771289


[I 2025-03-26 14:55:31,121] Trial 16 finished with value: 0.7712894833822972 and parameters: {'learning_rate': 0.0016516727692087007, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}. Best is trial 14 with value: 0.7717328120181975.


Trial 17 with params: {'learning_rate': 0.002141096997973419, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8814,0.740538,0.7428,0.760086,0.742674,0.74373
2,0.7134,0.70584,0.7546,0.760755,0.754097,0.753064
3,0.6847,0.736784,0.7458,0.762249,0.744901,0.74303


[I 2025-03-26 14:58:02,596] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7197,1.253835,0.6728,0.680842,0.672547,0.671069
2,1.1006,0.994498,0.7087,0.714465,0.707569,0.70756
3,0.9349,0.902132,0.7198,0.71924,0.718999,0.715667


[I 2025-03-26 15:00:34,636] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0010990405503339894, 'weight_decay': 0.007, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.984,0.757231,0.7366,0.760213,0.736221,0.737826
2,0.718,0.703191,0.7561,0.761257,0.755665,0.755295
3,0.6822,0.704173,0.7578,0.765924,0.756701,0.756776
4,0.6685,0.663869,0.7694,0.772527,0.769436,0.769077
5,0.6542,0.702546,0.7551,0.765651,0.755179,0.756637
6,0.6494,0.658029,0.7725,0.774075,0.772188,0.77239
7,0.6369,0.667114,0.7705,0.769,0.770434,0.767163
8,0.6303,0.668871,0.7655,0.769638,0.765303,0.764343
9,0.6292,0.678593,0.7616,0.76655,0.761328,0.761376
10,0.6219,0.660819,0.7703,0.771602,0.769994,0.769054


[I 2025-03-26 15:09:07,996] Trial 19 finished with value: 0.7690538666356959 and parameters: {'learning_rate': 0.0010990405503339894, 'weight_decay': 0.007, 'warmup_steps': 26}. Best is trial 14 with value: 0.7717328120181975.


Trial 20 with params: {'learning_rate': 7.828712010044815e-05, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7936,1.380377,0.6555,0.663192,0.655203,0.653076
2,1.2137,1.088923,0.6968,0.700849,0.695781,0.694843
3,1.0178,0.97026,0.711,0.709617,0.710172,0.70626


[I 2025-03-26 15:11:41,186] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.001795268169759896, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9034,0.739436,0.7429,0.76206,0.742682,0.743906
2,0.7108,0.70148,0.7558,0.761605,0.755342,0.754244
3,0.6805,0.731977,0.7459,0.762494,0.74495,0.743636
4,0.6694,0.658422,0.7708,0.775987,0.770739,0.771082
5,0.6571,0.693828,0.7565,0.765138,0.75652,0.757546
6,0.6514,0.660516,0.7689,0.773231,0.768421,0.769495
7,0.6367,0.666784,0.7718,0.772248,0.771719,0.768825
8,0.6286,0.667927,0.7675,0.771152,0.767409,0.766166
9,0.6265,0.680448,0.7649,0.770152,0.764734,0.764569
10,0.616,0.657078,0.7724,0.773543,0.77211,0.771131


[I 2025-03-26 15:20:08,857] Trial 21 finished with value: 0.7711308771110976 and parameters: {'learning_rate': 0.001795268169759896, 'weight_decay': 0.01, 'warmup_steps': 16}. Best is trial 14 with value: 0.7717328120181975.


Trial 22 with params: {'learning_rate': 0.0017118867984434079, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9013,0.740686,0.7428,0.763132,0.742543,0.743885
2,0.7105,0.701076,0.756,0.761871,0.75556,0.754448
3,0.6797,0.730173,0.7454,0.761862,0.744424,0.743215
4,0.6684,0.65872,0.7705,0.775611,0.770451,0.770728
5,0.6561,0.695484,0.7568,0.76604,0.756832,0.757946
6,0.6507,0.659551,0.7694,0.773259,0.768952,0.769921
7,0.6362,0.666093,0.7717,0.771925,0.771624,0.768716
8,0.6283,0.667887,0.7674,0.771033,0.767285,0.766036
9,0.6264,0.679861,0.7645,0.769779,0.764309,0.764221
10,0.6162,0.657226,0.7725,0.773579,0.772218,0.771221


[I 2025-03-26 15:28:35,391] Trial 22 finished with value: 0.7712209540663186 and parameters: {'learning_rate': 0.0017118867984434079, 'weight_decay': 0.007, 'warmup_steps': 13}. Best is trial 14 with value: 0.7717328120181975.


Trial 23 with params: {'learning_rate': 0.002429968097831835, 'weight_decay': 0.008, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8889,0.750659,0.7398,0.759807,0.739763,0.740937
2,0.7181,0.712661,0.7525,0.760142,0.751878,0.751283
3,0.6899,0.741152,0.7455,0.762155,0.744544,0.742903
4,0.6799,0.662239,0.7682,0.774023,0.768182,0.768487
5,0.6664,0.681982,0.7614,0.765745,0.761338,0.761683
6,0.6588,0.667203,0.7671,0.773813,0.766417,0.767926


[I 2025-03-26 15:33:41,028] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0010295071542769218, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9897,0.758326,0.7378,0.76004,0.737417,0.739088
2,0.72,0.705036,0.757,0.762259,0.756567,0.756323
3,0.6834,0.700779,0.759,0.765407,0.757931,0.758063
4,0.6693,0.665505,0.7692,0.772192,0.769225,0.768813
5,0.6548,0.70301,0.7541,0.764448,0.754192,0.755621
6,0.65,0.658676,0.7725,0.773853,0.772192,0.772324
7,0.6377,0.668059,0.7712,0.769648,0.771132,0.767841
8,0.6313,0.669338,0.7654,0.769782,0.765191,0.764368
9,0.6302,0.678974,0.7615,0.766455,0.761223,0.76131
10,0.6233,0.661698,0.7699,0.771367,0.769554,0.768664


[I 2025-03-26 15:42:07,446] Trial 24 finished with value: 0.7686636463285315 and parameters: {'learning_rate': 0.0010295071542769218, 'weight_decay': 0.01, 'warmup_steps': 24}. Best is trial 14 with value: 0.7717328120181975.


Trial 25 with params: {'learning_rate': 0.004985341137518224, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8983,0.81104,0.7306,0.74935,0.730863,0.730907
2,0.7788,0.748045,0.7436,0.758641,0.742755,0.745481
3,0.7517,0.732768,0.751,0.758799,0.75028,0.749784
4,0.7413,0.699075,0.7636,0.77273,0.763915,0.763542
5,0.7199,0.702316,0.7574,0.761047,0.75712,0.757342
6,0.704,0.685793,0.7622,0.767931,0.761644,0.762317


[I 2025-03-26 15:47:11,366] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0008704244501957572, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0,0.762982,0.738,0.75769,0.737599,0.73945
2,0.7262,0.711078,0.7546,0.760387,0.75413,0.754138
3,0.6875,0.696608,0.7599,0.76348,0.758909,0.758909
4,0.6723,0.670608,0.7692,0.771665,0.769243,0.768609
5,0.6575,0.704339,0.7541,0.763574,0.754201,0.755596
6,0.6525,0.6612,0.7725,0.773476,0.77219,0.772175
7,0.6406,0.671117,0.7706,0.769133,0.770531,0.767141
8,0.6344,0.671104,0.766,0.770424,0.765805,0.765117
9,0.6334,0.680541,0.7621,0.767234,0.761814,0.762041
10,0.6273,0.66445,0.7688,0.77024,0.768453,0.767552


[I 2025-03-26 15:56:14,327] Trial 26 finished with value: 0.7675521511958575 and parameters: {'learning_rate': 0.0008704244501957572, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}. Best is trial 14 with value: 0.7717328120181975.


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4376,0.982734,0.7022,0.711796,0.70193,0.701802
2,0.8915,0.837593,0.7292,0.737359,0.728067,0.729253
3,0.7983,0.791907,0.7378,0.738732,0.736983,0.73507
4,0.7613,0.748315,0.7511,0.752139,0.751106,0.749725
5,0.7368,0.768175,0.7392,0.745556,0.739002,0.740038
6,0.7246,0.727215,0.7545,0.756707,0.754242,0.754419


[I 2025-03-26 16:01:19,150] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.002953666986018182, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8943,0.775074,0.7329,0.758087,0.733156,0.733878
2,0.7285,0.725655,0.7491,0.759303,0.748351,0.748598
3,0.701,0.742033,0.7461,0.761491,0.745121,0.743985
4,0.6915,0.671816,0.7664,0.775555,0.766432,0.767014
5,0.6763,0.677131,0.7635,0.766606,0.763346,0.763964
6,0.6667,0.670328,0.7659,0.773075,0.765167,0.766752


[I 2025-03-26 16:06:21,211] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0044328903491836335, 'weight_decay': 0.008, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.914,0.799496,0.7321,0.751224,0.732452,0.732837
2,0.7642,0.745757,0.743,0.760068,0.742295,0.745608
3,0.738,0.719475,0.7518,0.757969,0.751079,0.750797
4,0.7282,0.69763,0.7628,0.775044,0.763097,0.763162
5,0.7089,0.694364,0.7577,0.760998,0.757334,0.75771
6,0.6937,0.681558,0.7629,0.769608,0.76224,0.76338


[I 2025-03-26 16:11:26,207] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0017364896671898356, 'weight_decay': 0.004, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9021,0.740218,0.7431,0.762987,0.74286,0.74411
2,0.7105,0.701176,0.7563,0.76209,0.755862,0.75473
3,0.6799,0.730728,0.746,0.762344,0.74504,0.743793


[I 2025-03-26 16:13:58,929] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.003617134863992964, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8792,0.790582,0.729,0.752025,0.729436,0.729982
2,0.7431,0.738884,0.7455,0.758749,0.744836,0.746647
3,0.716,0.725056,0.7536,0.763063,0.752735,0.752301
4,0.707,0.68589,0.7653,0.77729,0.765521,0.765977
5,0.6901,0.682167,0.7613,0.764356,0.76105,0.761789
6,0.6778,0.676004,0.7644,0.772208,0.763676,0.765284


[I 2025-03-26 16:19:03,391] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0011648325818640426, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9368,0.754515,0.7378,0.762344,0.737422,0.739204
2,0.7151,0.701518,0.7564,0.761563,0.755988,0.755472
3,0.6806,0.706972,0.7566,0.766016,0.755548,0.755635
4,0.6675,0.662622,0.7705,0.773955,0.770528,0.770237
5,0.6535,0.701987,0.7553,0.765806,0.755382,0.756793
6,0.6488,0.65749,0.773,0.774753,0.772689,0.772946
7,0.6361,0.666339,0.7705,0.769119,0.770431,0.767221
8,0.6295,0.668472,0.7662,0.77024,0.766001,0.764996
9,0.6284,0.678354,0.7615,0.766396,0.761244,0.761246
10,0.6208,0.660081,0.771,0.772255,0.770687,0.769733


[I 2025-03-26 16:27:31,028] Trial 32 finished with value: 0.7697325808976596 and parameters: {'learning_rate': 0.0011648325818640426, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 14 with value: 0.7717328120181975.


Trial 33 with params: {'learning_rate': 0.0014433808226704866, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9276,0.749257,0.7381,0.762787,0.737741,0.739453
2,0.7116,0.70078,0.7569,0.762774,0.75651,0.75553
3,0.679,0.721634,0.7484,0.763599,0.747383,0.746964
4,0.6669,0.659904,0.7708,0.775104,0.770755,0.770662
5,0.6539,0.699651,0.7573,0.768069,0.757328,0.758744
6,0.649,0.657493,0.7707,0.77342,0.770342,0.770958
7,0.6353,0.665085,0.772,0.770984,0.771917,0.768891
8,0.6281,0.667975,0.7663,0.770017,0.766147,0.764916
9,0.6266,0.678578,0.7634,0.768429,0.763188,0.76306
10,0.6177,0.65814,0.7718,0.772899,0.771503,0.770483


[I 2025-03-26 16:35:51,959] Trial 33 finished with value: 0.7704828660712321 and parameters: {'learning_rate': 0.0014433808226704866, 'weight_decay': 0.008, 'warmup_steps': 17}. Best is trial 14 with value: 0.7717328120181975.


Trial 34 with params: {'learning_rate': 0.0018813451008186423, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8956,0.738489,0.7427,0.760863,0.742512,0.743662
2,0.7111,0.702071,0.7569,0.762753,0.756433,0.755353
3,0.6813,0.733337,0.746,0.762387,0.745057,0.743531


[I 2025-03-26 16:38:22,222] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0024870786738035154, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8918,0.753518,0.7392,0.760381,0.73918,0.740392
2,0.7192,0.714215,0.7525,0.760341,0.751867,0.751294
3,0.6911,0.741899,0.7452,0.76198,0.744227,0.742648
4,0.6812,0.663119,0.7676,0.773639,0.767589,0.767882
5,0.6674,0.681108,0.7624,0.76646,0.762335,0.76266
6,0.6596,0.6676,0.7671,0.773888,0.766405,0.767925
7,0.6434,0.674099,0.7693,0.771141,0.769247,0.76637
8,0.6332,0.670114,0.7689,0.772775,0.768838,0.767873
9,0.6295,0.687131,0.7645,0.770773,0.764427,0.764192
10,0.6158,0.657087,0.7731,0.774339,0.77284,0.77187


[I 2025-03-26 16:46:50,385] Trial 35 finished with value: 0.7718702742260117 and parameters: {'learning_rate': 0.0024870786738035154, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 36 with params: {'learning_rate': 0.002392204028682123, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8969,0.749511,0.7408,0.760563,0.740734,0.742002
2,0.7177,0.711846,0.7534,0.760809,0.752789,0.752189
3,0.6893,0.740718,0.7455,0.762171,0.744551,0.742875


[I 2025-03-26 16:49:25,483] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.004697477328165655, 'weight_decay': 0.008, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9015,0.804052,0.7311,0.749779,0.731384,0.731545
2,0.7707,0.747246,0.7426,0.759229,0.741855,0.745083
3,0.7443,0.724709,0.751,0.757822,0.750296,0.749989
4,0.7343,0.698779,0.7623,0.773638,0.762592,0.762525
5,0.7141,0.698122,0.757,0.760358,0.756651,0.756918
6,0.6985,0.683269,0.7624,0.768658,0.761804,0.762752


[I 2025-03-26 16:54:27,518] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0034917785382137696, 'weight_decay': 0.01, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.895,0.788309,0.7294,0.752653,0.72981,0.730443
2,0.7406,0.736254,0.7454,0.758336,0.74471,0.746286
3,0.7134,0.727658,0.7524,0.763004,0.75152,0.75092
4,0.7043,0.6834,0.7652,0.776652,0.76539,0.765863
5,0.6877,0.680739,0.7616,0.76454,0.761339,0.762091
6,0.6758,0.674923,0.7652,0.772795,0.764489,0.766082


[I 2025-03-26 16:59:34,264] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0015297605283018953, 'weight_decay': 0.01, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.93,0.74659,0.7403,0.763261,0.739984,0.741563
2,0.7114,0.701061,0.7562,0.762159,0.755798,0.754773
3,0.6792,0.725186,0.7473,0.76282,0.746293,0.745686
4,0.6673,0.659448,0.7704,0.774895,0.770359,0.77037
5,0.6546,0.698518,0.7575,0.767563,0.757522,0.758815
6,0.6495,0.658003,0.7707,0.773838,0.77032,0.771069
7,0.6355,0.665233,0.772,0.771336,0.771915,0.768978
8,0.6281,0.667944,0.7664,0.770041,0.766257,0.764987
9,0.6265,0.678916,0.7641,0.769202,0.763914,0.763779
10,0.6171,0.657775,0.7726,0.773669,0.772308,0.771282


[I 2025-03-26 17:07:59,713] Trial 39 finished with value: 0.7712824764852668 and parameters: {'learning_rate': 0.0015297605283018953, 'weight_decay': 0.01, 'warmup_steps': 21}. Best is trial 35 with value: 0.7718702742260117.


Trial 40 with params: {'learning_rate': 0.00016564383319058083, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4974,1.050795,0.6966,0.705525,0.696393,0.695943
2,0.9444,0.877703,0.723,0.730416,0.721873,0.722778
3,0.8342,0.821606,0.7318,0.732383,0.730992,0.728596
4,0.7901,0.773966,0.743,0.744328,0.742984,0.741553
5,0.7622,0.789924,0.7351,0.741653,0.734917,0.735886
6,0.7474,0.748337,0.749,0.751271,0.748703,0.748946


[I 2025-03-26 17:12:59,841] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8709,1.483549,0.637,0.643576,0.636736,0.634062
2,1.3077,1.168964,0.6876,0.691109,0.686597,0.685008
3,1.0882,1.028642,0.703,0.701037,0.702223,0.697839


[I 2025-03-26 17:15:30,247] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0015293979934935358, 'weight_decay': 0.01, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9256,0.746467,0.7402,0.763198,0.739876,0.741465
2,0.7112,0.700987,0.7562,0.762155,0.755798,0.754766
3,0.6791,0.725092,0.7474,0.762891,0.746391,0.745777
4,0.6672,0.659456,0.7704,0.774958,0.77036,0.770392
5,0.6545,0.698542,0.7571,0.767252,0.757121,0.758435
6,0.6495,0.657983,0.7706,0.77375,0.770222,0.770972
7,0.6355,0.665237,0.7721,0.77144,0.772018,0.769055
8,0.6281,0.667944,0.7664,0.770041,0.766257,0.764987
9,0.6265,0.678896,0.7642,0.769267,0.764012,0.763876
10,0.6171,0.657768,0.7726,0.773689,0.772308,0.771291


[I 2025-03-26 17:23:51,717] Trial 42 finished with value: 0.7712913949619931 and parameters: {'learning_rate': 0.0015293979934935358, 'weight_decay': 0.01, 'warmup_steps': 19}. Best is trial 35 with value: 0.7718702742260117.


Trial 43 with params: {'learning_rate': 0.0023078579771072935, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8854,0.745473,0.7414,0.760216,0.741329,0.74254
2,0.7159,0.709539,0.7536,0.760657,0.753031,0.752313
3,0.6876,0.739296,0.7451,0.76155,0.744171,0.742429


[I 2025-03-26 17:26:22,401] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 7.012112975444019e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8673,1.454718,0.6409,0.647857,0.640629,0.637934
2,1.2759,1.13841,0.69,0.693754,0.689012,0.687717
3,1.0602,1.004403,0.7059,0.704128,0.705112,0.700891
4,0.9653,0.931303,0.7221,0.723827,0.722087,0.720569
5,0.9111,0.921421,0.7178,0.723886,0.7176,0.718481
6,0.8793,0.872666,0.7291,0.730231,0.728706,0.728396


[I 2025-03-26 17:31:23,880] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0008917967652515557, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0003,0.762208,0.7385,0.758268,0.738115,0.739831
2,0.7252,0.710137,0.7552,0.760855,0.754725,0.754666
3,0.6868,0.696782,0.7603,0.764167,0.759298,0.759297
4,0.6718,0.669829,0.7695,0.772023,0.769532,0.768941
5,0.657,0.704113,0.7538,0.76353,0.753899,0.755302
6,0.652,0.660773,0.7723,0.773393,0.771995,0.772016
7,0.6401,0.67064,0.7708,0.769327,0.770735,0.767327
8,0.6339,0.6708,0.7659,0.770284,0.765704,0.764986
9,0.6329,0.680267,0.7618,0.766865,0.761507,0.761729
10,0.6267,0.664012,0.769,0.770436,0.768653,0.767759


[I 2025-03-26 17:39:49,801] Trial 45 finished with value: 0.7677592426191138 and parameters: {'learning_rate': 0.0008917967652515557, 'weight_decay': 0.01, 'warmup_steps': 18}. Best is trial 35 with value: 0.7718702742260117.


Trial 46 with params: {'learning_rate': 0.0004082087893799213, 'weight_decay': 0.007, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1904,0.838948,0.7226,0.73557,0.722253,0.723393
2,0.7855,0.760149,0.7403,0.748881,0.739303,0.740592
3,0.7278,0.732099,0.7503,0.751422,0.749499,0.748544
4,0.7047,0.699894,0.7606,0.761659,0.760665,0.759265
5,0.6867,0.726017,0.7479,0.755061,0.74788,0.749052
6,0.6793,0.686275,0.7661,0.767347,0.765854,0.765766


[I 2025-03-26 17:44:53,906] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0020905286457601835, 'weight_decay': 0.008, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9021,0.740447,0.7432,0.760318,0.743067,0.744132
2,0.7134,0.705188,0.7547,0.760805,0.754198,0.753181
3,0.6842,0.736387,0.7464,0.762954,0.745501,0.743734


[I 2025-03-26 17:47:26,206] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0019435684548239534, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9053,0.738964,0.7433,0.760784,0.743128,0.744307
2,0.7119,0.702919,0.7565,0.762362,0.756044,0.75496
3,0.6822,0.734441,0.7464,0.762753,0.74547,0.743796
4,0.6714,0.65825,0.771,0.776173,0.770942,0.77129
5,0.6591,0.690813,0.7576,0.765068,0.757611,0.75834
6,0.6529,0.662401,0.7681,0.773259,0.767562,0.768825


[I 2025-03-26 17:52:28,785] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.002710922203881521, 'weight_decay': 0.01, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8845,0.764317,0.7363,0.760003,0.736413,0.737529
2,0.7231,0.719823,0.7508,0.759851,0.75011,0.749934
3,0.6955,0.743658,0.7452,0.761838,0.744241,0.742865
4,0.6859,0.666997,0.7672,0.774522,0.767175,0.767643
5,0.6714,0.678403,0.7635,0.76695,0.763391,0.763819
6,0.6628,0.668853,0.7668,0.773697,0.7661,0.767632
7,0.6461,0.676391,0.7693,0.771375,0.76926,0.766381
8,0.6351,0.671863,0.7695,0.773791,0.769446,0.768599
9,0.631,0.689695,0.7633,0.769886,0.763259,0.763062
10,0.6162,0.657404,0.7725,0.7738,0.772241,0.771292


[I 2025-03-26 18:02:08,366] Trial 49 finished with value: 0.7712924073388041 and parameters: {'learning_rate': 0.002710922203881521, 'weight_decay': 0.01, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8752,0.767252,0.7346,0.758863,0.734742,0.735825
2,0.7242,0.721476,0.7496,0.758934,0.748896,0.748764
3,0.6968,0.743704,0.745,0.761211,0.744035,0.742707
4,0.6873,0.668232,0.7668,0.774628,0.766788,0.767277
5,0.6726,0.677876,0.7631,0.76644,0.762985,0.763449
6,0.6638,0.669196,0.7663,0.773302,0.765599,0.767162
7,0.6469,0.677039,0.7693,0.771491,0.769251,0.766412
8,0.6356,0.672503,0.7691,0.773517,0.769057,0.768227
9,0.6314,0.690452,0.7629,0.769569,0.762869,0.762674
10,0.6163,0.657501,0.7725,0.773837,0.772242,0.771313


[I 2025-03-26 18:10:39,146] Trial 50 finished with value: 0.7713125261318431 and parameters: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}. Best is trial 35 with value: 0.7718702742260117.


Trial 51 with params: {'learning_rate': 0.004200876474093228, 'weight_decay': 0.002, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8808,0.799764,0.731,0.751641,0.731431,0.731984
2,0.7573,0.746636,0.7424,0.759264,0.741787,0.744975
3,0.7309,0.718516,0.7524,0.759003,0.751626,0.751386
4,0.7215,0.695597,0.7642,0.776744,0.76451,0.764613
5,0.703,0.690432,0.759,0.762156,0.75866,0.75916
6,0.6886,0.680208,0.7638,0.771087,0.763125,0.764477


[I 2025-03-26 18:15:43,185] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.003398215327741219, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8798,0.786487,0.7316,0.755405,0.731978,0.732602
2,0.738,0.734742,0.7462,0.758592,0.745497,0.746756
3,0.7108,0.730892,0.751,0.762649,0.750084,0.749383
4,0.7017,0.681223,0.7653,0.776708,0.765476,0.766033
5,0.6854,0.679575,0.7621,0.764948,0.761869,0.762625
6,0.674,0.673976,0.7655,0.773123,0.764777,0.766409


[I 2025-03-26 18:20:45,632] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0023588946747287996, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8678,0.746782,0.7406,0.759665,0.740558,0.741776
2,0.7162,0.710548,0.7534,0.760727,0.752836,0.752184
3,0.6882,0.739754,0.7452,0.761635,0.74426,0.742536


[I 2025-03-26 18:23:19,522] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.004712777272461839, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8946,0.805546,0.7314,0.750487,0.731696,0.731965
2,0.7709,0.74757,0.7427,0.759287,0.741942,0.745141
3,0.7444,0.725369,0.7508,0.757683,0.750101,0.749793
4,0.7345,0.698943,0.7625,0.773828,0.762795,0.762709
5,0.7142,0.698247,0.7571,0.760454,0.756752,0.757016
6,0.6986,0.683366,0.7623,0.768558,0.761706,0.762652


[I 2025-03-26 18:28:23,579] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0008953750478722926, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9881,0.761361,0.7388,0.758555,0.738425,0.740165
2,0.7247,0.709872,0.7552,0.760795,0.754725,0.754662
3,0.6865,0.696692,0.7601,0.763987,0.75909,0.759088
4,0.6716,0.669686,0.7695,0.772024,0.769531,0.768939
5,0.6569,0.704043,0.754,0.763753,0.754101,0.755513
6,0.6519,0.660681,0.7721,0.773167,0.771794,0.771811
7,0.64,0.670537,0.7709,0.769407,0.770833,0.76744
8,0.6338,0.67074,0.7659,0.770336,0.765703,0.764998
9,0.6328,0.680212,0.7619,0.766965,0.761608,0.761828
10,0.6266,0.663939,0.7692,0.77061,0.768868,0.767952


[I 2025-03-26 18:36:50,242] Trial 55 finished with value: 0.7679522340774718 and parameters: {'learning_rate': 0.0008953750478722926, 'weight_decay': 0.0, 'warmup_steps': 13}. Best is trial 35 with value: 0.7718702742260117.


Trial 56 with params: {'learning_rate': 0.0037947803964906214, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8907,0.793063,0.7284,0.750643,0.72885,0.729389
2,0.7477,0.741838,0.7447,0.758995,0.744054,0.746335
3,0.7208,0.720983,0.7548,0.763121,0.753986,0.753811
4,0.7116,0.689405,0.7653,0.777614,0.765572,0.765879
5,0.6943,0.684681,0.7604,0.763376,0.760113,0.760787
6,0.6812,0.677526,0.7639,0.771703,0.76318,0.764717


[I 2025-03-26 18:41:44,570] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00280967498177828, 'weight_decay': 0.01, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8841,0.768905,0.7351,0.759424,0.735266,0.736313
2,0.7251,0.722244,0.7495,0.759058,0.748793,0.748792
3,0.6976,0.743555,0.7453,0.761327,0.744338,0.743003


[I 2025-03-26 18:44:14,577] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0026985540603563524, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8952,0.764337,0.7369,0.760605,0.737004,0.738112
2,0.7233,0.719725,0.7503,0.759338,0.749607,0.749432
3,0.6955,0.743675,0.7455,0.762289,0.744545,0.743163


[I 2025-03-26 18:46:48,669] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00212455012340745, 'weight_decay': 0.001, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8933,0.740644,0.7428,0.760066,0.742678,0.74377
2,0.7135,0.70567,0.7548,0.760981,0.754302,0.753294
3,0.6846,0.736719,0.7463,0.76281,0.745412,0.743576
4,0.6742,0.658907,0.7685,0.774003,0.768468,0.768831
5,0.6615,0.687264,0.7575,0.76373,0.757499,0.757997
6,0.6549,0.664501,0.7663,0.772229,0.765678,0.767047
7,0.6395,0.670115,0.7711,0.772285,0.771018,0.768116
8,0.6304,0.668463,0.7681,0.771987,0.768034,0.766888
9,0.6275,0.683281,0.7653,0.770955,0.765167,0.764951
10,0.6155,0.656858,0.7725,0.773612,0.772236,0.771271


[I 2025-03-26 18:55:07,816] Trial 59 finished with value: 0.771270937684686 and parameters: {'learning_rate': 0.00212455012340745, 'weight_decay': 0.001, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 60 with params: {'learning_rate': 0.0011700191952905836, 'weight_decay': 0.003, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9873,0.756734,0.7371,0.761939,0.736727,0.7385
2,0.7167,0.702001,0.7565,0.761757,0.756096,0.755511
3,0.6814,0.708207,0.7558,0.765679,0.754727,0.754769
4,0.6679,0.66257,0.7699,0.773314,0.769924,0.769658
5,0.6539,0.70209,0.7556,0.766105,0.755678,0.757071
6,0.6491,0.657615,0.7731,0.774939,0.772793,0.773082
7,0.6363,0.66637,0.771,0.7696,0.770922,0.767759
8,0.6296,0.668531,0.7664,0.770531,0.7662,0.76524
9,0.6284,0.678368,0.7614,0.76626,0.761138,0.761139
10,0.6208,0.660073,0.7713,0.772534,0.770987,0.770013


[I 2025-03-26 19:03:30,665] Trial 60 finished with value: 0.7700125878698036 and parameters: {'learning_rate': 0.0011700191952905836, 'weight_decay': 0.003, 'warmup_steps': 32}. Best is trial 35 with value: 0.7718702742260117.


Trial 61 with params: {'learning_rate': 0.001407621876176079, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9237,0.750141,0.7382,0.763435,0.737836,0.739514
2,0.7118,0.700607,0.7571,0.76306,0.756705,0.75583
3,0.679,0.719953,0.7495,0.764213,0.748481,0.748129
4,0.6668,0.660129,0.7711,0.775382,0.771085,0.770974
5,0.6537,0.699998,0.7569,0.767585,0.756932,0.758347
6,0.6488,0.65733,0.7709,0.773593,0.770558,0.771156
7,0.6353,0.665088,0.772,0.770959,0.771929,0.768883
8,0.6282,0.668,0.7663,0.770068,0.766139,0.764943
9,0.6268,0.678459,0.763,0.767997,0.762791,0.762661
10,0.618,0.65831,0.7716,0.772688,0.771303,0.770271


[I 2025-03-26 19:11:52,524] Trial 61 finished with value: 0.7702714623142419 and parameters: {'learning_rate': 0.001407621876176079, 'weight_decay': 0.01, 'warmup_steps': 14}. Best is trial 35 with value: 0.7718702742260117.


Trial 62 with params: {'learning_rate': 0.003272614457975996, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.887,0.783765,0.7316,0.756382,0.731946,0.732611
2,0.7353,0.732154,0.7464,0.758273,0.745697,0.746611
3,0.708,0.734311,0.7486,0.761058,0.747652,0.746787
4,0.6988,0.678606,0.7653,0.776281,0.765453,0.766014
5,0.6828,0.678413,0.7623,0.765217,0.762091,0.762832
6,0.6719,0.672851,0.7662,0.773818,0.765455,0.767133


[I 2025-03-26 19:16:55,497] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0021843146271965205, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8975,0.742231,0.7434,0.761102,0.743291,0.744396
2,0.7145,0.706942,0.7535,0.760041,0.752985,0.752099
3,0.6856,0.737637,0.746,0.762461,0.745083,0.743257
4,0.6753,0.659353,0.7682,0.773878,0.768154,0.768546
5,0.6625,0.686121,0.7584,0.764259,0.758384,0.758833
6,0.6557,0.665169,0.7668,0.772878,0.766165,0.76755
7,0.6401,0.670824,0.7706,0.771881,0.770516,0.767625
8,0.6308,0.668644,0.7685,0.772357,0.768443,0.767339
9,0.6278,0.683879,0.765,0.77075,0.764868,0.764672
10,0.6155,0.656871,0.7723,0.773448,0.772045,0.771076


[I 2025-03-26 19:25:17,656] Trial 63 finished with value: 0.7710762729812896 and parameters: {'learning_rate': 0.0021843146271965205, 'weight_decay': 0.007, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 64 with params: {'learning_rate': 0.00011912397327149118, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6371,1.182042,0.682,0.690385,0.681769,0.6807
2,1.0458,0.953981,0.7128,0.71922,0.711678,0.711992
3,0.9005,0.874974,0.7236,0.723434,0.722827,0.719616


[I 2025-03-26 19:27:49,006] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.000241251747353242, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3841,0.945512,0.7065,0.716628,0.706217,0.706311
2,0.864,0.817671,0.7329,0.741128,0.731761,0.732971
3,0.7803,0.777113,0.7412,0.742267,0.740407,0.738695
4,0.747,0.735775,0.754,0.75485,0.754019,0.752669
5,0.7243,0.757581,0.7416,0.748149,0.741459,0.742519
6,0.7133,0.716893,0.7575,0.759465,0.757252,0.757337


[I 2025-03-26 19:32:48,812] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0012111570754252796, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.966,0.755402,0.7374,0.762823,0.737012,0.738802
2,0.7153,0.701325,0.7578,0.763151,0.757415,0.756747
3,0.6806,0.710223,0.7548,0.765359,0.753723,0.753685
4,0.6675,0.661928,0.7704,0.773991,0.770416,0.770185
5,0.6536,0.701765,0.7554,0.766075,0.755478,0.756849
6,0.6489,0.657389,0.7735,0.775442,0.773196,0.77352
7,0.6359,0.665986,0.7712,0.769898,0.771124,0.768015
8,0.6292,0.668394,0.7661,0.770073,0.765909,0.764865
9,0.628,0.678313,0.7619,0.766843,0.761651,0.761646
10,0.6202,0.65967,0.7714,0.772613,0.771086,0.770088


[I 2025-03-26 19:41:08,000] Trial 66 finished with value: 0.7700876227588227 and parameters: {'learning_rate': 0.0012111570754252796, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24}. Best is trial 35 with value: 0.7718702742260117.


Trial 67 with params: {'learning_rate': 0.0011019797105359123, 'weight_decay': 0.008, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.966,0.756346,0.7374,0.760957,0.737029,0.738643
2,0.7173,0.702981,0.7561,0.761244,0.755664,0.755261
3,0.6819,0.704031,0.758,0.766126,0.756898,0.756997
4,0.6683,0.663795,0.7695,0.772746,0.769535,0.769214
5,0.654,0.702481,0.7554,0.765965,0.755475,0.756953
6,0.6493,0.657951,0.7725,0.774066,0.77218,0.772389
7,0.6367,0.667087,0.7704,0.768899,0.770329,0.767052
8,0.6302,0.668813,0.7656,0.769687,0.765401,0.764435
9,0.6291,0.678574,0.7618,0.76672,0.761527,0.761575
10,0.6219,0.660781,0.7705,0.77185,0.770192,0.769269


[I 2025-03-26 19:49:24,805] Trial 67 finished with value: 0.7692688539508581 and parameters: {'learning_rate': 0.0011019797105359123, 'weight_decay': 0.008, 'warmup_steps': 18}. Best is trial 35 with value: 0.7718702742260117.


Trial 68 with params: {'learning_rate': 0.0013782701634688654, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9106,0.750541,0.7381,0.763485,0.737734,0.739363
2,0.7116,0.700283,0.7569,0.7627,0.756513,0.755675
3,0.6789,0.718204,0.7503,0.76428,0.749251,0.748911
4,0.6666,0.660327,0.771,0.775226,0.770999,0.770848
5,0.6535,0.70025,0.7569,0.767584,0.756937,0.758345
6,0.6487,0.657202,0.7709,0.77349,0.770556,0.771141
7,0.6352,0.665097,0.7719,0.770838,0.771821,0.768779
8,0.6282,0.667998,0.7665,0.770294,0.766334,0.765161
9,0.6268,0.678346,0.7624,0.767323,0.762177,0.762075
10,0.6183,0.658458,0.7717,0.77287,0.771391,0.770363


[I 2025-03-26 19:57:46,698] Trial 68 finished with value: 0.7703627841528011 and parameters: {'learning_rate': 0.0013782701634688654, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 35 with value: 0.7718702742260117.


Trial 69 with params: {'learning_rate': 0.0025154565050500215, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8787,0.754113,0.7383,0.759729,0.738304,0.739555
2,0.7193,0.714679,0.7523,0.760174,0.751655,0.751099
3,0.6914,0.742119,0.7447,0.76141,0.743735,0.742108


[I 2025-03-26 20:00:15,574] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0016122572632906974, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.909,0.743379,0.7415,0.762979,0.741221,0.74271
2,0.7106,0.700932,0.7561,0.762068,0.75568,0.754567
3,0.6792,0.727625,0.7464,0.762375,0.74539,0.744455
4,0.6676,0.659096,0.7707,0.77564,0.770666,0.770831
5,0.6552,0.697237,0.7567,0.766405,0.756731,0.757935
6,0.6499,0.658571,0.7703,0.773599,0.769883,0.770682
7,0.6357,0.665494,0.7715,0.771306,0.771431,0.768459
8,0.6281,0.667892,0.7671,0.770735,0.766979,0.765728
9,0.6264,0.679287,0.7641,0.769207,0.763919,0.763724
10,0.6167,0.657472,0.7728,0.773869,0.772515,0.771508


[I 2025-03-26 20:08:35,759] Trial 70 finished with value: 0.7715082531452764 and parameters: {'learning_rate': 0.0016122572632906974, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 35 with value: 0.7718702742260117.


Trial 71 with params: {'learning_rate': 0.001168312729167294, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9687,0.75592,0.7372,0.761775,0.736821,0.738592
2,0.7161,0.701828,0.7565,0.761709,0.756096,0.755504
3,0.6811,0.707792,0.7563,0.765967,0.755237,0.755286
4,0.6677,0.662578,0.7698,0.773232,0.769823,0.769548
5,0.6537,0.702072,0.7556,0.766148,0.755683,0.7571
6,0.649,0.657568,0.7732,0.775007,0.772892,0.773165
7,0.6362,0.666365,0.7708,0.769352,0.770729,0.767515
8,0.6296,0.668531,0.7662,0.770331,0.766001,0.76503
9,0.6284,0.678388,0.7614,0.766236,0.761138,0.761136
10,0.6208,0.660071,0.771,0.772208,0.770688,0.769718


[I 2025-03-26 20:16:59,322] Trial 71 finished with value: 0.7697182035247658 and parameters: {'learning_rate': 0.001168312729167294, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 35 with value: 0.7718702742260117.


Trial 72 with params: {'learning_rate': 0.0015431597769652691, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9157,0.745691,0.7404,0.763257,0.740073,0.741674
2,0.7109,0.700904,0.7561,0.762142,0.75569,0.754656
3,0.679,0.725422,0.7473,0.762814,0.746289,0.745632
4,0.6672,0.659399,0.7703,0.775044,0.770259,0.770334
5,0.6546,0.698327,0.7571,0.767257,0.757123,0.758435
6,0.6495,0.658036,0.7707,0.773842,0.770312,0.771058
7,0.6355,0.665233,0.7718,0.77127,0.771718,0.768766
8,0.6281,0.667908,0.7664,0.76994,0.766262,0.764962
9,0.6264,0.678953,0.7642,0.76928,0.764016,0.763866
10,0.617,0.65771,0.7726,0.773668,0.772307,0.771287


[I 2025-03-26 20:25:22,240] Trial 72 finished with value: 0.7712872312669788 and parameters: {'learning_rate': 0.0015431597769652691, 'weight_decay': 0.0, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 73 with params: {'learning_rate': 0.0010064665948024652, 'weight_decay': 0.002, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9543,0.756814,0.7387,0.759914,0.738315,0.73988
2,0.7195,0.705394,0.757,0.76238,0.756561,0.756425
3,0.6833,0.699182,0.7592,0.764911,0.75816,0.758262
4,0.6693,0.666119,0.769,0.772019,0.769033,0.768605
5,0.6549,0.703038,0.7538,0.763906,0.753908,0.755281
6,0.6501,0.658859,0.7725,0.773833,0.772193,0.772319
7,0.6379,0.668374,0.7719,0.770416,0.771822,0.768527
8,0.6315,0.669468,0.7657,0.77008,0.765493,0.764706
9,0.6305,0.679081,0.7616,0.766569,0.761329,0.761448
10,0.6237,0.661993,0.77,0.771503,0.769656,0.768785


[I 2025-03-26 20:33:49,284] Trial 73 finished with value: 0.7687846609773211 and parameters: {'learning_rate': 0.0010064665948024652, 'weight_decay': 0.002, 'warmup_steps': 7}. Best is trial 35 with value: 0.7718702742260117.


Trial 74 with params: {'learning_rate': 0.0016920438049043997, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9087,0.741347,0.7427,0.763278,0.742455,0.743815
2,0.7106,0.701111,0.7561,0.762118,0.755676,0.754546
3,0.6796,0.729809,0.7459,0.762173,0.744925,0.743784


[I 2025-03-26 20:36:19,757] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0024260250980723008, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8779,0.749878,0.7397,0.759739,0.739674,0.740955
2,0.7177,0.712363,0.7528,0.760365,0.752191,0.751577
3,0.6896,0.740895,0.7451,0.761549,0.744155,0.742424
4,0.6797,0.662169,0.768,0.773722,0.767986,0.768255
5,0.6662,0.682118,0.761,0.765352,0.760952,0.761275
6,0.6587,0.667149,0.767,0.773691,0.766318,0.767829
7,0.6426,0.673395,0.7698,0.771504,0.769737,0.766859
8,0.6326,0.669687,0.7689,0.772714,0.768843,0.767869
9,0.6291,0.686437,0.7645,0.770651,0.764426,0.764216
10,0.6157,0.657008,0.7728,0.773997,0.772542,0.771567


[I 2025-03-26 20:44:44,898] Trial 75 finished with value: 0.7715670339123073 and parameters: {'learning_rate': 0.0024260250980723008, 'weight_decay': 0.0, 'warmup_steps': 12}. Best is trial 35 with value: 0.7718702742260117.


Trial 76 with params: {'learning_rate': 0.003726943900346767, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8914,0.792092,0.7295,0.752068,0.729931,0.730493
2,0.7461,0.74066,0.7444,0.758689,0.743726,0.745947
3,0.7191,0.722237,0.7548,0.763389,0.753972,0.753687
4,0.71,0.68818,0.7654,0.777574,0.765638,0.765999
5,0.6928,0.683741,0.7603,0.763336,0.760029,0.760749
6,0.6799,0.676973,0.7637,0.771557,0.762978,0.764543


[I 2025-03-26 20:49:46,589] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0025367696014408194, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8785,0.755173,0.738,0.759677,0.738015,0.739208
2,0.7196,0.715242,0.7524,0.760547,0.75176,0.751259
3,0.6918,0.742372,0.7448,0.761549,0.743831,0.742248
4,0.682,0.663904,0.7682,0.774591,0.768182,0.768534
5,0.6681,0.68048,0.7632,0.767131,0.76312,0.763456
6,0.6602,0.667841,0.7672,0.774,0.76651,0.768036
7,0.6439,0.674578,0.769,0.770878,0.768955,0.766052
8,0.6335,0.670399,0.769,0.772958,0.76893,0.767999
9,0.6298,0.68767,0.7641,0.770452,0.764031,0.763791
10,0.6159,0.657158,0.7727,0.773949,0.772445,0.771471


[I 2025-03-26 20:58:08,494] Trial 77 finished with value: 0.7714708418174039 and parameters: {'learning_rate': 0.0025367696014408194, 'weight_decay': 0.0, 'warmup_steps': 13}. Best is trial 35 with value: 0.7718702742260117.


Trial 78 with params: {'learning_rate': 0.0019835589322887252, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.886,0.738424,0.743,0.760426,0.742845,0.743997
2,0.7117,0.703168,0.7559,0.761705,0.755437,0.75433
3,0.6824,0.734684,0.7455,0.76207,0.744589,0.742823
4,0.6718,0.658382,0.7706,0.776,0.770539,0.770969
5,0.6594,0.690056,0.7577,0.764992,0.757713,0.758426
6,0.6532,0.662777,0.7666,0.771968,0.76604,0.767339
7,0.6381,0.668543,0.772,0.772943,0.771914,0.769065
8,0.6295,0.668099,0.7686,0.77227,0.768555,0.767254
9,0.627,0.681937,0.7652,0.770709,0.765042,0.764872
10,0.6156,0.656873,0.7727,0.773845,0.77243,0.77148


[I 2025-03-26 21:06:33,267] Trial 78 finished with value: 0.7714803555832026 and parameters: {'learning_rate': 0.0019835589322887252, 'weight_decay': 0.0, 'warmup_steps': 11}. Best is trial 35 with value: 0.7718702742260117.


Trial 79 with params: {'learning_rate': 0.0017079739195970326, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8952,0.74062,0.7429,0.763265,0.742638,0.744012
2,0.7103,0.700973,0.756,0.761872,0.755565,0.754429
3,0.6796,0.729928,0.7458,0.76223,0.744823,0.743654


[I 2025-03-26 21:09:08,963] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0016813188772311783, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8987,0.741288,0.7431,0.763584,0.742836,0.744236
2,0.7103,0.700942,0.7561,0.762065,0.755669,0.754511
3,0.6794,0.729356,0.7459,0.76205,0.744921,0.743788
4,0.6681,0.658849,0.7709,0.775981,0.770858,0.771101
5,0.6558,0.69602,0.7565,0.76584,0.756528,0.75765
6,0.6504,0.659207,0.7697,0.773346,0.769257,0.770166
7,0.636,0.665875,0.7718,0.771931,0.77173,0.76882
8,0.6282,0.667883,0.767,0.770574,0.766871,0.765641
9,0.6263,0.679675,0.7645,0.769676,0.76431,0.764207
10,0.6163,0.6573,0.7723,0.773386,0.772011,0.771014


[I 2025-03-26 21:17:35,632] Trial 80 finished with value: 0.7710138745172611 and parameters: {'learning_rate': 0.0016813188772311783, 'weight_decay': 0.0, 'warmup_steps': 11}. Best is trial 35 with value: 0.7718702742260117.


Trial 81 with params: {'learning_rate': 0.0034858921849936974, 'weight_decay': 0.0, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8725,0.788614,0.73,0.75339,0.730404,0.730973
2,0.7398,0.73636,0.7455,0.758138,0.744806,0.746261
3,0.7127,0.728675,0.7514,0.761981,0.750516,0.749838
4,0.7037,0.683041,0.765,0.776468,0.765182,0.765685
5,0.6871,0.680499,0.7621,0.765006,0.761852,0.762613
6,0.6754,0.674786,0.7653,0.772942,0.764573,0.766185


[I 2025-03-26 21:22:40,076] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0019880648598790095, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8937,0.738679,0.7435,0.76085,0.743328,0.744487
2,0.712,0.703364,0.7557,0.761637,0.755236,0.754103
3,0.6826,0.734841,0.7457,0.762071,0.744786,0.743015
4,0.6719,0.658366,0.7706,0.775958,0.770539,0.770935
5,0.6596,0.689922,0.7577,0.764964,0.757711,0.758415
6,0.6533,0.662872,0.7669,0.772309,0.766336,0.767643
7,0.6382,0.668625,0.772,0.772942,0.771915,0.76906
8,0.6295,0.668133,0.7685,0.772156,0.76845,0.767151
9,0.627,0.682004,0.7651,0.770644,0.764944,0.764771
10,0.6156,0.656882,0.7728,0.773913,0.772536,0.771564


[I 2025-03-26 21:31:04,054] Trial 82 finished with value: 0.771564083473678 and parameters: {'learning_rate': 0.0019880648598790095, 'weight_decay': 0.0, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 83 with params: {'learning_rate': 0.0013456594474062285, 'weight_decay': 0.002, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9315,0.751962,0.7378,0.763391,0.737425,0.739173
2,0.7125,0.700562,0.7568,0.76273,0.756422,0.755577
3,0.6793,0.717038,0.7514,0.764983,0.750369,0.750101
4,0.6668,0.66054,0.7709,0.775152,0.77089,0.770747
5,0.6535,0.700638,0.7572,0.767927,0.757237,0.758653
6,0.6487,0.657194,0.7715,0.774039,0.771162,0.771713
7,0.6353,0.665231,0.7724,0.77134,0.772321,0.769271
8,0.6284,0.668071,0.7665,0.770321,0.766334,0.765182
9,0.627,0.678335,0.7626,0.767522,0.762375,0.762289
10,0.6186,0.658677,0.7715,0.772717,0.771186,0.770177


[I 2025-03-26 21:39:31,524] Trial 83 finished with value: 0.7701768359532906 and parameters: {'learning_rate': 0.0013456594474062285, 'weight_decay': 0.002, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 84 with params: {'learning_rate': 0.002416387654306804, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8836,0.749748,0.7397,0.759671,0.739657,0.740935
2,0.7177,0.712231,0.7528,0.760381,0.752189,0.751588
3,0.6895,0.740859,0.7451,0.761638,0.744156,0.742444
4,0.6796,0.662047,0.7679,0.773614,0.767874,0.768184
5,0.6661,0.682216,0.7609,0.765343,0.76085,0.761198
6,0.6586,0.667105,0.7669,0.773636,0.766218,0.767737
7,0.6425,0.673322,0.77,0.771683,0.769936,0.767057
8,0.6325,0.66965,0.7686,0.772451,0.768546,0.767585
9,0.629,0.686354,0.7647,0.770841,0.764617,0.764404
10,0.6157,0.657022,0.7729,0.774109,0.772642,0.771676


[I 2025-03-26 21:48:00,646] Trial 84 finished with value: 0.7716759615526805 and parameters: {'learning_rate': 0.002416387654306804, 'weight_decay': 0.0, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 85 with params: {'learning_rate': 0.0016741696814498851, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9075,0.741751,0.7431,0.764054,0.742847,0.744257
2,0.7106,0.701053,0.7558,0.761759,0.755369,0.754214
3,0.6795,0.729318,0.746,0.762146,0.745014,0.743907


[I 2025-03-26 21:50:32,367] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0004769496281028948, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1404,0.816489,0.7269,0.740854,0.726497,0.727933
2,0.7691,0.747,0.743,0.751116,0.742071,0.743264
3,0.7164,0.72152,0.7532,0.754242,0.752398,0.751609
4,0.6954,0.692528,0.7614,0.762684,0.761479,0.760162
5,0.6785,0.719109,0.7502,0.75749,0.750198,0.751385
6,0.6718,0.679479,0.7683,0.769438,0.768042,0.767932
7,0.6609,0.688482,0.7646,0.763706,0.76451,0.761353
8,0.6542,0.683437,0.7639,0.769056,0.763674,0.763252
9,0.6531,0.69338,0.7585,0.763999,0.758197,0.758714
10,0.6491,0.680891,0.766,0.76814,0.765603,0.764943


[I 2025-03-26 21:58:55,754] Trial 86 finished with value: 0.7649430733065191 and parameters: {'learning_rate': 0.0004769496281028948, 'weight_decay': 0.0, 'warmup_steps': 19}. Best is trial 35 with value: 0.7718702742260117.


Trial 87 with params: {'learning_rate': 0.0015741907195455007, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.918,0.744821,0.7416,0.763684,0.741288,0.742789
2,0.7109,0.701018,0.7562,0.762423,0.755799,0.754773
3,0.6791,0.726557,0.7474,0.763265,0.746388,0.745652
4,0.6674,0.659253,0.7707,0.775504,0.770664,0.770772
5,0.6549,0.697866,0.7571,0.767017,0.75712,0.75839
6,0.6497,0.658279,0.7705,0.773691,0.770097,0.770862
7,0.6356,0.66536,0.7717,0.771322,0.771628,0.768699
8,0.6281,0.667928,0.7666,0.770229,0.766463,0.765204
9,0.6264,0.679102,0.7642,0.769272,0.764012,0.763872
10,0.6169,0.657604,0.7728,0.773891,0.772515,0.771509


[I 2025-03-26 22:07:16,649] Trial 87 finished with value: 0.7715091086473789 and parameters: {'learning_rate': 0.0015741907195455007, 'weight_decay': 0.0, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 88 with params: {'learning_rate': 0.0014610308731877058, 'weight_decay': 0.0, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9239,0.748605,0.7382,0.762579,0.737847,0.739569
2,0.7114,0.700783,0.7566,0.76251,0.756205,0.755237
3,0.679,0.722381,0.7483,0.763623,0.74728,0.746828
4,0.6669,0.659816,0.7706,0.775057,0.770553,0.770526
5,0.654,0.699414,0.7574,0.768048,0.75743,0.758816
6,0.6491,0.657544,0.7708,0.773597,0.770434,0.771085
7,0.6353,0.665091,0.7722,0.771239,0.772114,0.769114
8,0.6281,0.667953,0.7661,0.769779,0.765957,0.764679
9,0.6266,0.678624,0.7635,0.768588,0.763296,0.763173
10,0.6176,0.658046,0.7718,0.77285,0.771506,0.770472


[I 2025-03-26 22:15:35,297] Trial 88 finished with value: 0.7704719754199718 and parameters: {'learning_rate': 0.0014610308731877058, 'weight_decay': 0.0, 'warmup_steps': 16}. Best is trial 35 with value: 0.7718702742260117.


Trial 89 with params: {'learning_rate': 0.002170102786443416, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.894,0.741686,0.7432,0.760762,0.743095,0.744162
2,0.7142,0.706571,0.7541,0.760492,0.75359,0.752661
3,0.6853,0.737368,0.7461,0.762687,0.74519,0.743357
4,0.675,0.659233,0.768,0.773753,0.767952,0.768351
5,0.6622,0.686402,0.7582,0.764107,0.758188,0.758643
6,0.6555,0.664993,0.7666,0.772655,0.765973,0.76736


[I 2025-03-26 22:20:38,161] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0002862248799922069, 'weight_decay': 0.001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2772,0.899122,0.7134,0.724062,0.71307,0.71355
2,0.8308,0.794641,0.7361,0.744327,0.735014,0.736246
3,0.7595,0.760077,0.7449,0.745878,0.74411,0.74262
4,0.7307,0.721709,0.7577,0.758469,0.75777,0.75634
5,0.71,0.745659,0.7439,0.75064,0.743803,0.744906
6,0.7005,0.70533,0.7608,0.762416,0.760568,0.760544


[I 2025-03-26 22:25:38,980] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.004109682586407891, 'weight_decay': 0.003, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8904,0.797098,0.7304,0.750998,0.730824,0.731338
2,0.7554,0.746064,0.7428,0.759627,0.742187,0.745366
3,0.7288,0.718135,0.7531,0.759837,0.752321,0.752139
4,0.7194,0.694419,0.7636,0.776182,0.763909,0.764007
5,0.7013,0.689186,0.7598,0.762982,0.759466,0.760024
6,0.687,0.679675,0.7638,0.771442,0.763095,0.764544


[I 2025-03-26 22:30:38,496] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0013580763387376013, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9166,0.751165,0.738,0.76342,0.737629,0.739296
2,0.7119,0.700336,0.7571,0.762931,0.756715,0.755922
3,0.679,0.717357,0.7506,0.764285,0.749566,0.749252
4,0.6667,0.660474,0.7708,0.775006,0.770792,0.770629
5,0.6534,0.70046,0.7568,0.767467,0.756844,0.758241
6,0.6487,0.657172,0.7713,0.773869,0.770953,0.77152
7,0.6352,0.665157,0.7725,0.771461,0.772425,0.769383
8,0.6283,0.668041,0.7665,0.770329,0.766333,0.765179
9,0.6269,0.678337,0.7626,0.767522,0.762375,0.762289
10,0.6185,0.658573,0.7717,0.77287,0.771389,0.770372


[I 2025-03-26 22:38:56,093] Trial 92 finished with value: 0.7703716969956951 and parameters: {'learning_rate': 0.0013580763387376013, 'weight_decay': 0.0, 'warmup_steps': 9}. Best is trial 35 with value: 0.7718702742260117.


Trial 93 with params: {'learning_rate': 0.0028270655217701316, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8787,0.769462,0.7348,0.759355,0.73498,0.735988
2,0.7253,0.722598,0.7492,0.758749,0.748482,0.748501
3,0.6979,0.743498,0.7455,0.761517,0.744535,0.743201


[I 2025-03-26 22:41:26,545] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0007958687580908783, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0256,0.768064,0.7353,0.754311,0.734893,0.736754
2,0.7311,0.71498,0.7536,0.759618,0.753071,0.753218
3,0.6905,0.697377,0.7606,0.76333,0.759649,0.759571
4,0.6746,0.673626,0.769,0.77148,0.769054,0.768367
5,0.6596,0.705368,0.753,0.761788,0.753095,0.754479
6,0.6544,0.663071,0.7718,0.772696,0.771504,0.771471
7,0.6427,0.673126,0.7699,0.768425,0.769806,0.766445
8,0.6365,0.672402,0.7661,0.77079,0.765893,0.765312
9,0.6355,0.681748,0.7616,0.766796,0.7613,0.761536
10,0.6298,0.666238,0.7685,0.769955,0.768146,0.767265


[I 2025-03-26 22:49:47,166] Trial 94 finished with value: 0.7672652637501701 and parameters: {'learning_rate': 0.0007958687580908783, 'weight_decay': 0.002, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 95 with params: {'learning_rate': 0.0018779933920771155, 'weight_decay': 0.002, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8998,0.738674,0.7425,0.76072,0.742316,0.743493
2,0.7112,0.702103,0.7568,0.762662,0.756335,0.755265
3,0.6813,0.733414,0.7462,0.762565,0.745257,0.743775


[I 2025-03-26 22:52:16,797] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0024570395962088356, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8904,0.751987,0.7396,0.760009,0.739566,0.740662
2,0.7186,0.713394,0.752,0.75966,0.751368,0.750767
3,0.6904,0.741541,0.7456,0.762312,0.744644,0.742988
4,0.6805,0.662664,0.7679,0.773875,0.767891,0.768185
5,0.6669,0.681564,0.7616,0.765843,0.761537,0.761874
6,0.6592,0.667383,0.7672,0.773911,0.766519,0.768031
7,0.6431,0.673793,0.7698,0.771546,0.769743,0.76687
8,0.6329,0.669923,0.7689,0.772708,0.768837,0.767862
9,0.6293,0.686828,0.7645,0.77072,0.764426,0.764221
10,0.6158,0.657066,0.773,0.774224,0.772741,0.771774


[I 2025-03-26 23:00:31,400] Trial 96 finished with value: 0.7717736785826419 and parameters: {'learning_rate': 0.0024570395962088356, 'weight_decay': 0.0, 'warmup_steps': 19}. Best is trial 35 with value: 0.7718702742260117.


Trial 97 with params: {'learning_rate': 0.0016725497233777782, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9228,0.742254,0.7432,0.764045,0.742947,0.744383
2,0.711,0.701202,0.7558,0.761874,0.75537,0.754283
3,0.6797,0.729509,0.7456,0.761902,0.744621,0.743522


[I 2025-03-26 23:03:02,549] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.004400603985954562, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8997,0.799547,0.7313,0.751027,0.731677,0.732199
2,0.7629,0.747001,0.7426,0.759886,0.741931,0.745326
3,0.7366,0.719237,0.7521,0.758133,0.751369,0.751078
4,0.7269,0.697463,0.763,0.775535,0.763298,0.763425
5,0.7078,0.69369,0.7573,0.760628,0.756941,0.757344
6,0.6927,0.68134,0.7629,0.76977,0.76223,0.763447


[I 2025-03-26 23:08:02,385] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0012738504686545394, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9433,0.753718,0.738,0.763876,0.737622,0.73941
2,0.7136,0.700743,0.7576,0.763284,0.757225,0.756506
3,0.6798,0.713403,0.7536,0.765594,0.752549,0.752438
4,0.667,0.661186,0.7707,0.774658,0.770717,0.770522
5,0.6534,0.701253,0.7561,0.766864,0.756159,0.757562
6,0.6487,0.657204,0.7728,0.774989,0.772489,0.772912
7,0.6355,0.665546,0.7717,0.77044,0.771623,0.768545
8,0.6287,0.668205,0.7661,0.769989,0.765926,0.764804
9,0.6275,0.678282,0.7621,0.767002,0.761866,0.761819
10,0.6194,0.659162,0.7712,0.772315,0.770882,0.769835


[I 2025-03-26 23:16:19,366] Trial 99 finished with value: 0.7698346434023947 and parameters: {'learning_rate': 0.0012738504686545394, 'weight_decay': 0.0, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 100 with params: {'learning_rate': 0.00214909178274976, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9044,0.741802,0.743,0.76052,0.742897,0.743988
2,0.7142,0.706353,0.7545,0.760802,0.753994,0.753045
3,0.6852,0.737283,0.7461,0.762688,0.745198,0.743358


[I 2025-03-26 23:18:47,970] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0034679197373456024, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8837,0.787803,0.7303,0.753749,0.730714,0.73134
2,0.7397,0.735988,0.746,0.758648,0.745308,0.746763
3,0.7125,0.728738,0.7519,0.762687,0.751015,0.750349
4,0.7035,0.682801,0.7649,0.776374,0.765084,0.765587
5,0.687,0.680362,0.7617,0.764612,0.761449,0.762201
6,0.6752,0.674642,0.7654,0.77307,0.76468,0.766308


[I 2025-03-26 23:23:42,466] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0028941323356702423, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8769,0.772282,0.7337,0.758628,0.733895,0.734794
2,0.7266,0.724171,0.7493,0.759154,0.748558,0.748632
3,0.6993,0.742968,0.7459,0.761501,0.74493,0.743683


[I 2025-03-26 23:26:13,096] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0017430993217002369, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8977,0.739968,0.7428,0.762439,0.742565,0.743779
2,0.7105,0.701144,0.7562,0.761986,0.755761,0.754622
3,0.6799,0.73082,0.7459,0.76234,0.744939,0.743702
4,0.6687,0.658605,0.7707,0.775928,0.770653,0.770966
5,0.6565,0.694882,0.7568,0.765789,0.756832,0.757912
6,0.6509,0.659866,0.7689,0.773015,0.76845,0.769482
7,0.6363,0.666324,0.7719,0.772202,0.771825,0.76892
8,0.6284,0.667884,0.7676,0.7712,0.767491,0.766248
9,0.6264,0.680069,0.7647,0.770029,0.764522,0.764411
10,0.6161,0.657162,0.7723,0.773346,0.772018,0.771012


[I 2025-03-26 23:34:38,335] Trial 103 finished with value: 0.7710122363087929 and parameters: {'learning_rate': 0.0017430993217002369, 'weight_decay': 0.0, 'warmup_steps': 12}. Best is trial 35 with value: 0.7718702742260117.


Trial 104 with params: {'learning_rate': 0.000711448169496144, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0323,0.774196,0.7356,0.752664,0.735151,0.736991
2,0.737,0.720155,0.7517,0.757731,0.751068,0.751403
3,0.6946,0.700214,0.7604,0.762475,0.759486,0.759295
4,0.6779,0.677411,0.7676,0.76978,0.767647,0.76679
5,0.6626,0.707045,0.7523,0.760761,0.752379,0.753779
6,0.6572,0.665816,0.7714,0.772162,0.771082,0.771024
7,0.6457,0.675911,0.7701,0.768828,0.769995,0.766653
8,0.6395,0.674266,0.7655,0.770299,0.765298,0.764743
9,0.6386,0.683589,0.761,0.76618,0.760676,0.761006
10,0.6333,0.668762,0.7682,0.769703,0.767831,0.766949


[I 2025-03-26 23:43:01,219] Trial 104 finished with value: 0.7669491547527053 and parameters: {'learning_rate': 0.000711448169496144, 'weight_decay': 0.001, 'warmup_steps': 13}. Best is trial 35 with value: 0.7718702742260117.


Trial 105 with params: {'learning_rate': 0.0019700761140720837, 'weight_decay': 0.002, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8844,0.7383,0.7433,0.760842,0.743136,0.7443
2,0.7116,0.702987,0.756,0.761798,0.755541,0.754448
3,0.6822,0.734454,0.7455,0.76195,0.744572,0.742846
4,0.6716,0.65835,0.7711,0.77643,0.771037,0.771442
5,0.6592,0.690346,0.758,0.765415,0.758005,0.758761
6,0.6531,0.662621,0.767,0.772324,0.766452,0.76774
7,0.638,0.668373,0.772,0.772917,0.771917,0.769061
8,0.6294,0.66808,0.7688,0.772447,0.76875,0.767455
9,0.6269,0.681829,0.7653,0.770797,0.765143,0.764958
10,0.6156,0.656864,0.7727,0.773881,0.77243,0.771494


[I 2025-03-26 23:51:24,583] Trial 105 finished with value: 0.7714944932076386 and parameters: {'learning_rate': 0.0019700761140720837, 'weight_decay': 0.002, 'warmup_steps': 10}. Best is trial 35 with value: 0.7718702742260117.


Trial 106 with params: {'learning_rate': 0.0010935263575250845, 'weight_decay': 0.004, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.953,0.755778,0.7376,0.760914,0.737233,0.738819
2,0.7171,0.703049,0.7568,0.761954,0.756368,0.755982
3,0.6818,0.703324,0.7579,0.765889,0.756801,0.756955
4,0.6682,0.663961,0.7698,0.772983,0.769839,0.769477
5,0.654,0.702483,0.7553,0.765812,0.755377,0.75683
6,0.6493,0.657997,0.7724,0.773942,0.772085,0.772283
7,0.6368,0.66716,0.7705,0.769043,0.770426,0.767161
8,0.6303,0.668858,0.7656,0.769708,0.765397,0.764448
9,0.6292,0.678611,0.7619,0.76679,0.761634,0.761654
10,0.622,0.660877,0.7705,0.771825,0.770188,0.769259


[I 2025-03-26 23:59:46,580] Trial 106 finished with value: 0.7692593610469267 and parameters: {'learning_rate': 0.0010935263575250845, 'weight_decay': 0.004, 'warmup_steps': 12}. Best is trial 35 with value: 0.7718702742260117.


Trial 107 with params: {'learning_rate': 0.0027370678464080814, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8753,0.765229,0.7352,0.759091,0.735325,0.736381
2,0.7234,0.720416,0.7501,0.759262,0.7494,0.749212
3,0.6959,0.743663,0.7449,0.761472,0.74395,0.742582
4,0.6863,0.667401,0.7668,0.774289,0.766789,0.767259
5,0.6718,0.678212,0.7634,0.766791,0.763285,0.763722
6,0.6632,0.668956,0.7667,0.773624,0.766001,0.767541


[I 2025-03-27 00:04:41,871] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0010856015565956756, 'weight_decay': 0.002, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9612,0.756237,0.7371,0.760274,0.736739,0.738347
2,0.7175,0.703311,0.7566,0.761693,0.756172,0.75578
3,0.682,0.703046,0.7581,0.765962,0.757004,0.757146
4,0.6684,0.664132,0.7696,0.772728,0.769639,0.769266
5,0.6541,0.702562,0.7552,0.765718,0.755277,0.756727
6,0.6494,0.65808,0.7725,0.774,0.772187,0.772372
7,0.6369,0.667269,0.7708,0.769321,0.770741,0.767476
8,0.6304,0.66892,0.7656,0.769759,0.765401,0.764454
9,0.6293,0.678649,0.7618,0.766713,0.761531,0.761566
10,0.6222,0.660955,0.7702,0.771544,0.769886,0.768958


[I 2025-03-27 00:12:53,343] Trial 108 finished with value: 0.7689576522000077 and parameters: {'learning_rate': 0.0010856015565956756, 'weight_decay': 0.002, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 109 with params: {'learning_rate': 0.0024199144553487028, 'weight_decay': 0.002, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8742,0.749451,0.7399,0.759886,0.739865,0.741124
2,0.7174,0.712147,0.7529,0.760429,0.752293,0.751685
3,0.6894,0.740774,0.7452,0.761706,0.744256,0.742522


[I 2025-03-27 00:15:21,607] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0016009286790297352, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9206,0.744094,0.7422,0.763984,0.741926,0.743392
2,0.7109,0.701067,0.7559,0.76197,0.755481,0.754443
3,0.6793,0.727465,0.7464,0.762416,0.745392,0.744557
4,0.6677,0.659124,0.7703,0.775158,0.770269,0.7704
5,0.6551,0.697437,0.7567,0.766425,0.756726,0.757953
6,0.6499,0.658524,0.7704,0.773696,0.76999,0.770782
7,0.6357,0.665496,0.7716,0.771394,0.771533,0.768569
8,0.6281,0.667912,0.7668,0.770449,0.766673,0.765428
9,0.6264,0.679244,0.7639,0.769024,0.763714,0.763562
10,0.6167,0.657524,0.7728,0.773849,0.772512,0.771498


[I 2025-03-27 00:23:38,464] Trial 110 finished with value: 0.7714975549333889 and parameters: {'learning_rate': 0.0016009286790297352, 'weight_decay': 0.001, 'warmup_steps': 19}. Best is trial 35 with value: 0.7718702742260117.


Trial 111 with params: {'learning_rate': 0.002591961999759103, 'weight_decay': 0.002, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8889,0.758525,0.7374,0.759907,0.737457,0.738624
2,0.721,0.716844,0.7516,0.760122,0.750931,0.750569
3,0.6931,0.743002,0.7447,0.761823,0.743729,0.742302
4,0.6834,0.664868,0.7676,0.774411,0.767567,0.767994
5,0.6693,0.679689,0.7627,0.76639,0.762598,0.762967
6,0.6611,0.668184,0.7673,0.774017,0.766608,0.768115
7,0.6446,0.675181,0.7694,0.771397,0.769356,0.76644
8,0.634,0.670857,0.7693,0.773368,0.769238,0.768354
9,0.6302,0.688327,0.7636,0.769914,0.763541,0.763299
10,0.616,0.657223,0.7729,0.774153,0.772651,0.77169


[I 2025-03-27 00:32:00,388] Trial 111 finished with value: 0.7716896257092117 and parameters: {'learning_rate': 0.002591961999759103, 'weight_decay': 0.002, 'warmup_steps': 19}. Best is trial 35 with value: 0.7718702742260117.


Trial 112 with params: {'learning_rate': 0.0020024819320657934, 'weight_decay': 0.001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9031,0.739247,0.744,0.761267,0.743839,0.744978
2,0.7124,0.703743,0.7555,0.761462,0.755025,0.753908
3,0.6829,0.735207,0.7458,0.762196,0.744901,0.743122


[I 2025-03-27 00:34:32,308] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.003678989445221232, 'weight_decay': 0.002, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8908,0.791412,0.729,0.751842,0.729422,0.72999
2,0.7449,0.739775,0.744,0.757929,0.743329,0.745422
3,0.7179,0.723189,0.7536,0.762517,0.75275,0.752445
4,0.7088,0.687216,0.766,0.778038,0.766231,0.766601
5,0.6918,0.683085,0.7602,0.76329,0.759928,0.760678
6,0.6791,0.676572,0.764,0.771846,0.763273,0.764847


[I 2025-03-27 00:39:34,214] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0015741984461299717, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9246,0.745026,0.7416,0.763869,0.741306,0.742862
2,0.7111,0.701076,0.7559,0.762048,0.755497,0.754482
3,0.6792,0.726634,0.7473,0.763164,0.746289,0.745552
4,0.6675,0.659251,0.7707,0.77553,0.770662,0.770789
5,0.6549,0.69789,0.7569,0.766819,0.756919,0.758183
6,0.6497,0.65833,0.7704,0.773631,0.769998,0.770775
7,0.6356,0.66538,0.7717,0.771322,0.771628,0.768699
8,0.6281,0.667934,0.7665,0.770124,0.766363,0.765097
9,0.6264,0.679105,0.7642,0.769291,0.764012,0.763875
10,0.6169,0.657609,0.7729,0.773955,0.772613,0.771604


[I 2025-03-27 00:47:55,376] Trial 114 finished with value: 0.7716040934642886 and parameters: {'learning_rate': 0.0015741984461299717, 'weight_decay': 0.003, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 115 with params: {'learning_rate': 0.001044463102392654, 'weight_decay': 0.002, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9917,0.758245,0.7374,0.760042,0.737016,0.738688
2,0.7196,0.704629,0.7565,0.761745,0.75606,0.755785
3,0.6832,0.701499,0.7583,0.76522,0.757221,0.757378
4,0.6691,0.665134,0.7692,0.772226,0.769231,0.768831
5,0.6547,0.702927,0.7541,0.764601,0.754192,0.755645
6,0.6499,0.658534,0.7724,0.773796,0.772086,0.772233
7,0.6375,0.667848,0.7712,0.769682,0.77113,0.76788
8,0.631,0.669223,0.7656,0.769917,0.765391,0.764521
9,0.63,0.678875,0.7615,0.766474,0.761219,0.761295
10,0.623,0.661504,0.7698,0.771275,0.769462,0.768579


[I 2025-03-27 00:56:19,983] Trial 115 finished with value: 0.7685785889976303 and parameters: {'learning_rate': 0.001044463102392654, 'weight_decay': 0.002, 'warmup_steps': 26}. Best is trial 35 with value: 0.7718702742260117.


Trial 116 with params: {'learning_rate': 0.0011617865121432338, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9651,0.755803,0.7374,0.76191,0.737017,0.738813
2,0.7161,0.701877,0.7566,0.76177,0.756186,0.755647
3,0.6811,0.707346,0.7564,0.76588,0.755338,0.755413
4,0.6677,0.662682,0.7702,0.773656,0.770237,0.769952
5,0.6537,0.702116,0.7556,0.766144,0.755683,0.757099
6,0.649,0.657588,0.7731,0.774911,0.772799,0.773066
7,0.6362,0.666427,0.7708,0.769378,0.770719,0.767531
8,0.6296,0.668557,0.7662,0.770331,0.766001,0.76503
9,0.6285,0.678387,0.7614,0.766236,0.761138,0.761136
10,0.6209,0.660139,0.771,0.772211,0.770688,0.769722


[I 2025-03-27 01:04:39,032] Trial 116 finished with value: 0.7697218101812506 and parameters: {'learning_rate': 0.0011617865121432338, 'weight_decay': 0.001, 'warmup_steps': 21}. Best is trial 35 with value: 0.7718702742260117.


Trial 117 with params: {'learning_rate': 0.0031875156408721394, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8947,0.781894,0.7316,0.75649,0.731949,0.732589
2,0.7336,0.730518,0.7465,0.757924,0.745802,0.746481
3,0.7062,0.736598,0.7478,0.761478,0.746827,0.74609
4,0.697,0.676753,0.7659,0.776685,0.766035,0.766634
5,0.6812,0.677825,0.7622,0.765205,0.762014,0.762758
6,0.6706,0.672129,0.7661,0.773595,0.765349,0.767012


[I 2025-03-27 01:09:35,963] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.001539137192521179, 'weight_decay': 0.004, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9271,0.746211,0.7409,0.763723,0.740582,0.742183
2,0.7112,0.70104,0.7562,0.762161,0.755798,0.754769
3,0.6791,0.72546,0.7471,0.762587,0.746093,0.745458
4,0.6673,0.659408,0.77,0.774619,0.769963,0.770005
5,0.6546,0.698406,0.7573,0.767386,0.757323,0.758624
6,0.6495,0.658056,0.7708,0.773926,0.770417,0.771164
7,0.6355,0.665261,0.772,0.771465,0.771919,0.768995
8,0.6281,0.667949,0.7665,0.770112,0.766357,0.765085
9,0.6264,0.678953,0.7642,0.76928,0.764016,0.763866
10,0.6171,0.657732,0.7727,0.773792,0.772406,0.77139


[I 2025-03-27 01:17:58,121] Trial 118 finished with value: 0.771389661136302 and parameters: {'learning_rate': 0.001539137192521179, 'weight_decay': 0.004, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 119 with params: {'learning_rate': 0.0021387830634521134, 'weight_decay': 0.003, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9027,0.741444,0.743,0.760428,0.742886,0.743989
2,0.714,0.706124,0.7546,0.760892,0.754092,0.753131
3,0.685,0.737082,0.7461,0.762739,0.745204,0.743378


[I 2025-03-27 01:20:29,829] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00022650159354999495, 'weight_decay': 0.004, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4017,0.960849,0.7045,0.714454,0.704245,0.704196
2,0.8756,0.826252,0.7313,0.739658,0.730166,0.731452
3,0.7882,0.783672,0.7401,0.741196,0.73931,0.737506
4,0.7534,0.741329,0.7528,0.753773,0.752821,0.751473
5,0.7299,0.762316,0.7402,0.746533,0.740057,0.741075
6,0.7183,0.721509,0.7557,0.757816,0.755435,0.75559


[I 2025-03-27 01:25:31,312] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0019998137207421058, 'weight_decay': 0.002, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8972,0.738962,0.7439,0.76119,0.743734,0.744895
2,0.7122,0.7036,0.7556,0.761539,0.75513,0.753998
3,0.6828,0.735062,0.7458,0.762121,0.744901,0.743085


[I 2025-03-27 01:28:02,678] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0017536231569678761, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9204,0.740616,0.7435,0.762751,0.743279,0.744501
2,0.7112,0.701438,0.7563,0.762187,0.755858,0.754776
3,0.6803,0.731387,0.7456,0.762016,0.744643,0.743371
4,0.6691,0.658487,0.771,0.776153,0.770958,0.771276
5,0.6568,0.694685,0.7561,0.765194,0.75613,0.757225
6,0.6512,0.660128,0.7691,0.773262,0.768643,0.769677
7,0.6365,0.666518,0.7721,0.772428,0.772022,0.76913
8,0.6286,0.667921,0.7672,0.770895,0.767102,0.765882
9,0.6265,0.680188,0.7647,0.770013,0.764525,0.764414
10,0.6161,0.657169,0.7727,0.773755,0.772409,0.771414


[I 2025-03-27 01:36:17,186] Trial 122 finished with value: 0.7714139357308637 and parameters: {'learning_rate': 0.0017536231569678761, 'weight_decay': 0.002, 'warmup_steps': 23}. Best is trial 35 with value: 0.7718702742260117.


Trial 123 with params: {'learning_rate': 0.0020555598618777456, 'weight_decay': 0.002, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8993,0.739704,0.7441,0.761167,0.743946,0.744999
2,0.7129,0.704515,0.7545,0.760528,0.753999,0.752929
3,0.6836,0.73588,0.7465,0.762962,0.745608,0.743804


[I 2025-03-27 01:38:48,730] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0030329416208506785, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8876,0.777399,0.7323,0.757258,0.732577,0.73326
2,0.7299,0.727278,0.7469,0.757509,0.746177,0.746536
3,0.7026,0.740568,0.7462,0.760977,0.745235,0.744198
4,0.6932,0.673436,0.7667,0.776356,0.766759,0.767366
5,0.6778,0.677144,0.7624,0.765512,0.762244,0.762931
6,0.6679,0.670867,0.766,0.77321,0.765262,0.766855


[I 2025-03-27 01:43:52,671] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.00023456763594861504, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3375,0.942989,0.7062,0.71622,0.705923,0.705993
2,0.864,0.8192,0.7314,0.739809,0.730275,0.731554
3,0.7821,0.779171,0.741,0.742003,0.740213,0.738467


[I 2025-03-27 01:46:23,401] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0021059976417271012, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8747,0.739768,0.7426,0.759925,0.74248,0.743532
2,0.7128,0.705072,0.7543,0.760351,0.753798,0.752732
3,0.684,0.736167,0.7458,0.762164,0.744903,0.743004
4,0.6736,0.658817,0.7693,0.77482,0.769263,0.769647
5,0.6611,0.687757,0.7572,0.763584,0.757205,0.75774
6,0.6545,0.6642,0.7665,0.772346,0.765892,0.767261


[I 2025-03-27 01:51:25,759] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0011220236547807357, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9563,0.755704,0.7378,0.761545,0.737427,0.739039
2,0.7166,0.70248,0.7565,0.761659,0.756065,0.755628
3,0.6814,0.704956,0.7581,0.766606,0.757018,0.757098
4,0.668,0.663376,0.7694,0.772713,0.769435,0.769138
5,0.6539,0.702329,0.7551,0.765549,0.75518,0.75661
6,0.6491,0.65779,0.773,0.774626,0.772689,0.772907
7,0.6365,0.666838,0.7706,0.769127,0.770525,0.767269
8,0.63,0.668709,0.7656,0.769712,0.765404,0.764433
9,0.6289,0.678506,0.7616,0.766479,0.761336,0.761358
10,0.6215,0.660538,0.7707,0.77204,0.77039,0.769452


[I 2025-03-27 01:59:48,104] Trial 127 finished with value: 0.7694516766458042 and parameters: {'learning_rate': 0.0011220236547807357, 'weight_decay': 0.0, 'warmup_steps': 15}. Best is trial 35 with value: 0.7718702742260117.


Trial 128 with params: {'learning_rate': 0.003594320208457203, 'weight_decay': 0.008, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9004,0.790203,0.7293,0.752304,0.729738,0.730318
2,0.7432,0.738092,0.7453,0.758666,0.744639,0.746559
3,0.7161,0.724828,0.7535,0.763077,0.752624,0.752167
4,0.7069,0.685562,0.7655,0.777372,0.765715,0.766177
5,0.6901,0.682066,0.7609,0.763961,0.760633,0.761379
6,0.6777,0.675893,0.7647,0.772402,0.763984,0.765589


[I 2025-03-27 02:04:47,435] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0005723781758500758, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0986,0.795065,0.7303,0.745277,0.729851,0.731698
2,0.7533,0.733486,0.7465,0.753495,0.745701,0.746596
3,0.7054,0.710521,0.7569,0.758273,0.756097,0.755645
4,0.6865,0.685259,0.7644,0.766023,0.764454,0.763305
5,0.6704,0.712499,0.751,0.758623,0.751041,0.752363
6,0.6643,0.672577,0.7696,0.770402,0.769307,0.769203
7,0.6532,0.682276,0.7682,0.767149,0.768139,0.764815
8,0.6468,0.678768,0.7644,0.769488,0.764178,0.763725
9,0.6458,0.68828,0.7598,0.765092,0.75948,0.75995
10,0.6413,0.674763,0.7677,0.769551,0.767307,0.766513


[I 2025-03-27 02:13:09,367] Trial 129 finished with value: 0.7665129733743903 and parameters: {'learning_rate': 0.0005723781758500758, 'weight_decay': 0.004, 'warmup_steps': 21}. Best is trial 35 with value: 0.7718702742260117.


Trial 130 with params: {'learning_rate': 0.004456615149543606, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8929,0.801051,0.7318,0.751756,0.73217,0.732715
2,0.7641,0.74754,0.7425,0.759898,0.741818,0.74522
3,0.7378,0.720322,0.7522,0.758497,0.751474,0.75121
4,0.7281,0.697891,0.7628,0.77523,0.763098,0.763191
5,0.7088,0.694423,0.7571,0.760408,0.756738,0.757127
6,0.6937,0.681656,0.7633,0.770028,0.762646,0.763804


[I 2025-03-27 02:18:13,398] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0037360605335589204, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9031,0.792318,0.7304,0.752763,0.730829,0.731394
2,0.7467,0.740542,0.7446,0.758834,0.743935,0.746175
3,0.7197,0.721611,0.7545,0.76304,0.753684,0.753492
4,0.7104,0.688394,0.7653,0.777501,0.765545,0.765893
5,0.6933,0.683992,0.7607,0.763726,0.76042,0.761128
6,0.6803,0.677093,0.764,0.771787,0.763279,0.764813


[I 2025-03-27 02:23:13,332] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.001310216166904791, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9233,0.752356,0.7377,0.763447,0.737318,0.739059
2,0.7126,0.700386,0.7578,0.763528,0.757433,0.756692
3,0.6793,0.714985,0.752,0.76471,0.750956,0.750713
4,0.6668,0.660843,0.7703,0.774493,0.770309,0.770165
5,0.6533,0.700871,0.7568,0.767455,0.756855,0.758239
6,0.6486,0.657131,0.772,0.774399,0.771674,0.772179
7,0.6354,0.665336,0.7722,0.771041,0.772121,0.769079
8,0.6285,0.668115,0.7661,0.769953,0.765932,0.764791
9,0.6272,0.678259,0.7622,0.76706,0.761976,0.761909
10,0.619,0.658875,0.7713,0.772472,0.770989,0.769968


[I 2025-03-27 02:31:32,903] Trial 132 finished with value: 0.7699677352753758 and parameters: {'learning_rate': 0.001310216166904791, 'weight_decay': 0.001, 'warmup_steps': 10}. Best is trial 35 with value: 0.7718702742260117.


Trial 133 with params: {'learning_rate': 0.001751073432859934, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9077,0.740194,0.7433,0.762933,0.743072,0.744304
2,0.7108,0.701301,0.7563,0.762119,0.755855,0.754752
3,0.6801,0.731183,0.746,0.762446,0.745037,0.743787


[I 2025-03-27 02:34:03,129] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0023498960173462258, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8771,0.746684,0.741,0.760247,0.740937,0.74219
2,0.7164,0.710427,0.7535,0.760745,0.752932,0.752256
3,0.6882,0.739786,0.7451,0.761536,0.744164,0.742399
4,0.6782,0.661101,0.768,0.77363,0.767971,0.76829
5,0.6649,0.683343,0.7602,0.765003,0.760174,0.760509
6,0.6576,0.666551,0.7666,0.773164,0.765936,0.767433


[I 2025-03-27 02:39:02,925] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.002337379939202047, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8773,0.746206,0.7413,0.76042,0.74124,0.74248
2,0.7162,0.710132,0.7536,0.760712,0.75303,0.752311
3,0.688,0.739633,0.7451,0.761536,0.744164,0.742399


[I 2025-03-27 02:41:31,302] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0013096246132766028, 'weight_decay': 0.0, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9125,0.752049,0.7381,0.763946,0.737733,0.739457
2,0.7123,0.700203,0.7573,0.762996,0.756933,0.756168
3,0.6792,0.714675,0.7518,0.764454,0.750744,0.750496
4,0.6667,0.660859,0.7704,0.774525,0.770415,0.770231
5,0.6532,0.700855,0.7569,0.767421,0.756952,0.758322
6,0.6485,0.65711,0.7722,0.774537,0.771873,0.772355
7,0.6353,0.6653,0.7724,0.771201,0.77232,0.769277
8,0.6284,0.668059,0.7662,0.770052,0.766028,0.764884
9,0.6272,0.678232,0.7623,0.767157,0.762074,0.762019
10,0.619,0.658878,0.7716,0.772756,0.771293,0.770262


[I 2025-03-27 02:49:46,970] Trial 136 finished with value: 0.7702624917597668 and parameters: {'learning_rate': 0.0013096246132766028, 'weight_decay': 0.0, 'warmup_steps': 5}. Best is trial 35 with value: 0.7718702742260117.


Trial 137 with params: {'learning_rate': 0.0010349855093137908, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9732,0.757394,0.7379,0.759939,0.737524,0.739148
2,0.7192,0.704728,0.7568,0.761982,0.756369,0.756081
3,0.6831,0.700769,0.7586,0.765024,0.757529,0.757618
4,0.6691,0.665363,0.7691,0.772117,0.769127,0.768725
5,0.6547,0.702958,0.754,0.764306,0.754102,0.755496
6,0.6499,0.658584,0.7726,0.773978,0.77229,0.772434
7,0.6375,0.667978,0.7714,0.769889,0.771328,0.768037
8,0.6311,0.669283,0.7655,0.769881,0.765299,0.764465
9,0.6301,0.678924,0.7614,0.766436,0.761118,0.761231
10,0.6232,0.661618,0.7696,0.771066,0.769259,0.768367


[I 2025-03-27 02:58:08,475] Trial 137 finished with value: 0.7683671245359802 and parameters: {'learning_rate': 0.0010349855093137908, 'weight_decay': 0.0, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 138 with params: {'learning_rate': 0.00013086161901401913, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5774,1.135335,0.6861,0.694757,0.685895,0.685055
2,1.0107,0.928236,0.7168,0.72382,0.715693,0.716309
3,0.8785,0.857579,0.7266,0.72662,0.72582,0.722706


[I 2025-03-27 03:00:37,810] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0019794167880229294, 'weight_decay': 0.001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8959,0.738745,0.7437,0.761041,0.743528,0.744686
2,0.712,0.703251,0.7558,0.761757,0.75534,0.754229
3,0.6825,0.734762,0.7459,0.762277,0.744988,0.743201
4,0.6718,0.658314,0.7707,0.776058,0.770637,0.771032
5,0.6595,0.690085,0.7577,0.764998,0.757712,0.758421
6,0.6532,0.662795,0.767,0.772339,0.766448,0.767732
7,0.6381,0.668532,0.7721,0.773022,0.772016,0.76917
8,0.6295,0.668098,0.7686,0.772264,0.768547,0.767256
9,0.627,0.681901,0.7654,0.770906,0.765244,0.765065
10,0.6156,0.65688,0.7728,0.773944,0.772531,0.771584


[I 2025-03-27 03:08:52,881] Trial 139 finished with value: 0.7715842787464534 and parameters: {'learning_rate': 0.0019794167880229294, 'weight_decay': 0.001, 'warmup_steps': 16}. Best is trial 35 with value: 0.7718702742260117.


Trial 140 with params: {'learning_rate': 0.004835436386814665, 'weight_decay': 0.01, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9042,0.806714,0.7311,0.749665,0.731378,0.731403
2,0.7747,0.74727,0.7433,0.759069,0.74249,0.745448
3,0.748,0.728286,0.7507,0.757948,0.749979,0.749598
4,0.7379,0.698892,0.7628,0.772974,0.76311,0.762832
5,0.717,0.700201,0.757,0.760488,0.75668,0.756941
6,0.7012,0.68437,0.7623,0.768323,0.761734,0.762573


[I 2025-03-27 03:13:51,332] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.001479504841296372, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.918,0.747832,0.7383,0.762349,0.737946,0.739656
2,0.7112,0.700766,0.7561,0.762001,0.755704,0.754729
3,0.6789,0.723005,0.7479,0.763366,0.746875,0.746394
4,0.6669,0.659731,0.7702,0.774682,0.770152,0.770121
5,0.6541,0.699171,0.7573,0.767758,0.757333,0.758684
6,0.6491,0.657647,0.7703,0.773201,0.769931,0.770619
7,0.6353,0.665096,0.7721,0.771218,0.772015,0.769029
8,0.6281,0.667947,0.7662,0.769868,0.76606,0.764795
9,0.6265,0.678685,0.7638,0.76889,0.763592,0.763492
10,0.6175,0.657974,0.7721,0.773173,0.77181,0.770783


[I 2025-03-27 03:22:16,369] Trial 141 finished with value: 0.770783032738926 and parameters: {'learning_rate': 0.001479504841296372, 'weight_decay': 0.002, 'warmup_steps': 14}. Best is trial 35 with value: 0.7718702742260117.


Trial 142 with params: {'learning_rate': 0.002025175293097073, 'weight_decay': 0.001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8944,0.739085,0.7439,0.761146,0.743745,0.744846
2,0.7124,0.703925,0.7555,0.76153,0.755028,0.753966
3,0.6831,0.735406,0.7459,0.762435,0.745007,0.743186


[I 2025-03-27 03:24:46,581] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0018298968226767127, 'weight_decay': 0.0, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9018,0.739025,0.7433,0.762002,0.743091,0.744312
2,0.711,0.701703,0.7563,0.762079,0.75585,0.754759
3,0.6808,0.732576,0.7465,0.76299,0.745552,0.744182
4,0.6698,0.65835,0.7712,0.776379,0.771139,0.771505
5,0.6576,0.693145,0.7569,0.765239,0.75692,0.757882
6,0.6517,0.660939,0.7687,0.773155,0.768202,0.769311
7,0.637,0.667073,0.772,0.772588,0.771922,0.769049
8,0.6288,0.66794,0.7675,0.771121,0.767417,0.766153
9,0.6266,0.680709,0.7648,0.770079,0.764636,0.764473
10,0.6159,0.657031,0.7726,0.773779,0.772309,0.771366


[I 2025-03-27 03:33:12,365] Trial 143 finished with value: 0.7713657197926551 and parameters: {'learning_rate': 0.0018298968226767127, 'weight_decay': 0.0, 'warmup_steps': 16}. Best is trial 35 with value: 0.7718702742260117.


Trial 144 with params: {'learning_rate': 0.00284442869646421, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8787,0.77027,0.7346,0.759247,0.734787,0.735768
2,0.7256,0.723025,0.7493,0.758942,0.748583,0.748615
3,0.6982,0.743385,0.7454,0.761276,0.744433,0.743095


[I 2025-03-27 03:35:40,351] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.002204296852030761, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8913,0.742485,0.7436,0.761433,0.743509,0.744612
2,0.7146,0.70728,0.7538,0.760355,0.753276,0.752402
3,0.6859,0.737855,0.746,0.76258,0.745082,0.743304
4,0.6756,0.659522,0.7681,0.773787,0.768052,0.768425
5,0.6627,0.68577,0.7591,0.764708,0.759084,0.759486
6,0.6559,0.665331,0.7672,0.773295,0.76656,0.767962
7,0.6403,0.670984,0.7702,0.771571,0.770123,0.76722
8,0.6309,0.668687,0.7685,0.772277,0.768447,0.767333
9,0.6279,0.684079,0.765,0.770743,0.764868,0.764661
10,0.6155,0.656886,0.7725,0.773656,0.772242,0.77128


[I 2025-03-27 03:44:02,161] Trial 145 finished with value: 0.7712802892302493 and parameters: {'learning_rate': 0.002204296852030761, 'weight_decay': 0.007, 'warmup_steps': 17}. Best is trial 35 with value: 0.7718702742260117.


Trial 146 with params: {'learning_rate': 0.0013457064923291322, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9431,0.752372,0.7378,0.763301,0.737423,0.739122
2,0.7129,0.700713,0.757,0.762982,0.756629,0.755814
3,0.6794,0.717269,0.751,0.764664,0.749968,0.74968
4,0.6669,0.660543,0.771,0.775219,0.770983,0.770843
5,0.6536,0.70071,0.7575,0.768307,0.757534,0.758954
6,0.6488,0.657242,0.7716,0.774133,0.771262,0.771814
7,0.6354,0.665263,0.7723,0.771222,0.77222,0.769176
8,0.6284,0.668077,0.7665,0.770361,0.766334,0.765198
9,0.6271,0.678334,0.7626,0.767522,0.762375,0.762289
10,0.6186,0.658671,0.7715,0.772717,0.771186,0.770177


[I 2025-03-27 03:52:22,643] Trial 146 finished with value: 0.7701768359532906 and parameters: {'learning_rate': 0.0013457064923291322, 'weight_decay': 0.002, 'warmup_steps': 20}. Best is trial 35 with value: 0.7718702742260117.


Trial 147 with params: {'learning_rate': 0.0016655064395304869, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8996,0.741697,0.7431,0.763889,0.742838,0.744301
2,0.7103,0.70093,0.7562,0.762135,0.755768,0.754604
3,0.6793,0.728964,0.746,0.762162,0.745012,0.74392
4,0.668,0.6589,0.7705,0.775564,0.770451,0.770667
5,0.6556,0.696305,0.7566,0.765973,0.756627,0.757751
6,0.6503,0.659049,0.7697,0.773263,0.769268,0.770143
7,0.6359,0.665782,0.7717,0.771842,0.771631,0.768744
8,0.6282,0.667879,0.767,0.770641,0.766876,0.765636
9,0.6263,0.679581,0.7644,0.769581,0.76421,0.76411
10,0.6164,0.657326,0.7724,0.773432,0.77211,0.771091


[I 2025-03-27 04:00:40,921] Trial 147 finished with value: 0.7710910225725691 and parameters: {'learning_rate': 0.0016655064395304869, 'weight_decay': 0.0, 'warmup_steps': 11}. Best is trial 35 with value: 0.7718702742260117.


Trial 148 with params: {'learning_rate': 0.0025234679894924404, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8878,0.755003,0.7381,0.759657,0.738106,0.739319
2,0.7197,0.715052,0.7527,0.760723,0.752057,0.751523
3,0.6917,0.74233,0.745,0.762037,0.744031,0.742473
4,0.6819,0.663716,0.768,0.774258,0.767988,0.768298
5,0.668,0.680603,0.763,0.766968,0.762922,0.763262
6,0.6601,0.667803,0.7671,0.773936,0.766408,0.767948
7,0.6438,0.67447,0.7691,0.770992,0.76905,0.766154
8,0.6334,0.670348,0.7689,0.772851,0.768832,0.767885
9,0.6297,0.687548,0.7642,0.770523,0.764132,0.763901
10,0.6159,0.657127,0.7728,0.774076,0.772546,0.771583


[I 2025-03-27 04:09:01,945] Trial 148 finished with value: 0.7715827087741464 and parameters: {'learning_rate': 0.0025234679894924404, 'weight_decay': 0.0, 'warmup_steps': 18}. Best is trial 35 with value: 0.7718702742260117.


Trial 149 with params: {'learning_rate': 0.0026160813994285134, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8887,0.759756,0.7373,0.760032,0.737364,0.738452
2,0.7215,0.717489,0.7512,0.759859,0.750524,0.750222
3,0.6936,0.743203,0.7446,0.761801,0.743629,0.742229


[I 2025-03-27 04:11:31,898] Trial 149 pruned. 


In [29]:
print(best_base_head)

BestRun(run_id='35', objective=0.7718702742260117, hyperparameters={'learning_rate': 0.0024870786738035154, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}, run_summary=None)
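
The notebook does not show how the winning configuration is reused afterwards. As a minimal sketch, assuming the result behaves like a named tuple with the fields printed above, the hyperparameters could be copied onto a training-arguments object attribute by attribute. The `BestRun` class below is a local stand-in rather than the library class, and `apply_best_run`, the `SimpleNamespace` arguments object and `num_train_epochs=10` are hypothetical names and values used only for illustration.

```python
from types import SimpleNamespace
from typing import Any, NamedTuple, Optional


class BestRun(NamedTuple):
    """Local stand-in that mirrors the fields printed by the cell above."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


def apply_best_run(args, best_run):
    """Copy every tuned hyperparameter onto the arguments object (hypothetical helper)."""
    for name, value in best_run.hyperparameters.items():
        setattr(args, name, value)
    return args


# Values copied from the printed result of the head-only search.
best = BestRun(
    run_id="35",
    objective=0.7718702742260117,
    hyperparameters={
        "learning_rate": 0.0024870786738035154,
        "weight_decay": 0.009000000000000001,
        "warmup_steps": 20,
    },
)

# Stand-in for the real training arguments; the defaults here are assumptions.
args = SimpleNamespace(learning_rate=5e-5, weight_decay=0.0, warmup_steps=0, num_train_epochs=10)
apply_best_run(args, best)
print(args.learning_rate, args.weight_decay, args.warmup_steps)
```

Settings that were not part of the search (here the hypothetical `num_train_epochs`) are left untouched, so the final training run differs from the trial only in how long it is allowed to train.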


In [30]:
base.reset_seed()

## Search with distillation, fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [31]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [32]:
def hp_space(trial):
    params = {
        # Optimizer settings, same ranges as in the search without distillation.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
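
The implementation of `base.DistilTrainer` is not shown here, so the exact role of `lambda_param` and `temperature` is an assumption. A common formulation of knowledge distillation blends the cross-entropy on the true label with a temperature-softened KL divergence between teacher and student, and the sketch below illustrates that formulation in plain Python for a single example. The function names and the convention that `lambda_param` weights the soft (teacher) term are hypothetical; the trainer used in this notebook may weight the terms differently.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Blend hard-label cross-entropy with a softened teacher/student KL term.

    Assumed convention: lambda_param = 0 uses only the true label,
    lambda_param = 1 uses only the teacher's softened distribution.
    """
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft term on a scale comparable to the hard term.
    return (1 - lambda_param) * hard + lambda_param * temperature ** 2 * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
print(distillation_loss(student, teacher, label=0, lambda_param=0.6, temperature=2.5))
```

Under this convention the two ends of the `lambda_param` range are degenerate cases: at 0 the temperature has no effect at all, and at 1 the true labels are ignored.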

Optuna configuration.

In [33]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
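
To make the pruner settings more concrete, the following sketch works through the bracket arithmetic, assuming the rule given in Optuna's documentation: the number of brackets is floor(log_eta(max_resource / min_resource)) + 1, and bracket `b` checks trials at steps min_resource * eta^(b + rung). The values of `min_r` and `max_r` are defined elsewhere in the notebook, so the numbers 3 and 10 used below are assumptions chosen only because the trials in the logs stop after 3, 6 or 10 epochs. The helper names are hypothetical, and the sketch ignores `bootstrap_count` and the way trials are assigned to brackets.

```python
def n_brackets(min_resource, max_resource, reduction_factor):
    """Count brackets: floor(log_eta(max_resource / min_resource)) + 1, using integers only."""
    count = 0
    step = min_resource
    while step <= max_resource:
        count += 1
        step *= reduction_factor
    return count


def rung_steps(min_resource, max_resource, reduction_factor, bracket):
    """Steps at which a bracket may prune, limited to those reachable within max_resource."""
    steps = []
    step = min_resource * reduction_factor ** bracket
    while step <= max_resource:
        steps.append(step)
        step *= reduction_factor
    return steps


# Hypothetical values: min_resource=3 and max_resource=10 are not taken from the notebook.
for bracket in range(n_brackets(3, 10, 2)):
    print(bracket, rung_steps(3, 10, 2, bracket))
# prints:
# 0 [3, 6]
# 1 [6]
```

With these assumed values a trial can be stopped only after epoch 3 or epoch 6, and any trial that survives both checks runs to the full 10 epochs.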



Configuration of the distillation trainer for the individual training runs.

In [34]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.freeze_model(base.get_mobilenet(10)),
)

Search setup.

In [35]:
best_distill_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-27 04:11:32,540] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9278,0.710567,0.7189,0.732462,0.718512,0.719215
2,0.6957,0.67157,0.7377,0.745886,0.736877,0.737689
3,0.6701,0.661333,0.7452,0.745244,0.744454,0.743318


[I 2025-03-27 04:14:02,269] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1538,0.859543,0.6691,0.68011,0.668834,0.667032
2,0.7931,0.739717,0.7112,0.717668,0.710116,0.709903
3,0.7261,0.708224,0.7237,0.721646,0.72297,0.719713
4,0.7029,0.684622,0.7348,0.737278,0.73479,0.732946
5,0.6897,0.692499,0.7308,0.737552,0.730573,0.731707
6,0.6823,0.670247,0.7421,0.743801,0.741855,0.741484


[I 2025-03-27 04:19:09,896] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3056,1.046661,0.6171,0.627847,0.616911,0.614105
2,0.9409,0.85084,0.6811,0.684181,0.68012,0.678152
3,0.8142,0.777804,0.7016,0.697311,0.700859,0.695707


[I 2025-03-27 04:21:38,507] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1099,0.826789,0.6778,0.68834,0.677519,0.675871
2,0.7715,0.725549,0.7147,0.721508,0.71363,0.713585
3,0.7149,0.698928,0.7273,0.72547,0.726559,0.723641
4,0.6948,0.67773,0.7383,0.740955,0.738327,0.736538
5,0.683,0.686537,0.7326,0.739386,0.732356,0.733541
6,0.6763,0.664569,0.7445,0.746203,0.744276,0.743939


[I 2025-03-27 04:26:30,399] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7825,0.684958,0.7302,0.752987,0.73003,0.730984
2,0.6678,0.657059,0.7468,0.753283,0.746295,0.74538
3,0.6543,0.657181,0.7505,0.757488,0.749648,0.749038
4,0.6495,0.630792,0.7621,0.768739,0.762208,0.762066
5,0.6449,0.642681,0.7503,0.759662,0.750113,0.752434
6,0.6421,0.629288,0.7613,0.764441,0.760974,0.761259


[I 2025-03-27 04:31:14,821] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7617,0.678197,0.7323,0.753483,0.732336,0.733158
2,0.6762,0.643582,0.7532,0.759692,0.752445,0.752994
3,0.6676,0.647455,0.7527,0.759874,0.751989,0.753226
4,0.6646,0.647137,0.752,0.765463,0.75235,0.751638
5,0.6578,0.664227,0.7427,0.756375,0.742365,0.745907
6,0.6536,0.633527,0.7576,0.763663,0.757266,0.757729
7,0.648,0.641343,0.7588,0.759592,0.758893,0.754499
8,0.6401,0.641531,0.7594,0.767204,0.759211,0.758689
9,0.6384,0.643367,0.7575,0.765402,0.757252,0.75744
10,0.631,0.628689,0.7614,0.766579,0.761173,0.76056


[I 2025-03-27 04:39:11,188] Trial 5 finished with value: 0.7605600951237762 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.7605600951237762.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7748,0.684372,0.73,0.752626,0.729814,0.730708
2,0.6678,0.656819,0.7474,0.753561,0.746905,0.745967
3,0.6542,0.657286,0.7505,0.757607,0.749628,0.749059
4,0.6493,0.630835,0.7621,0.768454,0.762211,0.762027
5,0.6448,0.642861,0.7503,0.759828,0.750112,0.752434
6,0.6419,0.629387,0.7616,0.764797,0.761293,0.761575


[I 2025-03-27 04:43:54,624] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7478,0.670216,0.7373,0.752466,0.737344,0.737257
2,0.6794,0.650267,0.7491,0.758145,0.748092,0.748264
3,0.6712,0.653538,0.749,0.757058,0.748301,0.749467
4,0.6678,0.650728,0.7512,0.765737,0.751467,0.750949
5,0.6608,0.667964,0.7416,0.756679,0.741316,0.745093
6,0.6565,0.635178,0.757,0.764357,0.756694,0.757604
7,0.6504,0.642771,0.7569,0.758344,0.757115,0.752422
8,0.6414,0.645877,0.7575,0.766863,0.757329,0.756884
9,0.6394,0.643479,0.7574,0.765531,0.757149,0.757414
10,0.6316,0.628751,0.7609,0.766099,0.760676,0.760066


[I 2025-03-27 04:51:48,081] Trial 7 finished with value: 0.7600664823234333 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 5 with value: 0.7605600951237762.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1662,0.891126,0.6607,0.672252,0.660432,0.658651
2,0.8176,0.758066,0.7063,0.711362,0.705242,0.704552
3,0.7409,0.720632,0.7191,0.716587,0.718362,0.714719
4,0.7139,0.694266,0.7311,0.733432,0.73109,0.72924
5,0.6989,0.70082,0.7276,0.734154,0.727369,0.728378
6,0.6905,0.678176,0.7378,0.739422,0.737522,0.737146


[I 2025-03-27 04:56:35,031] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7791,0.688736,0.7285,0.753022,0.728399,0.729162
2,0.6681,0.654325,0.747,0.754127,0.746324,0.745675
3,0.6556,0.656102,0.7497,0.757672,0.748913,0.748533


[I 2025-03-27 04:59:00,711] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7701,0.682261,0.7311,0.759272,0.730913,0.731601
2,0.7061,0.674521,0.741,0.749299,0.740023,0.73749
3,0.6997,0.677933,0.7383,0.757857,0.737843,0.73568
4,0.6887,0.66451,0.7436,0.758485,0.743883,0.742867
5,0.6821,0.690585,0.7329,0.754486,0.7326,0.736542
6,0.6742,0.64284,0.7567,0.762467,0.756144,0.757415
7,0.6635,0.641113,0.755,0.762091,0.75471,0.752124
8,0.651,0.643226,0.7578,0.766887,0.757507,0.757796
9,0.6468,0.639769,0.7589,0.764016,0.758547,0.75874
10,0.6353,0.629234,0.7612,0.766473,0.760952,0.76032


[I 2025-03-27 05:07:12,563] Trial 10 finished with value: 0.7603199806174945 and parameters: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 5 with value: 0.7605600951237762.


Trial 11 with params: {'learning_rate': 0.0036979694616670403, 'weight_decay': 0.006, 'warmup_steps': 28, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7771,0.687604,0.7279,0.76084,0.727688,0.728513
2,0.7092,0.673425,0.7424,0.750611,0.741376,0.740088
3,0.7021,0.684053,0.7363,0.757632,0.735826,0.732671


[I 2025-03-27 05:09:41,199] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0044803639948611095, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.779,0.703912,0.7205,0.758576,0.719978,0.720431
2,0.7236,0.668747,0.7404,0.752276,0.739496,0.741721
3,0.7123,0.704414,0.7276,0.748795,0.727067,0.72049
4,0.7016,0.676485,0.7368,0.756255,0.737,0.737581
5,0.6956,0.696466,0.7294,0.753663,0.729092,0.733135
6,0.6831,0.648061,0.7538,0.759726,0.753312,0.753685


[I 2025-03-27 05:14:37,604] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.002518208951412107, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7547,0.655921,0.746,0.7529,0.74583,0.744835
2,0.6868,0.669247,0.7415,0.754502,0.74031,0.73761
3,0.6803,0.668231,0.7433,0.753964,0.742658,0.742811


[I 2025-03-27 05:17:01,353] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0035985903311758468, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7643,0.682798,0.7308,0.75941,0.730622,0.731187
2,0.7068,0.675036,0.7409,0.749402,0.739918,0.73752
3,0.7002,0.679087,0.7391,0.758732,0.738652,0.736191
4,0.6892,0.66476,0.7436,0.758491,0.743872,0.742916
5,0.6827,0.691029,0.733,0.754758,0.732699,0.736585
6,0.6745,0.643738,0.756,0.761982,0.755452,0.756747
7,0.6638,0.641176,0.7554,0.762791,0.755104,0.752628
8,0.6512,0.643255,0.7575,0.766634,0.757212,0.757521
9,0.647,0.63972,0.7588,0.76394,0.758445,0.758658
10,0.6354,0.629235,0.7613,0.766613,0.761054,0.760431


[I 2025-03-27 05:24:53,556] Trial 14 finished with value: 0.7604308242472327 and parameters: {'learning_rate': 0.0035985903311758468, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 5 with value: 0.7605600951237762.


Trial 15 with params: {'learning_rate': 0.0021853805778439743, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7554,0.665073,0.7389,0.749593,0.738893,0.738096
2,0.6813,0.655456,0.7478,0.757998,0.746653,0.746256
3,0.6736,0.658302,0.7467,0.755642,0.746018,0.747026
4,0.6696,0.651419,0.7504,0.764288,0.750661,0.750074
5,0.6627,0.669716,0.7405,0.756497,0.740248,0.744278
6,0.6582,0.636064,0.7568,0.764803,0.756489,0.757603


[I 2025-03-27 05:29:41,917] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.002549155387318145, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7482,0.65588,0.7464,0.753424,0.746255,0.745279
2,0.6871,0.669835,0.741,0.754127,0.739814,0.736966
3,0.6808,0.66848,0.7434,0.753871,0.742768,0.74284
4,0.6743,0.648999,0.7495,0.758559,0.749789,0.748885
5,0.6678,0.672154,0.74,0.758078,0.73972,0.744304
6,0.6631,0.636055,0.7564,0.764243,0.756072,0.757343
7,0.6555,0.64889,0.7549,0.758201,0.755263,0.750062
8,0.6446,0.649047,0.755,0.765515,0.754843,0.754535
9,0.6417,0.642052,0.7579,0.764709,0.757614,0.757854
10,0.6328,0.628897,0.7606,0.766059,0.760366,0.759768


[I 2025-03-27 05:37:35,231] Trial 16 finished with value: 0.7597678180873688 and parameters: {'learning_rate': 0.002549155387318145, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 5 with value: 0.7605600951237762.


Trial 17 with params: {'learning_rate': 0.0010500483500959763, 'weight_decay': 0.005, 'warmup_steps': 19, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7821,0.688777,0.7283,0.752782,0.728203,0.72895
2,0.6681,0.654557,0.7468,0.754019,0.746141,0.745515
3,0.6556,0.656215,0.7493,0.757302,0.748508,0.74809
4,0.6518,0.631376,0.762,0.769257,0.762136,0.761636
5,0.6468,0.643158,0.7489,0.757613,0.748688,0.751152
6,0.6441,0.628729,0.7614,0.764224,0.761018,0.761127


[I 2025-03-27 05:42:14,526] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004571777982388411, 'weight_decay': 0.007, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7787,0.70686,0.7186,0.758199,0.718015,0.718669
2,0.7249,0.66764,0.7395,0.75133,0.73863,0.741148
3,0.7134,0.703329,0.7272,0.74678,0.726662,0.719775


[I 2025-03-27 05:44:34,776] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0037879597571121707, 'weight_decay': 0.004, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7775,0.690388,0.727,0.761785,0.726769,0.727574
2,0.711,0.673415,0.7423,0.750853,0.741284,0.740614
3,0.7034,0.688132,0.7358,0.757775,0.735315,0.731722
4,0.692,0.667393,0.7425,0.758361,0.742786,0.742033
5,0.6859,0.692576,0.7307,0.752578,0.730436,0.733928
6,0.6764,0.648009,0.7535,0.761337,0.752918,0.754403


[I 2025-03-27 05:49:15,973] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00018075272535631178, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0177,0.754236,0.7022,0.713308,0.701796,0.701198
2,0.7239,0.693042,0.7272,0.735769,0.726146,0.72673
3,0.6875,0.675356,0.738,0.737562,0.737236,0.735566


[I 2025-03-27 05:51:41,420] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.001374694191134823, 'weight_decay': 0.002, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.778,0.686367,0.7274,0.753274,0.727299,0.728339
2,0.671,0.641812,0.756,0.758972,0.7554,0.755082
3,0.6594,0.648128,0.7534,0.760117,0.752696,0.752638
4,0.6568,0.637423,0.7582,0.768102,0.758526,0.757588
5,0.6507,0.6512,0.7456,0.756229,0.745294,0.748216
6,0.6478,0.630599,0.7601,0.763627,0.759702,0.759721
7,0.6424,0.639985,0.7616,0.762132,0.761459,0.75755
8,0.6369,0.632176,0.7634,0.768405,0.763177,0.762655
9,0.6359,0.641,0.7601,0.766363,0.759793,0.759741
10,0.6299,0.628616,0.7613,0.766127,0.761086,0.760447


[I 2025-03-27 06:00:00,863] Trial 21 finished with value: 0.7604474554615893 and parameters: {'learning_rate': 0.001374694191134823, 'weight_decay': 0.002, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 5 with value: 0.7605600951237762.


Trial 22 with params: {'learning_rate': 0.0010950442824421052, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7864,0.688791,0.7277,0.752437,0.727596,0.728368
2,0.6685,0.65288,0.7477,0.754367,0.746997,0.746379
3,0.656,0.655392,0.7487,0.756518,0.747936,0.747572
4,0.6524,0.63188,0.7614,0.768987,0.761555,0.760977
5,0.6473,0.64407,0.7486,0.75727,0.748411,0.750821
6,0.6446,0.628887,0.7607,0.763404,0.760311,0.760367


[I 2025-03-27 06:04:50,520] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0003899403924597235, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8781,0.692958,0.7273,0.743874,0.72689,0.728339
2,0.6825,0.660715,0.742,0.74844,0.741367,0.741836
3,0.6619,0.656886,0.7493,0.751015,0.74842,0.747848
4,0.6538,0.638757,0.7566,0.759426,0.756648,0.755643
5,0.6476,0.652678,0.7441,0.753455,0.744026,0.745819
6,0.6441,0.633838,0.7588,0.760349,0.758591,0.758246
7,0.6399,0.636419,0.7617,0.761056,0.761576,0.758535
8,0.6369,0.634571,0.7607,0.76649,0.760438,0.760284
9,0.6369,0.641318,0.7567,0.760697,0.756316,0.756575
10,0.6342,0.635702,0.7604,0.76512,0.760112,0.759608


[I 2025-03-27 06:12:47,874] Trial 23 finished with value: 0.7596075343129438 and parameters: {'learning_rate': 0.0003899403924597235, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 5 with value: 0.7605600951237762.


Trial 24 with params: {'learning_rate': 0.0029879927857027525, 'weight_decay': 0.008, 'warmup_steps': 19, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7649,0.661017,0.7467,0.757322,0.746381,0.746075
2,0.6954,0.685397,0.7357,0.749736,0.734755,0.728894
3,0.6896,0.671151,0.7409,0.755891,0.740269,0.740069


[I 2025-03-27 06:15:09,398] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0015489366670967813, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7612,0.684426,0.7297,0.755628,0.729607,0.730846
2,0.6724,0.637948,0.7555,0.75753,0.754909,0.754992
3,0.6621,0.64586,0.7543,0.760772,0.7536,0.754009
4,0.6594,0.641004,0.7544,0.765464,0.754806,0.753712
5,0.6531,0.656087,0.7447,0.756521,0.744378,0.747615
6,0.6497,0.631499,0.7596,0.763971,0.759186,0.759345
7,0.6443,0.640598,0.7614,0.761856,0.761317,0.757283
8,0.638,0.634673,0.7625,0.768087,0.762266,0.761766
9,0.6368,0.642073,0.7601,0.766848,0.759819,0.759806
10,0.6303,0.6286,0.761,0.765915,0.760782,0.760166


[I 2025-03-27 06:23:29,646] Trial 25 finished with value: 0.7601661349980244 and parameters: {'learning_rate': 0.0015489366670967813, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 5 with value: 0.7605600951237762.


Trial 26 with params: {'learning_rate': 0.0037265547828964274, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7724,0.687999,0.7275,0.7605,0.727289,0.728003
2,0.7097,0.6733,0.7428,0.751124,0.741763,0.740491
3,0.7023,0.684709,0.7355,0.756766,0.735018,0.731795


[I 2025-03-27 06:25:59,386] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0035075938351392137, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7566,0.679421,0.7328,0.758756,0.732579,0.733169
2,0.7046,0.676412,0.7394,0.748155,0.738463,0.735287
3,0.6986,0.675685,0.7391,0.758059,0.738602,0.73706
4,0.6879,0.663977,0.7442,0.758893,0.744494,0.743485
5,0.6812,0.689895,0.7327,0.754561,0.732408,0.7366
6,0.6736,0.641297,0.7577,0.763007,0.757159,0.758378
7,0.663,0.641007,0.7554,0.762223,0.755152,0.752447
8,0.6506,0.643172,0.7578,0.766888,0.757519,0.757798
9,0.6465,0.639845,0.7583,0.763462,0.757952,0.758139
10,0.6351,0.629225,0.7613,0.766584,0.761051,0.76042


[I 2025-03-27 06:34:19,422] Trial 27 finished with value: 0.7604201413011444 and parameters: {'learning_rate': 0.0035075938351392137, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}. Best is trial 5 with value: 0.7605600951237762.


Trial 28 with params: {'learning_rate': 0.0014917628138361467, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.751,0.685499,0.7292,0.755481,0.729114,0.730333
2,0.6717,0.638623,0.7554,0.757373,0.754821,0.754739
3,0.661,0.646399,0.7536,0.760151,0.752897,0.753134
4,0.6585,0.639756,0.7554,0.766119,0.755797,0.754652
5,0.6522,0.654257,0.7449,0.756439,0.744592,0.747747
6,0.649,0.631179,0.7593,0.763285,0.758906,0.758997
7,0.6436,0.640433,0.7622,0.76265,0.762088,0.758092
8,0.6376,0.633635,0.7629,0.768195,0.76268,0.76216
9,0.6365,0.641695,0.7604,0.766883,0.760106,0.760079
10,0.6301,0.628605,0.7611,0.765952,0.760885,0.760227


[I 2025-03-27 06:42:34,739] Trial 28 finished with value: 0.7602269955406308 and parameters: {'learning_rate': 0.0014917628138361467, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 5 with value: 0.7605600951237762.


Trial 29 with params: {'learning_rate': 0.004340056460419262, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7733,0.70006,0.7233,0.761194,0.722842,0.723612
2,0.7211,0.671067,0.7408,0.752413,0.73985,0.741606
3,0.7109,0.705351,0.728,0.751707,0.727444,0.721741


[I 2025-03-27 06:44:56,434] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0015549498669164572, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7788,0.6861,0.7288,0.754893,0.72872,0.729896
2,0.6728,0.638077,0.7551,0.757353,0.75451,0.754644
3,0.6624,0.64582,0.7545,0.760953,0.753802,0.75424
4,0.6597,0.641271,0.754,0.765212,0.75441,0.75333
5,0.6533,0.656572,0.7448,0.756766,0.744467,0.747731
6,0.6499,0.631583,0.7594,0.763845,0.758984,0.759129


[I 2025-03-27 06:49:38,583] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.004139848384721143, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7664,0.694808,0.7245,0.760967,0.724153,0.725096
2,0.7175,0.673934,0.7416,0.752495,0.740616,0.741568
3,0.7084,0.702189,0.7287,0.753174,0.728138,0.723328
4,0.6965,0.673599,0.7381,0.756591,0.738396,0.738317
5,0.6908,0.694737,0.7311,0.75344,0.730859,0.734191
6,0.6795,0.649582,0.7542,0.762111,0.753567,0.754688


[I 2025-03-27 06:54:35,474] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.003397667647980948, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7569,0.67879,0.7342,0.758893,0.733952,0.734497
2,0.7026,0.678335,0.7398,0.749094,0.738896,0.735105
3,0.6968,0.673024,0.7406,0.758407,0.740042,0.73902


[I 2025-03-27 06:56:57,396] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 5.8367877335939255e-05, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2855,1.023457,0.6259,0.636515,0.625711,0.623021
2,0.9214,0.835418,0.6846,0.687969,0.683593,0.681779
3,0.8019,0.768233,0.7043,0.700179,0.703567,0.69853
4,0.7563,0.731939,0.7177,0.719896,0.717644,0.715473
5,0.7326,0.731306,0.7163,0.72163,0.716076,0.716829
6,0.7196,0.706458,0.7272,0.727627,0.726855,0.72591


[I 2025-03-27 07:01:40,167] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0019356297372892925, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7453,0.675726,0.7337,0.753224,0.733731,0.734406
2,0.6772,0.645337,0.7506,0.75775,0.749748,0.750108
3,0.6687,0.649021,0.7522,0.759585,0.751486,0.752748
4,0.6657,0.648512,0.7528,0.766552,0.753127,0.752476
5,0.6588,0.665572,0.7423,0.75636,0.741971,0.745561
6,0.6546,0.634037,0.7569,0.763474,0.756565,0.757208


[I 2025-03-27 07:06:34,238] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0006554115916869456, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7935,0.682529,0.7302,0.752183,0.729832,0.731308
2,0.6697,0.654204,0.7472,0.751976,0.74666,0.746122
3,0.6548,0.659038,0.7496,0.75596,0.748607,0.748345
4,0.649,0.63202,0.762,0.766529,0.762083,0.761605
5,0.6442,0.645947,0.7488,0.759725,0.748687,0.750923
6,0.6411,0.630391,0.7614,0.764163,0.761161,0.761326
7,0.6368,0.633373,0.7635,0.763482,0.763346,0.760269
8,0.6335,0.630726,0.7641,0.768827,0.763931,0.763313
9,0.6335,0.637649,0.7588,0.76291,0.758437,0.758472
10,0.6298,0.630815,0.7619,0.766628,0.761658,0.76105


[I 2025-03-27 07:14:30,863] Trial 35 finished with value: 0.7610497285272 and parameters: {'learning_rate': 0.0006554115916869456, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 35 with value: 0.7610497285272.


Trial 36 with params: {'learning_rate': 0.00018295064185792114, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9888,0.749146,0.7036,0.714784,0.703184,0.702708
2,0.7213,0.691722,0.7275,0.736111,0.726451,0.727037
3,0.6864,0.674628,0.7378,0.737441,0.737044,0.735402


[I 2025-03-27 07:16:55,652] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00022093021195818488, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9396,0.727542,0.711,0.722469,0.710583,0.710498
2,0.7074,0.681716,0.7329,0.741392,0.731921,0.732586
3,0.678,0.667747,0.7415,0.741294,0.740736,0.739333
4,0.6667,0.653463,0.7501,0.753128,0.750106,0.748809
5,0.6589,0.664582,0.7399,0.747526,0.739747,0.741138
6,0.6544,0.644067,0.7537,0.755281,0.753519,0.753121


[I 2025-03-27 07:21:50,135] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0007175088625232427, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7831,0.682833,0.7303,0.752825,0.729974,0.731215
2,0.6687,0.655155,0.7466,0.752015,0.746079,0.745331
3,0.6543,0.658427,0.7502,0.756801,0.74925,0.748839
4,0.6489,0.631371,0.7631,0.768209,0.76319,0.762873
5,0.6443,0.644551,0.7488,0.759317,0.748654,0.750932
6,0.6413,0.630034,0.7617,0.764591,0.761446,0.761647
7,0.6369,0.63387,0.7631,0.763406,0.762943,0.759838
8,0.6334,0.630328,0.7634,0.767992,0.763223,0.762541
9,0.6334,0.637519,0.7593,0.763507,0.758937,0.758989
10,0.6295,0.63029,0.7616,0.766318,0.761369,0.760782


[I 2025-03-27 07:29:43,221] Trial 38 finished with value: 0.760782189870727 and parameters: {'learning_rate': 0.0007175088625232427, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 35 with value: 0.7610497285272.


Trial 39 with params: {'learning_rate': 0.001225373596517502, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7572,0.686995,0.7275,0.752521,0.727391,0.728314
2,0.6691,0.647701,0.7513,0.756303,0.7506,0.749999
3,0.6571,0.652013,0.7504,0.757817,0.749658,0.749389
4,0.6542,0.6339,0.7598,0.767922,0.760007,0.759174
5,0.6487,0.646977,0.747,0.756156,0.746743,0.749315
6,0.6459,0.629555,0.7598,0.762504,0.759398,0.75936


[I 2025-03-27 07:34:30,521] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0005204177961071862, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8144,0.684663,0.7308,0.750142,0.730363,0.732145
2,0.6737,0.653943,0.7472,0.75186,0.746633,0.746721
3,0.6569,0.658346,0.7507,0.756096,0.7497,0.749726
4,0.6502,0.634316,0.7594,0.762815,0.759481,0.758559
5,0.6448,0.649192,0.7481,0.758751,0.748012,0.749951
6,0.6416,0.631229,0.7593,0.761433,0.759102,0.759066
7,0.6374,0.633539,0.7637,0.763213,0.763568,0.760532
8,0.6344,0.632099,0.7625,0.767988,0.762261,0.761882
9,0.6344,0.638699,0.7583,0.762324,0.757914,0.758053
10,0.6312,0.632592,0.7616,0.766211,0.761362,0.760793


[I 2025-03-27 07:42:21,327] Trial 40 finished with value: 0.7607932646863834 and parameters: {'learning_rate': 0.0005204177961071862, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 35 with value: 0.7610497285272.


Trial 41 with params: {'learning_rate': 0.0005700534719446925, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8135,0.683621,0.7312,0.751819,0.73077,0.732456
2,0.6721,0.653655,0.7471,0.75137,0.746533,0.746329
3,0.656,0.659104,0.7506,0.75659,0.749598,0.749577
4,0.6496,0.633339,0.7605,0.764263,0.760598,0.759779
5,0.6445,0.648028,0.7481,0.758937,0.748001,0.750048
6,0.6413,0.630863,0.7595,0.761922,0.759284,0.759336


[I 2025-03-27 07:47:02,709] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.000205184884717044, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9546,0.734962,0.7089,0.720157,0.708506,0.708198
2,0.7123,0.685413,0.7301,0.738746,0.729078,0.729721
3,0.6811,0.670229,0.7404,0.740204,0.739622,0.738143
4,0.6691,0.655793,0.7485,0.751529,0.748507,0.747164
5,0.661,0.666636,0.739,0.746488,0.73882,0.740131
6,0.6564,0.645864,0.7519,0.753595,0.751719,0.751335


[I 2025-03-27 07:51:45,962] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0005669493937280702, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.807,0.683425,0.7309,0.751468,0.730469,0.732157
2,0.672,0.653576,0.7477,0.752057,0.747123,0.746954
3,0.656,0.659,0.7506,0.756445,0.749605,0.749595
4,0.6496,0.633385,0.7605,0.764251,0.760593,0.759789
5,0.6445,0.648088,0.7477,0.758483,0.7476,0.749628
6,0.6413,0.630868,0.7597,0.762059,0.759482,0.759519
7,0.637,0.633232,0.7639,0.763568,0.763749,0.760707
8,0.6339,0.631527,0.7636,0.768791,0.763398,0.762923
9,0.634,0.638194,0.7588,0.76288,0.758432,0.758531
10,0.6306,0.631856,0.7614,0.766106,0.761154,0.760561


[I 2025-03-27 07:59:47,312] Trial 43 finished with value: 0.760560948578733 and parameters: {'learning_rate': 0.0005669493937280702, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 35 with value: 0.7610497285272.


Trial 44 with params: {'learning_rate': 0.00063155918393816, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7996,0.682664,0.7313,0.752982,0.73089,0.732399
2,0.6703,0.653967,0.7471,0.751577,0.746545,0.746052
3,0.655,0.659169,0.7493,0.75573,0.74831,0.748163
4,0.6491,0.632337,0.7613,0.765576,0.761386,0.760807
5,0.6443,0.646526,0.7486,0.759678,0.748485,0.750715
6,0.6411,0.630521,0.7608,0.763429,0.760558,0.760685
7,0.6368,0.633271,0.7632,0.763013,0.763059,0.759993
8,0.6336,0.630913,0.7637,0.768512,0.763523,0.76292
9,0.6336,0.637753,0.7587,0.762778,0.758341,0.758385
10,0.63,0.631056,0.762,0.766694,0.761751,0.761129


[I 2025-03-27 08:07:49,288] Trial 44 finished with value: 0.7611287079618219 and parameters: {'learning_rate': 0.00063155918393816, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 45 with params: {'learning_rate': 0.00034737705396206735, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8769,0.696669,0.7248,0.740359,0.724402,0.725612
2,0.686,0.66406,0.7406,0.748,0.739887,0.740549
3,0.6642,0.657501,0.7482,0.749091,0.747401,0.746552


[I 2025-03-27 08:10:11,220] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0009777798959522516, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7616,0.687917,0.7287,0.75268,0.728592,0.729331
2,0.6674,0.656582,0.7459,0.75334,0.745339,0.744682
3,0.6549,0.656749,0.7504,0.758241,0.7496,0.749099
4,0.6508,0.630869,0.7618,0.768852,0.761925,0.761556
5,0.646,0.642288,0.7483,0.757195,0.748085,0.75061
6,0.6433,0.628693,0.7622,0.765313,0.761838,0.762096
7,0.6384,0.636773,0.7624,0.763719,0.762222,0.75897
8,0.6342,0.629715,0.7639,0.767938,0.763731,0.76301
9,0.634,0.638306,0.7585,0.763433,0.758157,0.758209
10,0.6292,0.629078,0.761,0.765675,0.760773,0.760116


[I 2025-03-27 08:18:00,507] Trial 46 finished with value: 0.7601163720690252 and parameters: {'learning_rate': 0.0009777798959522516, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 47 with params: {'learning_rate': 0.001015464180032683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7616,0.688464,0.729,0.753327,0.728917,0.729665
2,0.6676,0.655693,0.7463,0.753804,0.745663,0.745028
3,0.6551,0.656574,0.7497,0.757681,0.748899,0.748415
4,0.6512,0.631051,0.7621,0.769215,0.762229,0.761798
5,0.6463,0.642587,0.749,0.757734,0.748787,0.751279
6,0.6437,0.628669,0.7619,0.764885,0.761528,0.761754
7,0.6387,0.637133,0.7625,0.763823,0.762333,0.758973
8,0.6344,0.629757,0.7635,0.767578,0.763333,0.762608
9,0.6341,0.638514,0.7588,0.763824,0.758454,0.758486
10,0.6292,0.628995,0.7607,0.76537,0.760477,0.75981


[I 2025-03-27 08:26:32,432] Trial 47 finished with value: 0.7598104207347329 and parameters: {'learning_rate': 0.001015464180032683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 48 with params: {'learning_rate': 0.00021569661681237394, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9555,0.731029,0.7104,0.721988,0.709999,0.709821
2,0.7095,0.683065,0.7313,0.73984,0.730296,0.730899
3,0.6792,0.668608,0.7416,0.741425,0.740825,0.739418


[I 2025-03-27 08:28:54,879] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0004104869495610688, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8409,0.68957,0.7298,0.746738,0.729397,0.731001
2,0.6799,0.658959,0.7427,0.748972,0.742101,0.742568
3,0.6605,0.656562,0.7496,0.751834,0.748683,0.748236
4,0.6529,0.637752,0.7573,0.760031,0.757345,0.756326
5,0.6469,0.651949,0.7447,0.754244,0.744617,0.746392
6,0.6434,0.633193,0.7595,0.761138,0.759301,0.759003
7,0.6393,0.635715,0.7615,0.760903,0.761379,0.758359
8,0.6363,0.634034,0.7612,0.766917,0.760929,0.760754
9,0.6364,0.640771,0.7575,0.761514,0.757111,0.75737
10,0.6336,0.635074,0.7602,0.764881,0.759927,0.759405


[I 2025-03-27 08:36:51,556] Trial 49 finished with value: 0.7594045340703877 and parameters: {'learning_rate': 0.0004104869495610688, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 50 with params: {'learning_rate': 0.0009942237827925895, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7637,0.688391,0.7288,0.753144,0.728699,0.72944
2,0.6675,0.656142,0.7464,0.753994,0.745814,0.745205
3,0.655,0.656698,0.7492,0.757173,0.7484,0.747919
4,0.651,0.630941,0.7627,0.76987,0.762828,0.76246
5,0.6461,0.642385,0.7489,0.757804,0.748693,0.751225
6,0.6434,0.628669,0.7618,0.764826,0.761431,0.761672
7,0.6385,0.636944,0.7624,0.763685,0.76222,0.758912
8,0.6343,0.629723,0.7638,0.767844,0.763638,0.762898
9,0.6341,0.638389,0.7589,0.763843,0.758559,0.758583
10,0.6292,0.62904,0.7608,0.76545,0.760576,0.759923


[I 2025-03-27 08:45:19,559] Trial 50 finished with value: 0.759922723731641 and parameters: {'learning_rate': 0.0009942237827925895, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 51 with params: {'learning_rate': 0.00044891102302703554, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8297,0.687349,0.7307,0.748698,0.730277,0.731999
2,0.6774,0.65655,0.7441,0.749684,0.743539,0.743893
3,0.659,0.65696,0.7503,0.7538,0.749348,0.749142
4,0.6517,0.636284,0.7595,0.762395,0.759544,0.758613
5,0.6459,0.650892,0.7459,0.755635,0.745815,0.747618
6,0.6426,0.632266,0.7596,0.761439,0.759415,0.759223


[I 2025-03-27 08:50:03,951] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0027158955385139997, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7609,0.654512,0.7497,0.755746,0.749448,0.748785
2,0.6902,0.676389,0.7391,0.752716,0.737932,0.733767
3,0.6843,0.669972,0.7429,0.755033,0.742263,0.742292
4,0.6768,0.649457,0.7506,0.759261,0.750865,0.749776
5,0.6703,0.674074,0.74,0.758923,0.739714,0.744392
6,0.6654,0.635149,0.7576,0.764581,0.75724,0.758467
7,0.657,0.649814,0.7544,0.758485,0.754734,0.749544
8,0.6457,0.647756,0.7546,0.764844,0.754417,0.754181
9,0.6426,0.641413,0.7582,0.764594,0.757889,0.758141
10,0.6332,0.628969,0.7608,0.766282,0.76056,0.759962


[I 2025-03-27 08:57:58,944] Trial 52 finished with value: 0.7599623524070915 and parameters: {'learning_rate': 0.0027158955385139997, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 53 with params: {'learning_rate': 0.0012662584041563882, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7749,0.68696,0.7282,0.753418,0.728099,0.728988
2,0.6698,0.645992,0.7526,0.756926,0.751907,0.751362
3,0.6578,0.650739,0.7514,0.758617,0.750671,0.750422
4,0.655,0.634896,0.7591,0.767938,0.759346,0.758503
5,0.6493,0.648221,0.7468,0.756366,0.746523,0.749206
6,0.6465,0.629879,0.7601,0.762996,0.759705,0.75968
7,0.6412,0.639291,0.7617,0.762444,0.761528,0.75776
8,0.6361,0.631031,0.7631,0.767545,0.76287,0.762307
9,0.6354,0.640231,0.7605,0.76636,0.76018,0.760173
10,0.6296,0.628661,0.7608,0.765642,0.760587,0.759964


[I 2025-03-27 09:05:53,139] Trial 53 finished with value: 0.7599641277449907 and parameters: {'learning_rate': 0.0012662584041563882, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 54 with params: {'learning_rate': 0.0006054414211845816, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8104,0.683136,0.7314,0.752654,0.73097,0.732591
2,0.6711,0.653827,0.7473,0.751543,0.746737,0.746326
3,0.6554,0.659276,0.7502,0.756561,0.749192,0.74911
4,0.6493,0.632751,0.7611,0.765202,0.761194,0.76052
5,0.6443,0.647169,0.7482,0.759219,0.748095,0.75024
6,0.6411,0.630664,0.7606,0.763196,0.760372,0.760462
7,0.6369,0.63323,0.7637,0.763398,0.763552,0.76049
8,0.6337,0.631155,0.7635,0.768455,0.763312,0.762753
9,0.6338,0.637906,0.7587,0.762804,0.758341,0.758412
10,0.6302,0.631359,0.7618,0.766412,0.761553,0.760936


[I 2025-03-27 09:13:49,057] Trial 54 finished with value: 0.7609360483893066 and parameters: {'learning_rate': 0.0006054414211845816, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 55 with params: {'learning_rate': 0.00041006057864054, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8554,0.690352,0.7298,0.746829,0.729374,0.730974
2,0.6804,0.659088,0.7427,0.748967,0.742109,0.742564
3,0.6607,0.656701,0.7494,0.75181,0.748481,0.748062
4,0.6529,0.637796,0.7573,0.760026,0.757351,0.756322
5,0.6469,0.651989,0.7445,0.754092,0.74443,0.746229
6,0.6435,0.633217,0.7591,0.760775,0.758899,0.758626


[I 2025-03-27 09:18:34,771] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0006559855710494636, 'weight_decay': 0.008, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.815,0.683402,0.7311,0.753368,0.730717,0.732183
2,0.6702,0.654469,0.7471,0.751826,0.746557,0.746029
3,0.655,0.659159,0.7498,0.756208,0.748806,0.748502
4,0.6491,0.632043,0.7615,0.766119,0.76158,0.761116
5,0.6443,0.645934,0.748,0.75894,0.747888,0.750122
6,0.6412,0.630414,0.761,0.763874,0.760761,0.760959
7,0.6368,0.633462,0.7637,0.763729,0.763556,0.760478
8,0.6335,0.630746,0.7642,0.768933,0.764039,0.763407
9,0.6336,0.637666,0.759,0.763112,0.758631,0.758656
10,0.6298,0.63081,0.7617,0.766413,0.76146,0.760851


[I 2025-03-27 09:26:29,578] Trial 56 finished with value: 0.7608511608090005 and parameters: {'learning_rate': 0.0006559855710494636, 'weight_decay': 0.008, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 57 with params: {'learning_rate': 0.0006683409885640866, 'weight_decay': 0.007, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8184,0.683615,0.7307,0.753123,0.730334,0.731786
2,0.6701,0.654692,0.7468,0.751568,0.746256,0.745615
3,0.6549,0.659092,0.7499,0.756319,0.748918,0.748613
4,0.6491,0.631914,0.7623,0.767036,0.76238,0.761957
5,0.6443,0.645632,0.7485,0.759297,0.748371,0.750626
6,0.6412,0.630357,0.7609,0.763849,0.760658,0.760862
7,0.6368,0.633564,0.7635,0.76355,0.763345,0.760253
8,0.6335,0.630658,0.7641,0.76875,0.76394,0.763309
9,0.6335,0.637627,0.7594,0.763547,0.759027,0.759056
10,0.6298,0.630691,0.7616,0.76629,0.761361,0.76076


[I 2025-03-27 09:34:18,194] Trial 57 finished with value: 0.760760385659038 and parameters: {'learning_rate': 0.0006683409885640866, 'weight_decay': 0.007, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 58 with params: {'learning_rate': 0.0005681778838074924, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8193,0.683902,0.7312,0.751746,0.730786,0.732464
2,0.6722,0.653712,0.7471,0.751362,0.746535,0.746329
3,0.656,0.659129,0.7503,0.756264,0.7493,0.74928
4,0.6496,0.633385,0.7603,0.764093,0.760401,0.759587
5,0.6445,0.648079,0.7481,0.758981,0.74801,0.750063
6,0.6413,0.630879,0.7595,0.761871,0.759284,0.759321


[I 2025-03-27 09:39:01,253] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0005840556168096027, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8192,0.68368,0.7318,0.752653,0.731388,0.733072
2,0.6718,0.653771,0.7466,0.750694,0.746038,0.745674
3,0.6558,0.659267,0.7505,0.756687,0.749489,0.749422
4,0.6495,0.633125,0.7602,0.763996,0.760299,0.75951
5,0.6444,0.647688,0.7482,0.759076,0.748096,0.750188
6,0.6412,0.630797,0.7604,0.762928,0.760178,0.760258
7,0.637,0.633242,0.7638,0.76345,0.763648,0.760557
8,0.6338,0.631378,0.7635,0.768545,0.763308,0.76277
9,0.6339,0.638059,0.759,0.763049,0.758641,0.758708
10,0.6304,0.63162,0.7613,0.765846,0.761048,0.760435


[I 2025-03-27 09:46:46,630] Trial 59 finished with value: 0.7604347018604051 and parameters: {'learning_rate': 0.0005840556168096027, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 60 with params: {'learning_rate': 0.0001810459911618726, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.015,0.753679,0.7024,0.713488,0.701988,0.701382
2,0.7236,0.692882,0.7273,0.735896,0.726246,0.726837
3,0.6874,0.675266,0.7381,0.737668,0.737341,0.735674


[I 2025-03-27 09:49:08,026] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.000557340241759661, 'weight_decay': 0.007, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.834,0.68471,0.7313,0.751847,0.730867,0.732584
2,0.6729,0.653875,0.7467,0.750932,0.74613,0.745968
3,0.6564,0.659145,0.7502,0.756116,0.749201,0.749187
4,0.6498,0.633629,0.7607,0.764412,0.7608,0.759954
5,0.6446,0.648351,0.7483,0.759035,0.748204,0.750212
6,0.6414,0.630984,0.7599,0.762247,0.759689,0.759712
7,0.6371,0.633332,0.7639,0.763482,0.763752,0.760704
8,0.634,0.631683,0.7635,0.768785,0.763295,0.762825
9,0.6341,0.638292,0.7586,0.762698,0.758235,0.758338
10,0.6307,0.631994,0.7614,0.766064,0.761154,0.760557


[I 2025-03-27 09:57:17,816] Trial 61 finished with value: 0.7605574366287031 and parameters: {'learning_rate': 0.000557340241759661, 'weight_decay': 0.007, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 62 with params: {'learning_rate': 0.0007898099894901177, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7989,0.684524,0.7298,0.752266,0.7296,0.730494
2,0.6684,0.656572,0.7475,0.753486,0.746992,0.746054
3,0.6544,0.657661,0.7501,0.757218,0.74921,0.748654
4,0.6493,0.630965,0.762,0.768046,0.762106,0.761895
5,0.6447,0.64327,0.7502,0.760132,0.750022,0.752385
6,0.6418,0.62959,0.7614,0.764627,0.761101,0.761369
7,0.6372,0.63477,0.7628,0.763477,0.762647,0.759496
8,0.6335,0.630006,0.7637,0.767973,0.763538,0.762796
9,0.6335,0.637579,0.7595,0.763825,0.759145,0.75914
10,0.6293,0.629809,0.7609,0.765589,0.760658,0.760028


[I 2025-03-27 10:05:35,239] Trial 62 finished with value: 0.7600275413303457 and parameters: {'learning_rate': 0.0007898099894901177, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 63 with params: {'learning_rate': 0.0008119033286046865, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7982,0.68494,0.7294,0.7519,0.729223,0.73007
2,0.6683,0.6569,0.7474,0.753612,0.746887,0.745966
3,0.6544,0.657433,0.7503,0.757272,0.749413,0.748797
4,0.6494,0.630905,0.762,0.768379,0.76211,0.761939
5,0.6448,0.642965,0.7503,0.759916,0.750112,0.752449
6,0.6419,0.629446,0.7617,0.76493,0.761392,0.761679
7,0.6373,0.635054,0.7631,0.763795,0.762942,0.759784
8,0.6336,0.629941,0.7638,0.767945,0.763643,0.762885
9,0.6335,0.637644,0.759,0.76338,0.758645,0.75864
10,0.6293,0.629687,0.7612,0.765838,0.760953,0.760315


[I 2025-03-27 10:13:26,495] Trial 63 finished with value: 0.7603150486566023 and parameters: {'learning_rate': 0.0008119033286046865, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 64 with params: {'learning_rate': 0.0004849731106634812, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8281,0.686127,0.7303,0.749227,0.729886,0.731666
2,0.6755,0.654933,0.745,0.75007,0.744442,0.74467
3,0.6579,0.657703,0.7507,0.755265,0.749734,0.749716
4,0.6509,0.63519,0.76,0.763241,0.76004,0.759147
5,0.6453,0.650032,0.7476,0.757971,0.747494,0.7494
6,0.642,0.631654,0.7598,0.761712,0.759616,0.759449


[I 2025-03-27 10:18:03,745] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0006867897673032607, 'weight_decay': 0.008, 'warmup_steps': 27, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8209,0.683878,0.7307,0.753233,0.730358,0.731738
2,0.6699,0.655025,0.7471,0.752117,0.746585,0.745918
3,0.6548,0.658919,0.7502,0.756723,0.749217,0.748839
4,0.6491,0.63172,0.7625,0.767386,0.762593,0.762187
5,0.6444,0.645204,0.7487,0.75934,0.74856,0.750812
6,0.6413,0.630256,0.7611,0.76405,0.760843,0.761053
7,0.6369,0.633713,0.7635,0.763627,0.763344,0.76023
8,0.6335,0.630534,0.7634,0.768144,0.763229,0.762596
9,0.6335,0.637592,0.7593,0.763474,0.758925,0.758963
10,0.6297,0.630536,0.7615,0.766083,0.761263,0.760644


[I 2025-03-27 10:25:51,350] Trial 65 finished with value: 0.7606442752198398 and parameters: {'learning_rate': 0.0006867897673032607, 'weight_decay': 0.008, 'warmup_steps': 27, 'lambda_param': 0.8, 'temperature': 5.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 66 with params: {'learning_rate': 0.0002115341401239819, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9884,0.736235,0.708,0.719669,0.707598,0.707332
2,0.7124,0.684609,0.7303,0.738875,0.729298,0.729909
3,0.6805,0.6695,0.7408,0.740593,0.740015,0.738561


[I 2025-03-27 10:28:11,775] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0005229317221679667, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8318,0.685332,0.7311,0.750784,0.730672,0.732534
2,0.6741,0.654078,0.7472,0.751613,0.746638,0.746657
3,0.657,0.658582,0.7507,0.756057,0.749702,0.749703
4,0.6503,0.634296,0.7593,0.762769,0.759388,0.758465
5,0.6449,0.649155,0.7484,0.759028,0.748309,0.750255
6,0.6416,0.631244,0.7594,0.761568,0.759199,0.759171
7,0.6374,0.633553,0.7635,0.763028,0.763367,0.76032
8,0.6344,0.632087,0.7626,0.768023,0.762364,0.76196
9,0.6344,0.638679,0.7583,0.76235,0.757911,0.758045
10,0.6312,0.632537,0.7616,0.766219,0.761364,0.760786


[I 2025-03-27 10:36:20,310] Trial 67 finished with value: 0.7607860328717779 and parameters: {'learning_rate': 0.0005229317221679667, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 68 with params: {'learning_rate': 0.0004782404120300235, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8505,0.687435,0.7303,0.749393,0.729878,0.731651
2,0.6764,0.655355,0.745,0.750194,0.744443,0.744696
3,0.6583,0.657805,0.751,0.755415,0.750039,0.750002
4,0.6511,0.635445,0.7598,0.762994,0.759835,0.758933
5,0.6455,0.650228,0.747,0.757235,0.746909,0.748792
6,0.6422,0.631787,0.7597,0.761613,0.759513,0.759337
7,0.638,0.63414,0.7628,0.762221,0.76268,0.759633
8,0.635,0.632751,0.7626,0.768185,0.762348,0.762053
9,0.635,0.639323,0.7584,0.762477,0.758009,0.758206
10,0.6319,0.6334,0.7618,0.766473,0.761548,0.760978


[I 2025-03-27 10:44:38,157] Trial 68 finished with value: 0.7609776224671181 and parameters: {'learning_rate': 0.0004782404120300235, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 69 with params: {'learning_rate': 0.0012341236585528084, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7881,0.687832,0.727,0.752217,0.726911,0.727648
2,0.6698,0.64723,0.7523,0.757073,0.751603,0.75105
3,0.6575,0.651576,0.7507,0.758031,0.749965,0.749735
4,0.6546,0.634309,0.7594,0.768055,0.759625,0.758828
5,0.649,0.647533,0.7467,0.756185,0.746432,0.749096
6,0.6462,0.62971,0.7599,0.762646,0.759504,0.759443


[I 2025-03-27 10:49:24,618] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.000845364490791224, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7859,0.685285,0.7299,0.752796,0.729739,0.730672
2,0.6679,0.657174,0.7465,0.753005,0.745994,0.745063
3,0.6544,0.657139,0.7502,0.757342,0.749364,0.748743
4,0.6496,0.630786,0.762,0.76871,0.762104,0.761977
5,0.645,0.642606,0.7506,0.759833,0.750412,0.752708
6,0.6422,0.629233,0.7613,0.764419,0.760967,0.761252
7,0.6375,0.635421,0.7634,0.764206,0.763236,0.760075
8,0.6337,0.629827,0.7644,0.768331,0.764245,0.763463
9,0.6336,0.637719,0.7592,0.763611,0.758852,0.758851
10,0.6292,0.629527,0.7615,0.766059,0.761254,0.760564


[I 2025-03-27 10:57:16,225] Trial 70 finished with value: 0.7605643622881295 and parameters: {'learning_rate': 0.000845364490791224, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 71 with params: {'learning_rate': 0.00013167338407996473, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0744,0.800842,0.6842,0.694565,0.683881,0.682449
2,0.7545,0.71422,0.7201,0.727297,0.719051,0.719162
3,0.7056,0.691027,0.732,0.730598,0.731252,0.728798
4,0.6879,0.671961,0.7407,0.743704,0.740761,0.738986
5,0.6772,0.681376,0.7325,0.739627,0.732253,0.73347
6,0.6711,0.659633,0.7458,0.747458,0.745575,0.745206


[I 2025-03-27 11:01:59,426] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.000293943992598787, 'weight_decay': 0.01, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9213,0.707476,0.7198,0.733632,0.719414,0.720237
2,0.6936,0.669832,0.7379,0.745935,0.737116,0.737967
3,0.6688,0.660332,0.7457,0.745801,0.744945,0.743835


[I 2025-03-27 11:04:20,774] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.00044760545299715043, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8457,0.688176,0.7303,0.748835,0.729879,0.731723
2,0.6778,0.656715,0.7439,0.749463,0.743351,0.743667
3,0.6592,0.657096,0.7508,0.754363,0.749855,0.749657
4,0.6518,0.636358,0.7593,0.762203,0.75934,0.758427
5,0.646,0.650974,0.7458,0.75561,0.745723,0.747527
6,0.6426,0.63231,0.7592,0.76106,0.759013,0.758834


[I 2025-03-27 11:09:05,032] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004676901817000123, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.822,0.686407,0.7305,0.748948,0.730097,0.73179
2,0.6762,0.655599,0.7445,0.749802,0.743936,0.744187
3,0.6583,0.657267,0.7507,0.754784,0.749737,0.749618
4,0.6512,0.635689,0.7601,0.763071,0.760129,0.75917
5,0.6456,0.650413,0.7464,0.756313,0.746302,0.748155
6,0.6422,0.631917,0.7596,0.761535,0.759425,0.759259


[I 2025-03-27 11:14:00,859] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00052345293171928, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8381,0.685615,0.7309,0.750594,0.730465,0.732266
2,0.6742,0.654128,0.7471,0.751552,0.746538,0.74657
3,0.6571,0.658665,0.7506,0.756048,0.749596,0.749611
4,0.6503,0.634305,0.7593,0.762828,0.759384,0.758472
5,0.6449,0.649153,0.7483,0.758918,0.748209,0.750166
6,0.6416,0.631245,0.7594,0.761568,0.759199,0.759171
7,0.6374,0.633551,0.7636,0.763139,0.763468,0.760425
8,0.6344,0.632088,0.7626,0.768007,0.762363,0.76196
9,0.6344,0.638663,0.7583,0.762362,0.757915,0.758041
10,0.6312,0.632533,0.7616,0.766219,0.761364,0.760786


[I 2025-03-27 11:22:06,911] Trial 75 finished with value: 0.7607860328717779 and parameters: {'learning_rate': 0.00052345293171928, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 76 with params: {'learning_rate': 0.0007517220088835749, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8026,0.684021,0.7301,0.752598,0.729868,0.730917
2,0.6687,0.655963,0.7477,0.753492,0.747191,0.746366
3,0.6544,0.658109,0.7496,0.756592,0.748674,0.74819
4,0.6491,0.631155,0.7624,0.768022,0.7625,0.762246
5,0.6445,0.643879,0.7493,0.759638,0.749144,0.751473
6,0.6415,0.629839,0.7611,0.764091,0.760824,0.761046
7,0.637,0.634329,0.763,0.763377,0.762839,0.759713
8,0.6335,0.630155,0.7638,0.768326,0.763633,0.762924
9,0.6335,0.637546,0.7596,0.763971,0.759237,0.759294
10,0.6294,0.630047,0.7611,0.765781,0.760856,0.760221


[I 2025-03-27 11:30:02,472] Trial 76 finished with value: 0.7602207347906388 and parameters: {'learning_rate': 0.0007517220088835749, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 77 with params: {'learning_rate': 0.0002766868044218671, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9205,0.710787,0.7192,0.732671,0.718812,0.719494
2,0.696,0.671978,0.7373,0.745558,0.736479,0.737268
3,0.6704,0.661606,0.7451,0.745162,0.744354,0.743202
4,0.6605,0.646812,0.752,0.754967,0.752028,0.750877
5,0.6535,0.658967,0.7419,0.750262,0.741782,0.743411
6,0.6494,0.639311,0.756,0.757643,0.75579,0.755395


[I 2025-03-27 11:34:46,042] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0006190777929360731, 'weight_decay': 0.007, 'warmup_steps': 23, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8251,0.683708,0.7311,0.752571,0.730675,0.732227
2,0.6711,0.654138,0.7471,0.751393,0.746555,0.746072
3,0.6554,0.659374,0.7494,0.755886,0.748393,0.748294
4,0.6493,0.632588,0.7608,0.765,0.760896,0.76025
5,0.6444,0.646831,0.7482,0.759208,0.748098,0.750282
6,0.6412,0.630622,0.7607,0.763403,0.760471,0.760601
7,0.6369,0.633299,0.7637,0.763476,0.763554,0.760477
8,0.6337,0.631057,0.7634,0.768283,0.763214,0.762637
9,0.6337,0.63783,0.7586,0.762666,0.758243,0.758286
10,0.6301,0.631197,0.7615,0.766145,0.761245,0.76062


[I 2025-03-27 11:42:43,920] Trial 78 finished with value: 0.7606204775285091 and parameters: {'learning_rate': 0.0006190777929360731, 'weight_decay': 0.007, 'warmup_steps': 23, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 79 with params: {'learning_rate': 0.0005979148097337759, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8156,0.683414,0.7315,0.752667,0.731074,0.732727
2,0.6714,0.653823,0.747,0.751161,0.746436,0.746056
3,0.6556,0.659288,0.7502,0.75654,0.749184,0.749096
4,0.6494,0.632886,0.7609,0.764967,0.760992,0.760297
5,0.6444,0.64735,0.7482,0.759183,0.748095,0.750211
6,0.6412,0.630714,0.7604,0.763036,0.760174,0.76027
7,0.6369,0.633237,0.7637,0.763394,0.763548,0.760477
8,0.6338,0.631239,0.7634,0.768363,0.76321,0.762653
9,0.6338,0.637957,0.7587,0.762785,0.758337,0.758411
10,0.6303,0.631442,0.7613,0.76589,0.761049,0.760443


[I 2025-03-27 11:50:43,361] Trial 79 finished with value: 0.7604430807740009 and parameters: {'learning_rate': 0.0005979148097337759, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 80 with params: {'learning_rate': 0.0014896625709368565, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.772,0.685746,0.7286,0.754863,0.728506,0.729682
2,0.672,0.63861,0.7557,0.757716,0.75511,0.755037
3,0.6612,0.646359,0.7533,0.759835,0.752602,0.752877
4,0.6586,0.63988,0.7554,0.766301,0.755803,0.754678
5,0.6523,0.65448,0.7447,0.756255,0.744388,0.747551
6,0.6491,0.631221,0.7596,0.763755,0.759205,0.75931
7,0.6436,0.640465,0.7624,0.762879,0.762295,0.758291
8,0.6377,0.633773,0.7628,0.768109,0.762581,0.762064
9,0.6365,0.64174,0.7604,0.766917,0.760105,0.760103
10,0.6301,0.628594,0.7612,0.766068,0.760987,0.760353


[I 2025-03-27 11:58:38,637] Trial 80 finished with value: 0.7603527821211187 and parameters: {'learning_rate': 0.0014896625709368565, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 81 with params: {'learning_rate': 0.0006949855366388246, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7856,0.682653,0.7302,0.752693,0.729855,0.731209
2,0.669,0.65479,0.7473,0.752463,0.74677,0.746128
3,0.6545,0.6587,0.7503,0.756755,0.749325,0.748958
4,0.6489,0.631577,0.763,0.767936,0.763094,0.762711
5,0.6443,0.645034,0.7487,0.759221,0.748564,0.7508
6,0.6412,0.63017,0.7613,0.764212,0.761047,0.761233
7,0.6368,0.633657,0.7632,0.763392,0.763047,0.759934
8,0.6334,0.630452,0.7635,0.768084,0.763332,0.762676
9,0.6335,0.637541,0.7589,0.763061,0.758534,0.758594
10,0.6296,0.63047,0.7616,0.766217,0.761372,0.760769


[I 2025-03-27 12:06:33,148] Trial 81 finished with value: 0.7607690982562456 and parameters: {'learning_rate': 0.0006949855366388246, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 82 with params: {'learning_rate': 0.0006798079066880597, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8093,0.683387,0.7304,0.752909,0.730049,0.731429
2,0.6697,0.654762,0.7472,0.752144,0.746668,0.746017
3,0.6548,0.658934,0.7502,0.756712,0.749218,0.748862
4,0.649,0.631749,0.7623,0.767062,0.762387,0.761983
5,0.6443,0.645363,0.7487,0.759347,0.748565,0.750801
6,0.6412,0.630276,0.7611,0.764046,0.76085,0.761048
7,0.6368,0.63362,0.7635,0.763596,0.763347,0.760237
8,0.6335,0.630571,0.7639,0.768558,0.76374,0.763108
9,0.6335,0.637596,0.7591,0.763272,0.758729,0.758767
10,0.6297,0.630588,0.7618,0.766412,0.761569,0.760955


[I 2025-03-27 12:14:27,143] Trial 82 finished with value: 0.7609552930436482 and parameters: {'learning_rate': 0.0006798079066880597, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 83 with params: {'learning_rate': 0.0013474428491815453, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7691,0.685844,0.7289,0.754199,0.728795,0.729791
2,0.6705,0.642817,0.7558,0.759009,0.755178,0.754703
3,0.6589,0.648742,0.7525,0.759354,0.751786,0.751658
4,0.6563,0.636703,0.759,0.768559,0.759309,0.758362
5,0.6503,0.650334,0.7464,0.756817,0.746108,0.748983
6,0.6474,0.630394,0.7602,0.76348,0.7598,0.759799
7,0.642,0.639784,0.7618,0.762391,0.76166,0.757771
8,0.6366,0.631817,0.7634,0.768204,0.763165,0.762661
9,0.6358,0.640779,0.7606,0.766705,0.760295,0.760242
10,0.6298,0.628627,0.761,0.765793,0.760787,0.760156


[I 2025-03-27 12:22:20,869] Trial 83 finished with value: 0.7601558382232734 and parameters: {'learning_rate': 0.0013474428491815453, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 84 with params: {'learning_rate': 0.0003945872812866664, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8692,0.692096,0.728,0.744708,0.727574,0.729082
2,0.6818,0.660299,0.742,0.748303,0.741377,0.74183
3,0.6615,0.656794,0.7485,0.750397,0.747602,0.747057
4,0.6535,0.638515,0.7565,0.759277,0.756549,0.755519
5,0.6474,0.652509,0.7443,0.753663,0.744231,0.746012
6,0.6439,0.633683,0.7589,0.760484,0.758696,0.758365


[I 2025-03-27 12:27:05,242] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0005289612479212862, 'weight_decay': 0.007, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8332,0.685254,0.7306,0.750469,0.730165,0.731992
2,0.6739,0.654014,0.7473,0.751732,0.746741,0.746751
3,0.6569,0.658712,0.7508,0.756392,0.749798,0.749822
4,0.6502,0.634173,0.7595,0.762984,0.759587,0.758674
5,0.6448,0.649023,0.7483,0.758935,0.748205,0.750179
6,0.6416,0.631188,0.7596,0.761775,0.759395,0.759375


[I 2025-03-27 12:31:50,452] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0002681159956916346, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9344,0.713803,0.7173,0.730613,0.716891,0.717499
2,0.6979,0.673372,0.7363,0.744486,0.735441,0.736172
3,0.6716,0.662418,0.7446,0.744568,0.743854,0.742683
4,0.6614,0.647722,0.7513,0.754295,0.751321,0.750164
5,0.6542,0.659705,0.7412,0.749446,0.74107,0.742686
6,0.6501,0.639928,0.7554,0.757026,0.755196,0.754814


[I 2025-03-27 12:36:35,623] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0006473705727672428, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8248,0.683733,0.7312,0.753202,0.730805,0.732283
2,0.6706,0.654488,0.7466,0.75126,0.746054,0.745516
3,0.6552,0.659265,0.7499,0.756247,0.74891,0.748632
4,0.6492,0.632188,0.7618,0.766364,0.761886,0.761391
5,0.6443,0.646127,0.7483,0.759242,0.748197,0.750415
6,0.6412,0.630475,0.7608,0.763659,0.760561,0.760752
7,0.6368,0.633433,0.7634,0.763425,0.763254,0.760191
8,0.6336,0.630821,0.7639,0.768686,0.763727,0.763107
9,0.6336,0.637706,0.759,0.763113,0.758636,0.758655
10,0.6299,0.630899,0.7619,0.766571,0.761658,0.76104


[I 2025-03-27 12:44:30,279] Trial 87 finished with value: 0.7610400382532697 and parameters: {'learning_rate': 0.0006473705727672428, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 88 with params: {'learning_rate': 0.00028887054581493106, 'weight_decay': 0.01, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9314,0.709244,0.7186,0.732389,0.71823,0.719002
2,0.6947,0.670597,0.7379,0.746018,0.737106,0.737953
3,0.6694,0.660757,0.7454,0.745485,0.744654,0.743536


[I 2025-03-27 12:46:54,178] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0004908898893145527, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8253,0.68583,0.7301,0.749171,0.729681,0.731488
2,0.6752,0.654712,0.7447,0.7497,0.744142,0.744353
3,0.6577,0.657802,0.7507,0.755349,0.749726,0.749715
4,0.6508,0.635031,0.76,0.763308,0.760042,0.759168
5,0.6452,0.649884,0.7477,0.758135,0.747591,0.749511
6,0.6419,0.631571,0.7596,0.761537,0.759416,0.759265
7,0.6377,0.633917,0.7626,0.762051,0.762478,0.759426
8,0.6348,0.632523,0.7624,0.767909,0.762156,0.761832
9,0.6348,0.639129,0.7584,0.762471,0.758012,0.758199
10,0.6317,0.633131,0.7618,0.76644,0.761547,0.760984


[I 2025-03-27 12:55:13,328] Trial 89 finished with value: 0.760984490367046 and parameters: {'learning_rate': 0.0004908898893145527, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 90 with params: {'learning_rate': 0.0004673973344070402, 'weight_decay': 0.004, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8324,0.686823,0.7303,0.749025,0.7299,0.731652
2,0.6765,0.655652,0.7447,0.750025,0.74414,0.744406
3,0.6584,0.657343,0.7511,0.755121,0.750142,0.750033
4,0.6513,0.635692,0.7599,0.76298,0.759917,0.759027
5,0.6456,0.650447,0.7468,0.75677,0.746709,0.748552
6,0.6423,0.631927,0.7595,0.761436,0.759313,0.759154
7,0.6381,0.634316,0.7622,0.761606,0.762073,0.759027
8,0.6351,0.632904,0.762,0.767674,0.761757,0.76147
9,0.6352,0.639512,0.7583,0.762316,0.757919,0.758112
10,0.6321,0.633619,0.7614,0.766171,0.761141,0.760593


[I 2025-03-27 13:03:38,639] Trial 90 finished with value: 0.7605933171881867 and parameters: {'learning_rate': 0.0004673973344070402, 'weight_decay': 0.004, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 91 with params: {'learning_rate': 0.0010753638787907999, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7572,0.688154,0.7292,0.75368,0.729104,0.729984
2,0.6678,0.653869,0.747,0.753902,0.746326,0.745646
3,0.6556,0.655779,0.7497,0.757618,0.748913,0.748498
4,0.652,0.631555,0.7612,0.768624,0.761338,0.760807
5,0.647,0.643463,0.7493,0.757914,0.749086,0.751501
6,0.6443,0.628764,0.7614,0.76414,0.761016,0.761095
7,0.6392,0.637654,0.7622,0.763379,0.762026,0.758581
8,0.6348,0.62988,0.7635,0.76745,0.763306,0.762621
9,0.6344,0.638894,0.7592,0.764343,0.758852,0.758828
10,0.6293,0.628869,0.7606,0.765341,0.760395,0.759768


[I 2025-03-27 13:11:31,949] Trial 91 finished with value: 0.7597682312548822 and parameters: {'learning_rate': 0.0010753638787907999, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 92 with params: {'learning_rate': 0.00028296610760148165, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.913,0.708928,0.7195,0.733087,0.719112,0.719852
2,0.6948,0.671067,0.7375,0.745711,0.736691,0.737539
3,0.6697,0.661054,0.7452,0.745291,0.744456,0.743337
4,0.6599,0.646187,0.752,0.754839,0.752015,0.750879
5,0.653,0.65846,0.7418,0.750243,0.741691,0.743346
6,0.649,0.638884,0.7563,0.75792,0.756094,0.75569


[I 2025-03-27 13:16:14,869] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0002984882556853889, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8907,0.704399,0.72,0.733764,0.719618,0.720454
2,0.6918,0.668913,0.7375,0.745611,0.736723,0.737621
3,0.668,0.659813,0.7465,0.746615,0.745744,0.744653


[I 2025-03-27 13:18:38,913] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0006760775337579033, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8004,0.683002,0.7302,0.752614,0.72985,0.731294
2,0.6695,0.654592,0.7472,0.751949,0.746655,0.745991
3,0.6547,0.658917,0.7505,0.756939,0.749517,0.749155
4,0.649,0.63177,0.7623,0.76704,0.762387,0.761977
5,0.6443,0.645468,0.7489,0.759612,0.748763,0.750998
6,0.6412,0.630287,0.7614,0.764303,0.761152,0.761334
7,0.6368,0.633567,0.7636,0.76369,0.763445,0.760349
8,0.6335,0.630579,0.7639,0.768558,0.76374,0.763108
9,0.6335,0.637596,0.7593,0.763482,0.758927,0.758964
10,0.6297,0.630618,0.7617,0.766365,0.761466,0.760864


[I 2025-03-27 13:26:31,132] Trial 94 finished with value: 0.7608635377090673 and parameters: {'learning_rate': 0.0006760775337579033, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 95 with params: {'learning_rate': 0.0005923425194163755, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8178,0.683539,0.7316,0.752574,0.731186,0.732864
2,0.6716,0.653814,0.7468,0.750924,0.746239,0.74585
3,0.6557,0.659284,0.7501,0.756352,0.74909,0.749028
4,0.6494,0.632976,0.7605,0.764486,0.760598,0.759857
5,0.6444,0.647489,0.748,0.758889,0.747892,0.749992
6,0.6412,0.630746,0.7604,0.762987,0.760173,0.760268
7,0.6369,0.633241,0.7637,0.763412,0.763547,0.760489
8,0.6338,0.631297,0.7634,0.768376,0.76321,0.762656
9,0.6338,0.637986,0.7587,0.762772,0.758337,0.758414
10,0.6303,0.631511,0.7614,0.765983,0.761148,0.760542


[I 2025-03-27 13:34:45,738] Trial 95 finished with value: 0.7605418013188621 and parameters: {'learning_rate': 0.0005923425194163755, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 96 with params: {'learning_rate': 0.0007059281518825818, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.794,0.683069,0.7303,0.752968,0.729982,0.731271
2,0.669,0.655019,0.747,0.752272,0.746471,0.745793
3,0.6545,0.658597,0.7501,0.75675,0.749132,0.748748
4,0.6489,0.631453,0.7627,0.767737,0.762784,0.762458
5,0.6443,0.644786,0.7486,0.759169,0.748452,0.750712
6,0.6413,0.630107,0.7617,0.764602,0.761443,0.761651
7,0.6369,0.633805,0.7629,0.763117,0.762747,0.759639
8,0.6334,0.630391,0.7633,0.767924,0.763123,0.762452
9,0.6335,0.637529,0.7589,0.763107,0.758527,0.758608
10,0.6296,0.630368,0.7617,0.766304,0.761469,0.760855


[I 2025-03-27 13:42:35,769] Trial 96 finished with value: 0.760854762812694 and parameters: {'learning_rate': 0.0007059281518825818, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 97 with params: {'learning_rate': 0.0006306290671949608, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8025,0.682763,0.7316,0.753262,0.73119,0.73274
2,0.6703,0.653993,0.7472,0.751659,0.74665,0.746163
3,0.6551,0.659195,0.7493,0.755699,0.748311,0.748149


[I 2025-03-27 13:44:56,768] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0018766279033906998, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7541,0.677904,0.7327,0.753666,0.732727,0.733521
2,0.6764,0.643877,0.7531,0.759499,0.752332,0.752811
3,0.6678,0.647731,0.7526,0.759852,0.751883,0.753133
4,0.6648,0.647421,0.7522,0.765688,0.752547,0.751831
5,0.658,0.664521,0.7429,0.756651,0.742563,0.746102
6,0.6538,0.633624,0.757,0.763257,0.756664,0.757178


[I 2025-03-27 13:49:51,049] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0008753492454600271, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8031,0.686691,0.7291,0.752402,0.728972,0.729776
2,0.6682,0.657494,0.7466,0.753262,0.746074,0.745173
3,0.6547,0.657114,0.7504,0.757681,0.749566,0.748946
4,0.6499,0.630811,0.7617,0.768381,0.761812,0.761633
5,0.6453,0.642405,0.7499,0.758931,0.749689,0.752032
6,0.6425,0.629075,0.762,0.765066,0.761644,0.761926
7,0.6377,0.635843,0.763,0.76402,0.762845,0.759667
8,0.6338,0.629803,0.7648,0.768652,0.764633,0.763829
9,0.6337,0.637855,0.7596,0.763941,0.759256,0.759201
10,0.6292,0.629402,0.7614,0.765996,0.761163,0.760476


[I 2025-03-27 13:57:39,624] Trial 99 finished with value: 0.7604762831314132 and parameters: {'learning_rate': 0.0008753492454600271, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 100 with params: {'learning_rate': 0.0007817907601091025, 'weight_decay': 0.001, 'warmup_steps': 17, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7983,0.684352,0.73,0.752484,0.7298,0.730682
2,0.6684,0.656438,0.7471,0.753064,0.746597,0.745667
3,0.6544,0.657747,0.7501,0.75708,0.749205,0.748618
4,0.6492,0.630991,0.762,0.768064,0.762107,0.761905
5,0.6447,0.643388,0.7502,0.760176,0.75002,0.752384
6,0.6417,0.629639,0.7616,0.764755,0.761308,0.761567
7,0.6371,0.634673,0.7629,0.763549,0.762745,0.759599
8,0.6335,0.630032,0.7638,0.768083,0.763639,0.762884
9,0.6335,0.637574,0.7595,0.76384,0.759145,0.759148
10,0.6293,0.629857,0.7609,0.765583,0.760661,0.760034


[I 2025-03-27 14:05:34,730] Trial 100 finished with value: 0.7600336826970897 and parameters: {'learning_rate': 0.0007817907601091025, 'weight_decay': 0.001, 'warmup_steps': 17, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 101 with params: {'learning_rate': 0.00023875927982590684, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9433,0.722219,0.7144,0.726813,0.713977,0.714246
2,0.7037,0.678292,0.7346,0.743117,0.733669,0.734427
3,0.6754,0.665501,0.7432,0.742999,0.742446,0.74115


[I 2025-03-27 14:07:56,972] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0006793964735150562, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8,0.683021,0.7301,0.752533,0.729748,0.731186
2,0.6695,0.654631,0.7471,0.751965,0.746558,0.745927
3,0.6547,0.658893,0.7503,0.756772,0.74932,0.748955
4,0.649,0.631733,0.7625,0.767261,0.762586,0.76219
5,0.6443,0.645388,0.7487,0.759328,0.748559,0.750775
6,0.6412,0.630262,0.7614,0.764299,0.761152,0.761339
7,0.6368,0.633584,0.7637,0.763823,0.763545,0.760453
8,0.6335,0.630563,0.7638,0.768469,0.763634,0.763001
9,0.6335,0.637585,0.7592,0.763368,0.758829,0.758867
10,0.6297,0.63059,0.7616,0.766229,0.761366,0.76076


[I 2025-03-27 14:15:47,527] Trial 102 finished with value: 0.76076047191836 and parameters: {'learning_rate': 0.0006793964735150562, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 103 with params: {'learning_rate': 0.0005678775676814374, 'weight_decay': 0.01, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8358,0.684613,0.7312,0.751903,0.730775,0.732502
2,0.6727,0.653896,0.7467,0.750976,0.746129,0.745925
3,0.6562,0.659269,0.7497,0.755795,0.748693,0.748658
4,0.6497,0.63345,0.7599,0.763699,0.76,0.759202
5,0.6446,0.648092,0.7482,0.759042,0.748111,0.75016
6,0.6413,0.63092,0.7595,0.761913,0.759285,0.759324


[I 2025-03-27 14:20:48,041] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0007762300303098858, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7893,0.683902,0.7291,0.751599,0.728887,0.729846
2,0.6683,0.656255,0.7475,0.753518,0.746988,0.746094
3,0.6543,0.657764,0.75,0.756993,0.749092,0.748528
4,0.6491,0.631001,0.7624,0.768288,0.762501,0.762299
5,0.6446,0.643473,0.7499,0.759897,0.749725,0.752074
6,0.6416,0.629666,0.7614,0.764556,0.761113,0.761373
7,0.6371,0.634567,0.7627,0.763279,0.762547,0.759403
8,0.6335,0.630036,0.7636,0.767937,0.763435,0.762682
9,0.6335,0.637554,0.7597,0.763975,0.759341,0.759338
10,0.6293,0.629887,0.761,0.765681,0.760764,0.760128


[I 2025-03-27 14:28:43,808] Trial 104 finished with value: 0.7601279320894943 and parameters: {'learning_rate': 0.0007762300303098858, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7844,0.686655,0.7277,0.753599,0.727596,0.728556
2,0.6713,0.641105,0.7558,0.758298,0.755184,0.754881
3,0.6598,0.647767,0.753,0.759643,0.752302,0.752259
4,0.6572,0.637914,0.758,0.768001,0.758332,0.757334
5,0.651,0.65183,0.7452,0.756009,0.744895,0.747883
6,0.648,0.630732,0.7599,0.763503,0.759503,0.759518


[I 2025-03-27 14:33:28,585] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00016644555832767357, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0047,0.760739,0.6995,0.710278,0.699126,0.69837
2,0.7289,0.69714,0.7253,0.73386,0.724248,0.724854
3,0.6911,0.678636,0.7366,0.736041,0.735867,0.734086
4,0.6769,0.662771,0.7453,0.748447,0.745326,0.743847
5,0.6678,0.67298,0.736,0.743155,0.7358,0.737021
6,0.6626,0.651674,0.7487,0.750525,0.74851,0.748181


[I 2025-03-27 14:38:12,873] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0011358128539475233, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7681,0.688246,0.7279,0.752899,0.727814,0.728649
2,0.6684,0.651249,0.7486,0.75477,0.7479,0.747257
3,0.6562,0.654439,0.7494,0.757226,0.748641,0.748319


[I 2025-03-27 14:40:35,626] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00024000574648689587, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.932,0.720826,0.7148,0.726974,0.714383,0.714594
2,0.7029,0.677899,0.7345,0.743057,0.733567,0.734326
3,0.675,0.665268,0.7433,0.743129,0.742548,0.74125


[I 2025-03-27 14:42:57,183] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0003148500788331691, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8806,0.701098,0.7221,0.736525,0.721716,0.722786
2,0.6895,0.667057,0.739,0.746431,0.738248,0.73898
3,0.6665,0.658804,0.7468,0.747187,0.746038,0.74502
4,0.6575,0.643353,0.7545,0.75745,0.754513,0.75354
5,0.6509,0.656188,0.7425,0.751367,0.742394,0.744146
6,0.6471,0.636989,0.7579,0.759441,0.757668,0.757294


[I 2025-03-27 14:47:55,121] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0029152812441595134, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7508,0.656575,0.7489,0.756711,0.74866,0.748242
2,0.6935,0.682986,0.7362,0.750553,0.73517,0.729855
3,0.6878,0.670401,0.7426,0.756414,0.74197,0.741967
4,0.6797,0.653054,0.7492,0.758783,0.74945,0.748212
5,0.6729,0.677598,0.7387,0.758343,0.738399,0.743132
6,0.6676,0.634221,0.7594,0.764848,0.759052,0.760048
7,0.6584,0.648041,0.7559,0.760285,0.756167,0.751245
8,0.6469,0.645926,0.7542,0.764305,0.753991,0.753911
9,0.6435,0.640858,0.7579,0.763782,0.757571,0.757837
10,0.6337,0.629062,0.7608,0.766103,0.760563,0.759915


[I 2025-03-27 14:56:18,815] Trial 110 finished with value: 0.7599145819769637 and parameters: {'learning_rate': 0.0029152812441595134, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 111 with params: {'learning_rate': 0.00024240225751325542, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9368,0.720578,0.7149,0.727516,0.714475,0.714795
2,0.7026,0.677542,0.7349,0.743421,0.73398,0.734769
3,0.6748,0.665036,0.7434,0.743203,0.742657,0.741384


[I 2025-03-27 14:58:47,215] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 8.37052256414999e-05, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1868,0.907197,0.657,0.668512,0.656749,0.654873
2,0.8287,0.76553,0.7047,0.709484,0.703647,0.702784
3,0.7467,0.725132,0.7173,0.714732,0.71655,0.712709
4,0.7179,0.69773,0.7299,0.732248,0.729869,0.728028
5,0.7021,0.703665,0.7259,0.732445,0.725659,0.726718
6,0.6933,0.680847,0.7368,0.738345,0.736546,0.7361


[I 2025-03-27 15:03:49,971] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0012720551228624925, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7571,0.686318,0.7279,0.752973,0.727809,0.728707
2,0.6695,0.645833,0.7523,0.756618,0.751624,0.751077
3,0.6577,0.650685,0.7514,0.758668,0.750679,0.750463
4,0.655,0.634892,0.7594,0.768233,0.759641,0.75881
5,0.6493,0.648185,0.7467,0.756273,0.746428,0.749118
6,0.6465,0.629862,0.76,0.762869,0.759609,0.759572


[I 2025-03-27 15:08:40,585] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.000832629753027237, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8064,0.685685,0.7297,0.752362,0.729548,0.730408
2,0.6684,0.657233,0.7468,0.753255,0.746282,0.745387
3,0.6545,0.657319,0.7507,0.757816,0.749848,0.749279
4,0.6496,0.630884,0.762,0.768586,0.762092,0.761978
5,0.645,0.642739,0.7502,0.759666,0.750012,0.752355
6,0.6421,0.629329,0.7614,0.764565,0.761074,0.761361
7,0.6375,0.635345,0.7632,0.763979,0.763036,0.759877
8,0.6337,0.629894,0.7643,0.768338,0.764144,0.763368
9,0.6336,0.637712,0.7592,0.763611,0.758852,0.758851
10,0.6292,0.629594,0.7615,0.766121,0.761258,0.760601


[I 2025-03-27 15:16:50,497] Trial 114 finished with value: 0.760600900918263 and parameters: {'learning_rate': 0.000832629753027237, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 115 with params: {'learning_rate': 0.0006304836479510941, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7997,0.682666,0.7312,0.752904,0.730792,0.732314
2,0.6703,0.653951,0.747,0.75146,0.74644,0.745945
3,0.6551,0.659184,0.7493,0.755697,0.74831,0.748161


[I 2025-03-27 15:19:17,998] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.004442131336597166, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7852,0.703575,0.7203,0.758791,0.719777,0.720401
2,0.723,0.669075,0.7399,0.751897,0.739,0.741236
3,0.7121,0.7046,0.7273,0.749022,0.726758,0.720213
4,0.7012,0.676528,0.7368,0.75623,0.737011,0.737484
5,0.6953,0.69653,0.7299,0.754268,0.729611,0.733657
6,0.6828,0.648072,0.7539,0.75994,0.753401,0.753829


[I 2025-03-27 15:24:18,307] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00012050092247739796, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1076,0.821002,0.6785,0.688942,0.678198,0.676624
2,0.7673,0.722494,0.7159,0.722767,0.714821,0.714814
3,0.7123,0.696702,0.7286,0.726933,0.727854,0.725083


[I 2025-03-27 15:26:47,615] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0007579282964594197, 'weight_decay': 0.01, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8046,0.684177,0.7298,0.752329,0.729581,0.730534
2,0.6687,0.656107,0.7477,0.753601,0.747183,0.746334
3,0.6544,0.658031,0.7498,0.756806,0.748891,0.748397
4,0.6491,0.631123,0.7624,0.768053,0.762504,0.762263
5,0.6446,0.643766,0.7492,0.759596,0.749041,0.751416
6,0.6416,0.629797,0.761,0.764008,0.760723,0.760946
7,0.6371,0.634415,0.7625,0.762974,0.762349,0.759203
8,0.6335,0.630146,0.7639,0.768361,0.763729,0.763025
9,0.6335,0.637543,0.7596,0.763971,0.759237,0.759294
10,0.6294,0.63,0.7611,0.765816,0.760855,0.760224


[I 2025-03-27 15:35:03,411] Trial 118 finished with value: 0.7602237404456644 and parameters: {'learning_rate': 0.0007579282964594197, 'weight_decay': 0.01, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 119 with params: {'learning_rate': 0.00030426047933377736, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8919,0.703527,0.7209,0.735034,0.720517,0.721429
2,0.6911,0.668289,0.7382,0.746047,0.737433,0.738262
3,0.6675,0.65946,0.7465,0.746705,0.745737,0.744685
4,0.6582,0.644229,0.7533,0.756258,0.753315,0.752293
5,0.6515,0.65689,0.7421,0.750771,0.742002,0.743701
6,0.6476,0.637574,0.7574,0.758979,0.757183,0.75678


[I 2025-03-27 15:39:59,769] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0005632144622584998, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8106,0.683608,0.731,0.751368,0.730569,0.732257
2,0.6722,0.653605,0.7476,0.75191,0.747029,0.746849
3,0.656,0.658983,0.7505,0.756354,0.749507,0.749496
4,0.6497,0.633455,0.7606,0.764289,0.76069,0.759863
5,0.6445,0.648181,0.7481,0.75883,0.748004,0.750021
6,0.6413,0.630902,0.7595,0.761826,0.759284,0.759311


[I 2025-03-27 15:45:00,954] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 5.962261310139537e-05, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2722,1.012298,0.6297,0.640444,0.629521,0.626928
2,0.9129,0.829355,0.6856,0.689129,0.684589,0.682841
3,0.7972,0.764739,0.7048,0.700776,0.704079,0.699137
4,0.7533,0.729289,0.7189,0.721122,0.718845,0.716741
5,0.7303,0.729254,0.7163,0.721806,0.716081,0.716882
6,0.7177,0.704605,0.728,0.728485,0.72767,0.726761


[I 2025-03-27 15:50:06,536] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 6.27144654905372e-05, 'weight_decay': 0.006, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2718,1.001737,0.6321,0.643138,0.631907,0.62938
2,0.9022,0.81974,0.6887,0.69218,0.6877,0.686076
3,0.7892,0.758231,0.7072,0.703519,0.706464,0.70178


[I 2025-03-27 15:52:28,695] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.001058889824072625, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7687,0.688685,0.7287,0.753225,0.728605,0.729421
2,0.6679,0.654204,0.7468,0.753941,0.746118,0.745441
3,0.6555,0.656066,0.7495,0.757501,0.748703,0.748333
4,0.6518,0.631401,0.7621,0.769407,0.762229,0.76175
5,0.6468,0.643216,0.7488,0.757523,0.748583,0.751045
6,0.6441,0.628722,0.7615,0.764304,0.761118,0.761229
7,0.6391,0.637557,0.7623,0.763555,0.762134,0.758696
8,0.6347,0.629836,0.7638,0.767722,0.763609,0.762905
9,0.6343,0.63878,0.7588,0.763888,0.758444,0.758439
10,0.6293,0.628897,0.7603,0.76504,0.760092,0.759457


[I 2025-03-27 16:00:24,558] Trial 123 finished with value: 0.7594574307335968 and parameters: {'learning_rate': 0.001058889824072625, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 124 with params: {'learning_rate': 0.0001940851681358074, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9953,0.744495,0.7052,0.716855,0.704765,0.70442
2,0.7179,0.68889,0.7283,0.736969,0.727268,0.72789
3,0.684,0.672488,0.7393,0.739071,0.738529,0.737013


[I 2025-03-27 16:02:53,685] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0008116030849273489, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7874,0.684528,0.73,0.75233,0.729819,0.730686
2,0.668,0.656792,0.7479,0.754048,0.747402,0.746455
3,0.6543,0.657406,0.7504,0.757396,0.749513,0.748905
4,0.6493,0.630874,0.7621,0.768518,0.762212,0.76206
5,0.6448,0.64297,0.7504,0.760015,0.750214,0.752542
6,0.6419,0.629445,0.7615,0.764739,0.761196,0.761481
7,0.6373,0.635004,0.7629,0.763547,0.762743,0.759599
8,0.6336,0.629915,0.7635,0.767638,0.763338,0.762584
9,0.6335,0.637625,0.759,0.76338,0.758645,0.75864
10,0.6293,0.629692,0.7615,0.766103,0.76126,0.760589


[I 2025-03-27 16:11:17,796] Trial 125 finished with value: 0.7605892055142423 and parameters: {'learning_rate': 0.0008116030849273489, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 126 with params: {'learning_rate': 0.0005848540313904458, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8096,0.683292,0.7315,0.752335,0.731071,0.732728
2,0.6716,0.653674,0.7473,0.751455,0.746744,0.746398
3,0.6557,0.659168,0.7504,0.756495,0.749389,0.749312
4,0.6494,0.633077,0.7604,0.764216,0.760498,0.759719
5,0.6444,0.647662,0.7484,0.759246,0.748292,0.750384
6,0.6412,0.630768,0.7603,0.762809,0.760073,0.760157
7,0.6369,0.633225,0.7638,0.763403,0.763651,0.760543
8,0.6338,0.631353,0.7634,0.768444,0.76321,0.762665
9,0.6339,0.638048,0.759,0.763078,0.758641,0.758715
10,0.6304,0.631609,0.7613,0.765829,0.761051,0.760449


[I 2025-03-27 16:19:51,139] Trial 126 finished with value: 0.760448950863455 and parameters: {'learning_rate': 0.0005848540313904458, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 127 with params: {'learning_rate': 0.0004369469971736011, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8575,0.689222,0.73,0.748,0.729584,0.731292
2,0.6788,0.657408,0.744,0.749688,0.743445,0.743792
3,0.6597,0.657034,0.7502,0.753427,0.749251,0.749008
4,0.6522,0.636759,0.7591,0.761908,0.759157,0.758174
5,0.6463,0.651254,0.7452,0.754959,0.74512,0.746918
6,0.6429,0.632549,0.7594,0.761265,0.759209,0.759026


[I 2025-03-27 16:25:01,630] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.001552490949972893, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.771,0.685446,0.729,0.755335,0.728915,0.730142
2,0.6726,0.638021,0.7554,0.757485,0.754815,0.754889
3,0.6623,0.645813,0.7548,0.761238,0.754104,0.754519
4,0.6596,0.641139,0.7542,0.765318,0.754605,0.753518
5,0.6532,0.656362,0.7447,0.756609,0.744375,0.74763
6,0.6498,0.631556,0.7595,0.763875,0.75909,0.759231
7,0.6443,0.640622,0.7613,0.761733,0.761221,0.757188
8,0.6381,0.63483,0.7625,0.768151,0.762259,0.761763
9,0.6369,0.642109,0.7599,0.766692,0.759622,0.75963
10,0.6303,0.628604,0.7609,0.765817,0.760684,0.760072


[I 2025-03-27 16:33:30,019] Trial 128 finished with value: 0.7600720554973447 and parameters: {'learning_rate': 0.001552490949972893, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 129 with params: {'learning_rate': 0.0006630040597713372, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8227,0.683745,0.7309,0.753251,0.730526,0.731972
2,0.6703,0.654675,0.7469,0.751691,0.746351,0.745739
3,0.655,0.659156,0.7501,0.756491,0.749107,0.748802
4,0.6491,0.63198,0.7619,0.766566,0.761977,0.76152
5,0.6443,0.645753,0.7482,0.759106,0.748078,0.75033
6,0.6412,0.630387,0.7609,0.763854,0.760656,0.760867
7,0.6368,0.63352,0.7632,0.763253,0.76305,0.759955
8,0.6335,0.630704,0.7642,0.768903,0.764042,0.763414
9,0.6336,0.637645,0.7593,0.763458,0.758928,0.758967
10,0.6298,0.630748,0.7616,0.766269,0.761361,0.760753


[I 2025-03-27 16:42:00,766] Trial 129 finished with value: 0.7607534392396159 and parameters: {'learning_rate': 0.0006630040597713372, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 130 with params: {'learning_rate': 0.0006111769102329425, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8175,0.683407,0.7308,0.752304,0.730373,0.731997
2,0.6711,0.653964,0.7473,0.751522,0.74674,0.746288
3,0.6554,0.659333,0.7499,0.756364,0.748889,0.748815
4,0.6493,0.632682,0.7611,0.765217,0.761199,0.760516
5,0.6444,0.64703,0.7482,0.759236,0.748092,0.750269
6,0.6412,0.630652,0.7603,0.762955,0.760069,0.760179
7,0.6369,0.633268,0.7637,0.763378,0.763553,0.760484
8,0.6337,0.631115,0.7634,0.768264,0.763214,0.762636
9,0.6337,0.637871,0.7586,0.76271,0.758243,0.758306
10,0.6302,0.631291,0.7616,0.766214,0.761343,0.760718


[I 2025-03-27 16:50:31,800] Trial 130 finished with value: 0.7607184632727887 and parameters: {'learning_rate': 0.0006111769102329425, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 131 with params: {'learning_rate': 0.0015268750421242984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7525,0.685058,0.7293,0.755531,0.729207,0.730448
2,0.6721,0.638082,0.7558,0.757676,0.755231,0.755238
3,0.6616,0.646027,0.754,0.76053,0.753317,0.753699
4,0.659,0.640481,0.7549,0.76594,0.755306,0.754187
5,0.6527,0.655326,0.7451,0.756751,0.744785,0.747971
6,0.6494,0.631361,0.7599,0.764193,0.759492,0.759634
7,0.644,0.640543,0.7619,0.762356,0.761802,0.757803
8,0.6378,0.634228,0.7625,0.767947,0.762273,0.761773
9,0.6367,0.641919,0.76,0.766716,0.759709,0.759721
10,0.6302,0.628599,0.761,0.765878,0.76079,0.760149


[I 2025-03-27 16:58:52,811] Trial 131 finished with value: 0.7601490582243913 and parameters: {'learning_rate': 0.0015268750421242984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 132 with params: {'learning_rate': 0.0008061384010674098, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7852,0.684317,0.7295,0.751837,0.729308,0.730237
2,0.668,0.656688,0.748,0.75413,0.747497,0.746569
3,0.6543,0.657425,0.7504,0.757461,0.749509,0.748945
4,0.6493,0.630881,0.7619,0.768218,0.762017,0.761837
5,0.6447,0.643038,0.7506,0.760306,0.750414,0.752762
6,0.6418,0.629475,0.7615,0.764727,0.761196,0.761477
7,0.6372,0.634923,0.7631,0.763738,0.762939,0.759791
8,0.6336,0.629925,0.7635,0.767692,0.763341,0.762584
9,0.6335,0.637608,0.7592,0.763542,0.758845,0.758848
10,0.6293,0.629711,0.7616,0.766196,0.761359,0.760685


[I 2025-03-27 17:07:17,000] Trial 132 finished with value: 0.7606853249636046 and parameters: {'learning_rate': 0.0008061384010674098, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 133 with params: {'learning_rate': 0.0007720450098720833, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7804,0.683537,0.7295,0.752241,0.729273,0.730319
2,0.6681,0.656076,0.7478,0.753633,0.747291,0.746406
3,0.6542,0.657771,0.7499,0.756858,0.749005,0.748482
4,0.6491,0.631008,0.7623,0.768056,0.762407,0.762165
5,0.6445,0.643538,0.7495,0.759705,0.749337,0.7517
6,0.6416,0.629693,0.7612,0.764335,0.760924,0.761174
7,0.6371,0.63447,0.7625,0.762951,0.762354,0.759182
8,0.6335,0.630049,0.7638,0.768146,0.763632,0.762888
9,0.6335,0.637534,0.7598,0.76407,0.759437,0.759453
10,0.6293,0.629916,0.7606,0.765318,0.760367,0.759735


[I 2025-03-27 17:15:42,034] Trial 133 finished with value: 0.7597349249085978 and parameters: {'learning_rate': 0.0007720450098720833, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2533,0.983497,0.638,0.649059,0.637768,0.635384
2,0.888,0.8093,0.6913,0.694867,0.690283,0.688756
3,0.7811,0.752042,0.7079,0.704238,0.707172,0.702604


[I 2025-03-27 17:18:12,927] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0014617345874829721, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7668,0.685183,0.7289,0.75514,0.728804,0.730003
2,0.6716,0.639143,0.7556,0.757655,0.754999,0.754834
3,0.6607,0.64673,0.7536,0.760066,0.752897,0.753031
4,0.6581,0.639264,0.7561,0.766729,0.756488,0.755379
5,0.6518,0.653548,0.7451,0.756425,0.744798,0.747927
6,0.6487,0.631054,0.7595,0.763406,0.759108,0.759184
7,0.6433,0.640377,0.7621,0.76258,0.761986,0.757976
8,0.6374,0.633293,0.7623,0.767562,0.762079,0.761587
9,0.6364,0.641544,0.7603,0.766679,0.760005,0.759967
10,0.6301,0.628595,0.7614,0.766254,0.761185,0.76055


[I 2025-03-27 17:26:29,840] Trial 135 finished with value: 0.7605504815338169 and parameters: {'learning_rate': 0.0014617345874829721, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 136 with params: {'learning_rate': 0.000734597380741018, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7813,0.683003,0.73,0.752781,0.729717,0.730954
2,0.6684,0.655456,0.7468,0.752432,0.746287,0.745512
3,0.6543,0.658244,0.7503,0.75694,0.749369,0.748917
4,0.649,0.631241,0.7625,0.767869,0.762598,0.762338
5,0.6444,0.644208,0.7491,0.759509,0.748953,0.751234
6,0.6414,0.62993,0.7614,0.764326,0.761135,0.76133
7,0.6369,0.634047,0.7631,0.763436,0.762942,0.75982
8,0.6334,0.630226,0.7637,0.768184,0.763522,0.762821
9,0.6334,0.637519,0.7595,0.763736,0.759141,0.759188
10,0.6295,0.630165,0.7611,0.765819,0.760878,0.760264


[I 2025-03-27 17:34:45,922] Trial 136 finished with value: 0.7602638447066656 and parameters: {'learning_rate': 0.000734597380741018, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 137 with params: {'learning_rate': 0.0006604190509024623, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7914,0.682495,0.7301,0.752033,0.72972,0.731187
2,0.6696,0.654275,0.7474,0.752227,0.746863,0.74632
3,0.6547,0.659,0.7499,0.756217,0.748901,0.748622
4,0.649,0.63196,0.7622,0.766741,0.762278,0.761803
5,0.6442,0.645841,0.7486,0.759415,0.748483,0.750701
6,0.6411,0.630354,0.761,0.763851,0.760763,0.760938
7,0.6368,0.6334,0.7635,0.763526,0.763349,0.760266
8,0.6335,0.630693,0.7642,0.768895,0.76404,0.7634
9,0.6335,0.637625,0.7588,0.76294,0.758437,0.758479
10,0.6298,0.630769,0.7619,0.766631,0.761658,0.76105


[I 2025-03-27 17:43:13,910] Trial 137 finished with value: 0.7610496227801449 and parameters: {'learning_rate': 0.0006604190509024623, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 138 with params: {'learning_rate': 0.002819055822915683, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7553,0.65475,0.7495,0.756189,0.749273,0.748717
2,0.6919,0.679951,0.7363,0.750276,0.735215,0.730442
3,0.6861,0.670233,0.7436,0.756474,0.742971,0.742916
4,0.6783,0.650928,0.75,0.759058,0.750279,0.749112
5,0.6716,0.675698,0.7389,0.758123,0.738596,0.743313
6,0.6666,0.634626,0.759,0.765123,0.758627,0.759716
7,0.6577,0.649271,0.7546,0.758869,0.754909,0.749788
8,0.6464,0.646803,0.7547,0.764767,0.754505,0.754317
9,0.6431,0.64111,0.7584,0.764501,0.758076,0.758336
10,0.6335,0.629013,0.7608,0.766224,0.760568,0.759949


[I 2025-03-27 17:51:25,435] Trial 138 finished with value: 0.7599488162505904 and parameters: {'learning_rate': 0.002819055822915683, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 139 with params: {'learning_rate': 0.0002725042135758322, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9062,0.710542,0.7189,0.732165,0.718511,0.719131
2,0.6961,0.672365,0.7367,0.744918,0.73584,0.736636
3,0.6707,0.661849,0.7449,0.744933,0.74416,0.742994


[I 2025-03-27 17:53:57,713] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0010322248435071174, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.764,0.688655,0.729,0.753428,0.728894,0.729639
2,0.6677,0.655088,0.7466,0.753951,0.745932,0.745267
3,0.6553,0.656409,0.7497,0.757693,0.748902,0.748457
4,0.6514,0.63117,0.7623,0.769532,0.762423,0.761997
5,0.6465,0.642782,0.7488,0.757493,0.748589,0.751051
6,0.6438,0.628685,0.7619,0.764813,0.761523,0.761698
7,0.6389,0.637296,0.7625,0.763761,0.762338,0.758937
8,0.6345,0.629776,0.7634,0.76744,0.763219,0.762512
9,0.6342,0.638616,0.7589,0.763955,0.758545,0.758567
10,0.6292,0.628949,0.7605,0.765172,0.760285,0.759634


[I 2025-03-27 18:02:24,157] Trial 140 finished with value: 0.7596340956893952 and parameters: {'learning_rate': 0.0010322248435071174, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 141 with params: {'learning_rate': 0.0006490152825321372, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7888,0.682389,0.7306,0.752226,0.730209,0.731745
2,0.6697,0.654145,0.7473,0.752066,0.746759,0.746235


Trial 142 with params: {'learning_rate': 0.0006140425564316962, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7978,0.682615,0.7315,0.752775,0.731072,0.732652
2,0.6706,0.653777,0.7474,0.751751,0.746844,0.74645
3,0.6552,0.6592,0.7491,0.755524,0.748094,0.747981
4,0.6492,0.632621,0.7612,0.765339,0.761292,0.760633
5,0.6443,0.64695,0.7479,0.758973,0.747792,0.749957
6,0.6411,0.630608,0.7608,0.763431,0.760573,0.76068
7,0.6368,0.633194,0.7635,0.763217,0.763354,0.760293
8,0.6336,0.631067,0.7635,0.768395,0.763317,0.762726
9,0.6337,0.637829,0.7588,0.762851,0.758435,0.758487
10,0.6301,0.631262,0.762,0.766595,0.761753,0.761116


[I 2025-03-27 18:13:12,787] Trial 142 finished with value: 0.7611162103372292 and parameters: {'learning_rate': 0.0006140425564316962, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 44 with value: 0.7611287079618219.


Trial 143 with params: {'learning_rate': 0.00031059973821209117, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.89,0.702387,0.7218,0.736046,0.721416,0.72237
2,0.6903,0.667604,0.7385,0.746091,0.737746,0.738507
3,0.6669,0.659096,0.747,0.747292,0.746233,0.745207


[I 2025-03-27 18:15:42,354] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2797,1.021069,0.627,0.637546,0.626814,0.624189
2,0.9203,0.835161,0.6848,0.688216,0.683792,0.681973
3,0.8019,0.76836,0.704,0.699892,0.70327,0.69822
4,0.7565,0.73215,0.7176,0.719838,0.717546,0.7154
5,0.7328,0.73153,0.7162,0.721538,0.715975,0.716728
6,0.7198,0.706686,0.7268,0.727247,0.72646,0.725513


[I 2025-03-27 18:20:49,087] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0005195070959737797, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8102,0.684537,0.731,0.750291,0.730572,0.732352
2,0.6737,0.653976,0.7474,0.751929,0.746835,0.746885
3,0.6569,0.658308,0.7508,0.756078,0.749802,0.749804
4,0.6502,0.634346,0.7597,0.763136,0.759785,0.758881
5,0.6448,0.649197,0.7478,0.758427,0.74771,0.74963
6,0.6416,0.631239,0.7597,0.761822,0.759498,0.759453


[I 2025-03-27 18:25:52,753] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0006551346139940469, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8045,0.68297,0.7307,0.752878,0.730325,0.731846
2,0.67,0.654333,0.7472,0.751927,0.746656,0.746145
3,0.6549,0.659106,0.7501,0.756451,0.749108,0.748821
4,0.649,0.632029,0.7619,0.766435,0.761975,0.76151
5,0.6443,0.645953,0.7484,0.759295,0.748291,0.750511
6,0.6411,0.630398,0.7609,0.763738,0.760658,0.760844
7,0.6368,0.63342,0.7638,0.763772,0.76365,0.760582
8,0.6335,0.630737,0.7642,0.768944,0.764033,0.763408
9,0.6336,0.637662,0.7589,0.763032,0.758533,0.75856
10,0.6298,0.630816,0.7619,0.766571,0.761658,0.76104


[I 2025-03-27 18:34:24,790] Trial 146 finished with value: 0.7610400382532697 and parameters: {'learning_rate': 0.0006551346139940469, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 44 with value: 0.7611287079618219.


Trial 147 with params: {'learning_rate': 0.0025651152134400176, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7497,0.655738,0.7465,0.753288,0.746327,0.745358
2,0.6874,0.670533,0.7406,0.7538,0.739413,0.736375
3,0.6811,0.66871,0.7432,0.753931,0.742573,0.742693


[I 2025-03-27 18:36:52,200] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00030263813075621113, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8995,0.704348,0.721,0.735005,0.720627,0.721515
2,0.6916,0.668573,0.7379,0.745791,0.737124,0.737962
3,0.6677,0.659613,0.7466,0.746758,0.745844,0.744771


[I 2025-03-27 18:39:20,820] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00036175490456492393, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.886,0.695763,0.7244,0.739898,0.723998,0.725209
2,0.685,0.662976,0.7407,0.747817,0.740021,0.74066
3,0.6634,0.657246,0.7482,0.749375,0.747374,0.746632
4,0.655,0.640252,0.7558,0.758613,0.755834,0.754828
5,0.6486,0.653766,0.7434,0.752654,0.743328,0.745109
6,0.645,0.634844,0.7588,0.76039,0.758581,0.758213


[I 2025-03-27 18:44:15,205] Trial 149 pruned. 


In [36]:
print(best_distill_head)

BestRun(run_id='44', objective=0.7611287079618219, hyperparameters={'learning_rate': 0.00063155918393816, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}, run_summary=None)
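
With 150 trials per study the raw log is long. The following is a minimal sketch, not part of the original notebook, of how the Optuna log lines could be tallied with the standard library. The helper `summarize_trials` and the `sample_log` string are hypothetical; the sample lines are abridged from the output above.

```python
import re

# Patterns for the two Optuna log messages that appear in the output above.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+)")
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_trials(log_text):
    """Return (finished, pruned): a dict trial -> objective and a list of pruned trial numbers."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    return finished, pruned


# Hypothetical sample, abridged from the log lines above.
sample_log = """
[I 2025-03-27 18:25:52,753] Trial 145 pruned.
[I 2025-03-27 18:34:24,790] Trial 146 finished with value: 0.7610400382532697 and parameters: {...}
[I 2025-03-27 18:36:52,200] Trial 147 pruned.
"""

finished, pruned = summarize_trials(sample_log)
best_trial = max(finished, key=finished.get)
print(f"finished={len(finished)} pruned={len(pruned)} best={best_trial}")
```

Applied to the full cell output, the same function would show how many trials ran all epochs and how many were stopped early by the pruner.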


In [37]:
base.reset_seed()

## Hyperparameter search for standard training with fine-tuning of the pretrained model
Configuration of the individual training runs.

In [38]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained_hp-search", logging_dir=f"~/logs/{DATASET}/pretrained_hp-search", epochs=num_epochs, batch_size=batch_size)

Definition of the searched hyperparameters and their ranges.

In [39]:
def hp_space(trial):
    # Search space: log-uniform learning rate, weight decay on a 1e-3 grid, integer warmup steps.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
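
To make the ranges concrete, the standard-library sketch below mimics the three distributions without Optuna. It is an illustration only: `sample_params` is a hypothetical helper, `warm_up=32` is an assumed value (the notebook defines `warm_up` elsewhere; 32 is merely the largest value visible in the logs shown here), and Optuna's TPE sampler picks points adaptively rather than uniformly at random.

```python
import math
import random


def sample_params(rng, warm_up=32):
    """Draw one configuration from the same ranges as hp_space (uniform sampling, not TPE)."""
    # log=True: uniform in log-space, so each decade of learning rates is equally likely.
    log_lr = rng.uniform(math.log(5e-5), math.log(5e-3))
    # step=1e-3: weight decay lies on the 11-point grid 0, 0.001, ..., 0.01.
    k = rng.randint(0, 10)
    return {
        "learning_rate": math.exp(log_lr),
        "weight_decay": k * 1e-3,
        "warmup_steps": rng.randint(0, warm_up),  # both bounds inclusive
    }


rng = random.Random(42)
samples = [sample_params(rng) for _ in range(1000)]
# 5e-4 is the geometric mean of the learning-rate bounds.
below_geometric_mean = sum(s["learning_rate"] < 5e-4 for s in samples)
print(samples[0])
print(f"{below_geometric_mean} of 1000 learning rates fall below 5e-4")
```

Because sampling is uniform in log-space, roughly half of the learning rates land below 5e-4. Grid values are computed in floating point, which is likely why stepped parameters appear in the logs with noise such as `0.009000000000000001`.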

Optuna configuration.

In [40]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
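
As a rough guide to what this pruner does: according to the Optuna documentation, `HyperbandPruner` runs several successive-halving brackets, and their number is floor(log_eta(max_resource / min_resource)) + 1, where eta is `reduction_factor`. The sketch below computes the brackets and the steps at which a trial may be pruned. The values `min_resource=3` and `max_resource=10` are hypothetical, picked only so that the sketch reproduces the pruning points seen in the logs (epochs 3 and 6, full runs of 10 epochs); the notebook sets `min_r` and `max_r` elsewhere, and their unit is not shown. The rung formula is a simplified reading of successive halving, not Optuna's actual code, and `hyperband_rungs` is a hypothetical helper.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [steps at which a trial in that bracket may be pruned]}."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    rungs = {}
    for bracket in range(n_brackets):
        steps = []
        rung = 0
        while True:
            # Later brackets skip the earliest rungs, so they prune less aggressively.
            step = min_resource * reduction_factor ** (bracket + rung)
            if step >= max_resource:
                break
            steps.append(step)
            rung += 1
        rungs[bracket] = steps
    return rungs


print(hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2))
# {0: [3, 6], 1: [6]}
```

Under these assumptions a trial is stopped after 3 or 6 epochs, or it runs to completion, which matches the three trial lengths visible in the output.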



Trainer configuration for the individual training runs.

In [41]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # model_init is called at the start of every trial, so each one begins from a fresh model.
    model_init=lambda: base.get_mobilenet(10),
)
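
`base.compute_metrics` lives in the `base` module and is not shown in this notebook. The sketch below is only a guess at its F1 part: it computes a macro-averaged F1 from true and predicted labels using the standard library. Macro averaging is an assumption, although it is consistent with the logs, where recall differs slightly from accuracy (support-weighted recall would equal accuracy). The function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1; a class absent from labels and predictions scores 0."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean of precision and recall.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred, num_classes=3), 4))
# 0.6556
```

The per-class scores here are 0.5, 0.8 and 2/3, so the macro average is about 0.6556.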
  

Search setup.

In [42]:
best_base_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-27 18:44:15,803] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4654,0.263481,0.9092,0.915397,0.908928,0.910108
2,0.1381,0.245159,0.9212,0.92627,0.921046,0.92233
3,0.0705,0.241737,0.9265,0.927986,0.926548,0.926458


[I 2025-03-27 18:48:00,751] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4524,0.343962,0.881,0.892425,0.880545,0.882012
2,0.221,0.328661,0.895,0.907515,0.894994,0.897004
3,0.1441,0.24011,0.9255,0.926371,0.925585,0.92551
4,0.0918,0.257143,0.9261,0.928885,0.926272,0.926259
5,0.0588,0.236863,0.9343,0.93497,0.934541,0.934217
6,0.0337,0.254703,0.9385,0.940228,0.938453,0.938885


[I 2025-03-27 18:55:31,462] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6982,0.319311,0.8906,0.89525,0.890723,0.890534
2,0.2002,0.260382,0.9115,0.916045,0.91131,0.912371
3,0.1046,0.247285,0.9165,0.918605,0.916461,0.916659


[I 2025-03-27 18:59:14,258] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5643,0.419251,0.8591,0.869648,0.858922,0.859315
2,0.3086,0.363419,0.8884,0.901121,0.888681,0.88957
3,0.2103,0.3242,0.8987,0.900872,0.898951,0.897937
4,0.1427,0.278068,0.9142,0.916675,0.9146,0.914201
5,0.0915,0.235788,0.9266,0.92823,0.926923,0.926888
6,0.0544,0.256128,0.9311,0.932482,0.931325,0.931392


[I 2025-03-27 19:06:41,132] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6728,0.550348,0.8199,0.837817,0.819618,0.819475
2,0.3988,0.402301,0.8696,0.88166,0.869612,0.871904
3,0.2829,0.377453,0.8786,0.8825,0.878717,0.878438


[I 2025-03-27 19:10:26,177] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.553,0.26492,0.9083,0.912123,0.908328,0.908358
2,0.146,0.238327,0.9213,0.925133,0.921177,0.921972
3,0.0633,0.235274,0.9275,0.929176,0.927489,0.92763
4,0.0248,0.244032,0.9311,0.932337,0.931347,0.931179
5,0.0111,0.278043,0.9323,0.934468,0.932536,0.932256
6,0.0053,0.281128,0.9342,0.9367,0.934051,0.93456


[I 2025-03-27 19:17:55,917] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4469,0.273704,0.908,0.913063,0.907926,0.908317
2,0.1495,0.273804,0.9153,0.924549,0.915285,0.91696
3,0.0798,0.249981,0.9255,0.927046,0.925611,0.925597
4,0.0484,0.262727,0.9334,0.9349,0.933751,0.933478
5,0.0269,0.24805,0.9396,0.940224,0.939881,0.939651
6,0.0161,0.245347,0.9409,0.94186,0.940928,0.941096


[I 2025-03-27 19:25:24,396] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5797,0.281892,0.9022,0.906514,0.902334,0.902231
2,0.1597,0.233874,0.9208,0.924063,0.920691,0.92145
3,0.0705,0.239582,0.9256,0.927212,0.925604,0.925675
4,0.0268,0.25164,0.9295,0.930448,0.929761,0.929606
5,0.0125,0.274207,0.9311,0.932421,0.931278,0.931221
6,0.0061,0.261962,0.9349,0.936714,0.93484,0.935254
7,0.0033,0.317253,0.9267,0.928087,0.927191,0.926147
8,0.0022,0.306048,0.9338,0.937383,0.934047,0.934107
9,0.0014,0.392361,0.9168,0.924936,0.917004,0.917361
10,0.0011,0.29229,0.9353,0.936187,0.935502,0.935252


[I 2025-03-27 19:37:43,389] Trial 7 finished with value: 0.9352515984368184 and parameters: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}. Best is trial 7 with value: 0.9352515984368184.


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4095,0.30674,0.8964,0.907272,0.896285,0.897309
2,0.1568,0.231565,0.9256,0.928376,0.925675,0.926093
3,0.0861,0.242979,0.9272,0.928487,0.927132,0.927178
4,0.0529,0.240205,0.9355,0.937771,0.935776,0.935874
5,0.0327,0.264316,0.9361,0.936896,0.936373,0.935623
6,0.0166,0.278114,0.9391,0.941389,0.939169,0.939496
7,0.008,0.285715,0.9402,0.940787,0.940652,0.939743
8,0.0038,0.263877,0.9481,0.950787,0.948287,0.948459
9,0.0017,0.325266,0.9375,0.942318,0.937638,0.938001
10,0.0005,0.223224,0.9546,0.954969,0.954779,0.954647


[I 2025-03-27 19:50:02,442] Trial 8 finished with value: 0.9546474828356917 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.9546474828356917.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4085,0.303388,0.8979,0.904529,0.897861,0.89852
2,0.1771,0.270424,0.9146,0.922226,0.914745,0.915968
3,0.1056,0.273943,0.9208,0.923387,0.921094,0.920817


[I 2025-03-27 19:53:44,725] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6279,0.304948,0.8921,0.896054,0.892197,0.892043
2,0.1904,0.250431,0.9156,0.919985,0.915397,0.916406
3,0.0968,0.246477,0.9161,0.91794,0.916098,0.916009
4,0.0443,0.238115,0.9254,0.926364,0.925579,0.925482
5,0.0203,0.261137,0.9246,0.925522,0.924917,0.92448
6,0.0096,0.265957,0.9292,0.931866,0.929083,0.929614


[I 2025-03-27 20:01:08,778] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 8.238154754398708e-05, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5782,0.29444,0.8985,0.903228,0.898582,0.898493
2,0.1695,0.244806,0.9176,0.921643,0.917372,0.918391
3,0.079,0.237531,0.9216,0.923079,0.921586,0.921635
4,0.0317,0.255355,0.926,0.927148,0.926231,0.926095
5,0.0151,0.266413,0.9272,0.928455,0.927501,0.927204
6,0.0064,0.269038,0.9315,0.933926,0.931352,0.931853


[I 2025-03-27 20:08:36,098] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4194,0.304716,0.8953,0.905495,0.895368,0.895684
2,0.16,0.261998,0.9168,0.922635,0.916822,0.918005
3,0.0904,0.260738,0.9207,0.92234,0.920896,0.920675
4,0.0541,0.227482,0.9349,0.936234,0.935138,0.93522
5,0.0339,0.266975,0.9337,0.93479,0.934007,0.933634
6,0.0184,0.247846,0.9417,0.943473,0.941781,0.942043
7,0.0081,0.283643,0.9388,0.939849,0.939285,0.938498
8,0.0051,0.280631,0.9424,0.946178,0.942759,0.942656
9,0.0018,0.332455,0.9353,0.940403,0.935379,0.935711
10,0.0005,0.229064,0.947,0.947482,0.947264,0.946978


[I 2025-03-27 20:21:01,726] Trial 12 finished with value: 0.94697786824118 and parameters: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}. Best is trial 8 with value: 0.9546474828356917.


Trial 13 with params: {'learning_rate': 0.0008699664253571104, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4808,0.364158,0.8771,0.887569,0.876802,0.877422
2,0.2373,0.332598,0.8924,0.905846,0.89256,0.893692
3,0.1565,0.268468,0.9178,0.919682,0.917967,0.91777


[I 2025-03-27 20:24:43,667] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0006224242239267788, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4193,0.337617,0.8879,0.896275,0.8878,0.888534
2,0.1971,0.270526,0.9148,0.92239,0.914578,0.916291
3,0.1178,0.265667,0.9188,0.920411,0.919139,0.918587
4,0.0805,0.232883,0.9332,0.934214,0.933537,0.933197
5,0.0456,0.211277,0.9382,0.938962,0.93833,0.938304
6,0.0257,0.240927,0.9418,0.943435,0.941796,0.942126
7,0.0104,0.302284,0.9334,0.9351,0.933885,0.933162
8,0.0066,0.246375,0.9476,0.949507,0.947878,0.947695
9,0.0027,0.344267,0.9358,0.940983,0.935862,0.936149
10,0.0009,0.252189,0.946,0.946624,0.946269,0.945918


[I 2025-03-27 20:37:10,242] Trial 14 finished with value: 0.9459180611592564 and parameters: {'learning_rate': 0.0006224242239267788, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}. Best is trial 8 with value: 0.9546474828356917.


Trial 15 with params: {'learning_rate': 0.0038125646803423923, 'weight_decay': 0.007, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8397,0.755041,0.7564,0.789266,0.755451,0.756333
2,0.5003,0.4736,0.8469,0.853979,0.846973,0.848059
3,0.3593,0.424796,0.8603,0.866335,0.860484,0.859928


[I 2025-03-27 20:40:54,533] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00043686185384014543, 'weight_decay': 0.007, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4211,0.29003,0.9006,0.905964,0.90046,0.900993
2,0.1617,0.264198,0.9153,0.923748,0.915222,0.916807
3,0.0904,0.242139,0.9265,0.92811,0.926516,0.926578
4,0.0534,0.237762,0.9347,0.936471,0.934979,0.934913
5,0.0297,0.252312,0.9371,0.938317,0.937357,0.937115
6,0.02,0.270023,0.9385,0.940748,0.938439,0.938688


[I 2025-03-27 20:48:22,186] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6464,0.512231,0.8311,0.849619,0.830741,0.830316
2,0.3812,0.459296,0.8565,0.879672,0.856721,0.859661
3,0.268,0.36718,0.8798,0.885393,0.879968,0.879028


[I 2025-03-27 20:52:06,911] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6094,0.269199,0.9072,0.910724,0.907227,0.907172
2,0.1545,0.229915,0.9207,0.923712,0.920642,0.921252
3,0.0667,0.241808,0.9233,0.925154,0.9233,0.923326
4,0.0257,0.250393,0.9316,0.932852,0.931867,0.93161
5,0.0115,0.274575,0.9297,0.931095,0.929937,0.929652
6,0.0056,0.269897,0.934,0.936472,0.933882,0.934309


[I 2025-03-27 20:59:32,889] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0002040965675286908, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4459,0.254644,0.9108,0.914935,0.910613,0.911056
2,0.1274,0.244075,0.922,0.926594,0.921787,0.923001
3,0.0574,0.237718,0.9283,0.930126,0.928217,0.928315
4,0.0276,0.228391,0.9401,0.940985,0.940278,0.940246
5,0.0163,0.262499,0.9374,0.938029,0.937685,0.937194
6,0.0077,0.27093,0.9381,0.940176,0.937998,0.938471


[I 2025-03-27 21:07:01,403] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0007343282446025902, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4736,0.374225,0.8756,0.886908,0.875582,0.876977
2,0.2153,0.302161,0.9021,0.911089,0.902125,0.903365
3,0.1364,0.29638,0.9121,0.91539,0.912348,0.91176


[I 2025-03-27 21:10:45,906] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.000546454043045551, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4114,0.292173,0.902,0.908191,0.901962,0.9025
2,0.1796,0.285899,0.9094,0.917736,0.909583,0.910775
3,0.1061,0.264715,0.922,0.923512,0.922096,0.921843
4,0.0664,0.256167,0.9313,0.933473,0.931544,0.931347
5,0.0418,0.23501,0.9388,0.939234,0.938919,0.938565
6,0.0243,0.261941,0.9387,0.940872,0.938779,0.939111
7,0.0107,0.30729,0.9368,0.938369,0.937168,0.936501
8,0.006,0.25047,0.9494,0.950081,0.94965,0.949445
9,0.0026,0.387793,0.9328,0.939028,0.933,0.933248
10,0.001,0.231872,0.9498,0.94987,0.950108,0.949753


[I 2025-03-27 21:23:12,303] Trial 21 finished with value: 0.9497533060068288 and parameters: {'learning_rate': 0.000546454043045551, 'weight_decay': 0.01, 'warmup_steps': 2}. Best is trial 8 with value: 0.9546474828356917.


Trial 22 with params: {'learning_rate': 0.0002879322945635685, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3993,0.25626,0.9137,0.918132,0.913519,0.91401
2,0.1309,0.238576,0.9247,0.92983,0.924657,0.925923
3,0.0649,0.214309,0.9367,0.937742,0.93676,0.936786
4,0.0402,0.223057,0.94,0.941453,0.940177,0.94022
5,0.0199,0.27954,0.9332,0.934605,0.933343,0.933002
6,0.0115,0.2608,0.9432,0.945295,0.943201,0.943562
7,0.0042,0.290329,0.938,0.939054,0.938478,0.937502
8,0.0032,0.254991,0.9484,0.950294,0.948572,0.948688
9,0.0014,0.321377,0.9395,0.94284,0.939585,0.939846
10,0.0005,0.24676,0.9499,0.950349,0.950107,0.949875


[I 2025-03-27 21:35:46,105] Trial 22 finished with value: 0.9498753816765519 and parameters: {'learning_rate': 0.0002879322945635685, 'weight_decay': 0.01, 'warmup_steps': 1}. Best is trial 8 with value: 0.9546474828356917.


Trial 23 with params: {'learning_rate': 0.00021932088050101418, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4112,0.240965,0.9151,0.919195,0.915008,0.915469
2,0.1259,0.237263,0.9234,0.927091,0.92351,0.92407
3,0.0574,0.237685,0.9314,0.933294,0.931264,0.931559
4,0.0271,0.241172,0.937,0.938317,0.937106,0.937198
5,0.0159,0.266881,0.9359,0.936645,0.936075,0.935752
6,0.0083,0.281359,0.9374,0.939869,0.937318,0.937778


[I 2025-03-27 21:43:13,589] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.000480314448395321, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3974,0.285647,0.9028,0.910133,0.902781,0.903422
2,0.1692,0.25759,0.9186,0.925347,0.918683,0.919524
3,0.0962,0.251073,0.9219,0.923763,0.922112,0.921911
4,0.064,0.232051,0.9315,0.933213,0.93171,0.931816
5,0.035,0.255858,0.9342,0.935727,0.934331,0.934336
6,0.0214,0.243115,0.9407,0.942675,0.940707,0.940998


[I 2025-03-27 21:50:37,678] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.004985341137518224, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.899,0.657984,0.7786,0.792455,0.777797,0.778168
2,0.5327,0.495925,0.8334,0.842586,0.83358,0.832895
3,0.3861,0.428781,0.8588,0.861934,0.858967,0.858485
4,0.299,0.37753,0.8752,0.878745,0.87543,0.875481
5,0.2254,0.335803,0.8922,0.894242,0.892301,0.892505
6,0.16,0.335913,0.8957,0.897769,0.895951,0.896032


[I 2025-03-27 21:58:03,574] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0001563798024174134, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.443,0.248981,0.9168,0.920546,0.916673,0.916974
2,0.1265,0.234291,0.9236,0.927034,0.923639,0.924171
3,0.0541,0.246064,0.9271,0.929239,0.926931,0.927231
4,0.0224,0.254886,0.9317,0.93311,0.931935,0.931785
5,0.0126,0.275249,0.9355,0.936631,0.935702,0.935471
6,0.0059,0.251232,0.9422,0.943294,0.942236,0.942367
7,0.0032,0.304044,0.9341,0.935414,0.934498,0.933675
8,0.0016,0.320741,0.9377,0.941623,0.937912,0.937898
9,0.0011,0.359816,0.9279,0.933115,0.928013,0.928376
10,0.0006,0.277648,0.942,0.942954,0.942186,0.942037


[I 2025-03-27 22:10:30,245] Trial 26 finished with value: 0.9420367341995266 and parameters: {'learning_rate': 0.0001563798024174134, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5078,0.257351,0.9099,0.91443,0.909592,0.91008
2,0.1332,0.255661,0.9186,0.923985,0.918351,0.919468
3,0.0578,0.241868,0.9297,0.931168,0.929739,0.92972
4,0.0303,0.246846,0.9347,0.935703,0.934758,0.934744
5,0.0158,0.250364,0.9355,0.936193,0.935546,0.935391
6,0.0088,0.262753,0.9396,0.941047,0.939549,0.93981
7,0.0033,0.295413,0.9353,0.936407,0.935679,0.934986
8,0.0017,0.277837,0.943,0.945954,0.943021,0.943391
9,0.0011,0.39034,0.9268,0.933116,0.926904,0.927208
10,0.0006,0.269778,0.9444,0.945071,0.944572,0.94436


[I 2025-03-27 22:22:54,993] Trial 27 finished with value: 0.9443601507403996 and parameters: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}. Best is trial 8 with value: 0.9546474828356917.


Trial 28 with params: {'learning_rate': 0.0014409814341199108, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5434,0.46467,0.8405,0.852857,0.840883,0.839346
2,0.315,0.368169,0.8815,0.893644,0.881778,0.883315
3,0.2187,0.320221,0.8982,0.901826,0.898513,0.898511
4,0.1487,0.267111,0.9163,0.919278,0.916524,0.916713
5,0.0966,0.257105,0.9269,0.928364,0.927001,0.926923
6,0.0598,0.284195,0.923,0.925336,0.923248,0.923305


[I 2025-03-27 22:30:26,200] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0002016130651588556, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4574,0.250495,0.9121,0.915893,0.912,0.912286
2,0.1278,0.235353,0.9242,0.927934,0.924123,0.92498
3,0.0563,0.250303,0.9281,0.929197,0.928177,0.927959
4,0.0283,0.227784,0.9399,0.940488,0.940133,0.939931
5,0.0146,0.278225,0.9321,0.934382,0.932263,0.931923
6,0.0067,0.253226,0.9424,0.944176,0.942256,0.942722
7,0.003,0.314792,0.9325,0.93426,0.933034,0.931982
8,0.002,0.257488,0.9464,0.948277,0.94654,0.94673
9,0.0012,0.3449,0.9328,0.93738,0.933032,0.933091
10,0.0006,0.259193,0.943,0.943628,0.943166,0.942914


[I 2025-03-27 22:42:47,848] Trial 29 finished with value: 0.9429141636155745 and parameters: {'learning_rate': 0.0002016130651588556, 'weight_decay': 0.007, 'warmup_steps': 11}. Best is trial 8 with value: 0.9546474828356917.


Trial 30 with params: {'learning_rate': 0.0004859512473652397, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4142,0.308629,0.8971,0.905246,0.897189,0.897616
2,0.1684,0.243797,0.9249,0.929997,0.924893,0.925932
3,0.0973,0.2596,0.9259,0.927826,0.925956,0.925937
4,0.0629,0.241445,0.9295,0.931057,0.929783,0.929524
5,0.0364,0.244281,0.9362,0.937207,0.936439,0.936187
6,0.0213,0.265533,0.9386,0.940904,0.938551,0.938964


[I 2025-03-27 22:50:11,631] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.00046249678403950624, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4389,0.283418,0.9014,0.908819,0.901411,0.902111
2,0.1722,0.241975,0.9219,0.926129,0.921966,0.923057
3,0.0968,0.256744,0.9248,0.925817,0.924897,0.924699
4,0.0607,0.220352,0.939,0.940199,0.939228,0.939175
5,0.0341,0.27146,0.9357,0.936705,0.935949,0.935595
6,0.0188,0.244109,0.9429,0.944601,0.942835,0.943235
7,0.0081,0.273691,0.9405,0.94124,0.940885,0.940212
8,0.0049,0.271849,0.947,0.948829,0.947316,0.947092
9,0.0017,0.342328,0.9366,0.940979,0.936632,0.936964
10,0.0008,0.247314,0.9483,0.948688,0.948473,0.948237


[I 2025-03-27 23:02:44,436] Trial 31 finished with value: 0.9482370669361637 and parameters: {'learning_rate': 0.00046249678403950624, 'weight_decay': 0.01, 'warmup_steps': 16}. Best is trial 8 with value: 0.9546474828356917.


Trial 32 with params: {'learning_rate': 0.0005006559691443925, 'weight_decay': 0.01, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4652,0.293603,0.9015,0.908257,0.901288,0.902153
2,0.1775,0.274956,0.9118,0.920696,0.911732,0.913544
3,0.1052,0.230676,0.9299,0.931097,0.930086,0.929871
4,0.0612,0.255773,0.9275,0.929467,0.927845,0.927537
5,0.0386,0.216623,0.9417,0.941925,0.941964,0.941612
6,0.0238,0.256999,0.9344,0.937026,0.934331,0.934793


[I 2025-03-27 23:10:16,220] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0011325334878993308, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5269,0.397684,0.8638,0.875438,0.863588,0.86437
2,0.2758,0.306002,0.8949,0.902616,0.894901,0.896409
3,0.1912,0.296453,0.9103,0.912227,0.91063,0.910078
4,0.1278,0.263092,0.9204,0.92318,0.920772,0.920392
5,0.0826,0.278843,0.9236,0.925936,0.923695,0.923785
6,0.0528,0.261605,0.9285,0.930059,0.928648,0.928757


[I 2025-03-27 23:17:44,094] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0006602715105091649, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4633,0.337536,0.8873,0.895474,0.887202,0.887572
2,0.2027,0.326039,0.8947,0.907573,0.894918,0.896646
3,0.1257,0.243284,0.9245,0.926247,0.9247,0.924833
4,0.0803,0.283139,0.921,0.924294,0.921275,0.921147
5,0.0507,0.230521,0.9366,0.937476,0.936839,0.936591
6,0.0329,0.262162,0.9355,0.937321,0.935448,0.935704


[I 2025-03-27 23:25:07,543] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0002334903759311196, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4665,0.249465,0.9123,0.917087,0.912233,0.912612
2,0.1305,0.229853,0.9254,0.928835,0.925292,0.926208
3,0.0621,0.255954,0.9258,0.928073,0.925816,0.925888
4,0.0314,0.253068,0.9348,0.935918,0.935099,0.934937
5,0.017,0.264556,0.9398,0.940431,0.939994,0.939595
6,0.0106,0.279906,0.9394,0.941167,0.939377,0.939754
7,0.0044,0.304254,0.9379,0.938941,0.93836,0.937417
8,0.0022,0.272752,0.9453,0.9474,0.945495,0.945606
9,0.0009,0.366975,0.9306,0.936201,0.930724,0.931048
10,0.0005,0.269747,0.9469,0.947637,0.947159,0.946906


[I 2025-03-27 23:37:34,529] Trial 35 finished with value: 0.9469059861229805 and parameters: {'learning_rate': 0.0002334903759311196, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}. Best is trial 8 with value: 0.9546474828356917.


Trial 36 with params: {'learning_rate': 0.002248224121235652, 'weight_decay': 0.004, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6687,0.614839,0.7902,0.819168,0.790269,0.788323
2,0.3894,0.443026,0.8553,0.874526,0.855583,0.85698
3,0.2795,0.370236,0.882,0.88701,0.881949,0.881402


[I 2025-03-27 23:41:20,492] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0010082525826206983, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4759,0.428288,0.8541,0.869423,0.854019,0.853877
2,0.2535,0.337301,0.8907,0.902032,0.890818,0.89236
3,0.1722,0.315778,0.9044,0.908673,0.904666,0.904792
4,0.1123,0.247375,0.9222,0.923896,0.922427,0.921975
5,0.0739,0.242296,0.9317,0.933013,0.932043,0.93181
6,0.0432,0.275499,0.9313,0.933002,0.931537,0.93162


[I 2025-03-27 23:48:53,440] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0009932336134781208, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4869,0.428867,0.8573,0.870792,0.857305,0.857718
2,0.2584,0.310545,0.8984,0.907864,0.898544,0.899882
3,0.171,0.278642,0.9139,0.915718,0.914238,0.913649
4,0.1116,0.261475,0.9217,0.924919,0.92202,0.921923
5,0.0723,0.272339,0.9237,0.925839,0.923796,0.923785
6,0.045,0.271728,0.929,0.931172,0.929028,0.929263


[I 2025-03-27 23:56:25,307] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00029043368789957854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4033,0.255559,0.9149,0.921304,0.914899,0.915543
2,0.1308,0.225183,0.929,0.931708,0.929083,0.929782
3,0.0664,0.260624,0.9248,0.927072,0.924677,0.924864
4,0.039,0.242506,0.9338,0.934643,0.934029,0.933931
5,0.023,0.267041,0.9343,0.935348,0.93459,0.934072
6,0.0116,0.243814,0.9414,0.943271,0.941379,0.941775
7,0.0047,0.294265,0.9375,0.938445,0.937949,0.937047
8,0.0027,0.260635,0.9457,0.948728,0.945781,0.946109
9,0.0016,0.353747,0.9346,0.939522,0.934755,0.934682
10,0.0007,0.248581,0.946,0.946414,0.946291,0.945893


[I 2025-03-28 00:08:52,086] Trial 39 finished with value: 0.9458927885100327 and parameters: {'learning_rate': 0.00029043368789957854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}. Best is trial 8 with value: 0.9546474828356917.


Trial 40 with params: {'learning_rate': 0.0002768337827053312, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3988,0.267327,0.9095,0.91479,0.909351,0.910018
2,0.1295,0.236236,0.9243,0.927372,0.924382,0.924842
3,0.0643,0.257986,0.9262,0.927418,0.926293,0.926054
4,0.0365,0.258702,0.9345,0.93619,0.934805,0.934519
5,0.0211,0.284291,0.9305,0.931804,0.930792,0.930168
6,0.0109,0.264251,0.942,0.943372,0.941953,0.942275
7,0.0046,0.301731,0.9347,0.935952,0.935205,0.934358
8,0.002,0.292866,0.942,0.94517,0.942163,0.942494
9,0.0009,0.400796,0.9263,0.933994,0.926369,0.926841
10,0.0006,0.252352,0.9471,0.947506,0.947357,0.947055


[I 2025-03-28 00:21:25,376] Trial 40 finished with value: 0.9470546018847369 and parameters: {'learning_rate': 0.0002768337827053312, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}. Best is trial 8 with value: 0.9546474828356917.


Trial 41 with params: {'learning_rate': 0.0003308621240765298, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4089,0.25572,0.9138,0.918127,0.913614,0.914281
2,0.1384,0.262262,0.9185,0.92419,0.91865,0.919418
3,0.0743,0.231861,0.9324,0.934382,0.93239,0.932514
4,0.041,0.211515,0.9413,0.942134,0.941443,0.941418
5,0.0251,0.24594,0.9393,0.939856,0.93955,0.93922
6,0.0122,0.259685,0.9402,0.941572,0.94031,0.940413


[I 2025-03-28 00:28:51,799] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.00021591086654746623, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4237,0.250101,0.9149,0.919343,0.914723,0.915196
2,0.1271,0.237199,0.9246,0.928606,0.924469,0.925375
3,0.0557,0.259254,0.9246,0.926276,0.924612,0.924507
4,0.0297,0.243567,0.9361,0.937519,0.936165,0.936242
5,0.0174,0.246004,0.9391,0.939465,0.939334,0.938982
6,0.0077,0.259094,0.942,0.94345,0.942013,0.942302
7,0.0032,0.337003,0.9286,0.930682,0.929011,0.928005
8,0.0021,0.283865,0.9399,0.943134,0.940071,0.940269
9,0.0013,0.374675,0.9294,0.935361,0.929607,0.929778
10,0.0007,0.258551,0.9457,0.946371,0.945913,0.945657


[I 2025-03-28 00:41:24,063] Trial 42 finished with value: 0.9456566620550992 and parameters: {'learning_rate': 0.00021591086654746623, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}. Best is trial 8 with value: 0.9546474828356917.


Trial 43 with params: {'learning_rate': 0.0004476138044561108, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3897,0.280588,0.9083,0.914811,0.907982,0.908786
2,0.164,0.250667,0.9217,0.928088,0.921737,0.923192
3,0.0903,0.247148,0.9251,0.926467,0.925294,0.924954
4,0.0561,0.255856,0.9302,0.931644,0.930419,0.930291
5,0.0334,0.248061,0.9385,0.940101,0.938778,0.938355
6,0.0201,0.236097,0.9442,0.946018,0.94411,0.944538
7,0.008,0.267776,0.9442,0.944558,0.944569,0.944006
8,0.0046,0.251979,0.9489,0.950723,0.949077,0.949018
9,0.0015,0.345946,0.9362,0.940918,0.936323,0.936567
10,0.0008,0.220273,0.9519,0.952155,0.952118,0.951875


[I 2025-03-28 00:53:45,235] Trial 43 finished with value: 0.9518745587491695 and parameters: {'learning_rate': 0.0004476138044561108, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 44 with params: {'learning_rate': 7.012112975444019e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6892,0.310267,0.8926,0.89696,0.892742,0.892534
2,0.1934,0.25589,0.9146,0.919503,0.914425,0.915552
3,0.0983,0.249978,0.9169,0.918838,0.916877,0.916928
4,0.0454,0.240041,0.9249,0.925997,0.925065,0.924989
5,0.0202,0.262892,0.9251,0.926236,0.92545,0.92509
6,0.0098,0.274987,0.9281,0.9308,0.927992,0.928501


[I 2025-03-28 01:01:13,279] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0005159851228046056, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3977,0.305402,0.8963,0.905245,0.896371,0.897091
2,0.1746,0.26013,0.9187,0.926269,0.918562,0.91999
3,0.1013,0.282763,0.9193,0.920999,0.919466,0.919257


[I 2025-03-28 01:04:56,060] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00012827851737332596, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.511,0.253187,0.9133,0.917101,0.913294,0.913497
2,0.1394,0.239584,0.9221,0.92562,0.922043,0.922697
3,0.059,0.238179,0.927,0.928844,0.926989,0.927091
4,0.0234,0.246453,0.9311,0.931844,0.931356,0.931035
5,0.0101,0.273716,0.9313,0.932301,0.931485,0.931209
6,0.0044,0.277112,0.9353,0.937811,0.935216,0.935683


[I 2025-03-28 01:12:18,938] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0005074340163309952, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4119,0.296194,0.8998,0.906965,0.899781,0.90073
2,0.1725,0.26581,0.9153,0.920728,0.915232,0.916372
3,0.1024,0.242039,0.9271,0.928054,0.927379,0.927108
4,0.0625,0.243177,0.9332,0.935728,0.933417,0.933557
5,0.0408,0.252196,0.9351,0.9367,0.935229,0.935205
6,0.0215,0.268281,0.9373,0.93907,0.937539,0.937559


[I 2025-03-28 01:19:46,522] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00015466526399697373, 'weight_decay': 0.01, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5126,0.251895,0.912,0.915751,0.911923,0.912176
2,0.1341,0.227344,0.9233,0.926757,0.923314,0.923995
3,0.0558,0.237319,0.9285,0.930187,0.928441,0.928609
4,0.0245,0.250137,0.9335,0.934522,0.933786,0.933571
5,0.0115,0.292077,0.9305,0.932252,0.930654,0.930429
6,0.0052,0.278281,0.9391,0.941478,0.938916,0.939506


[I 2025-03-28 01:27:11,719] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0005041696413811824, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4153,0.301968,0.899,0.904225,0.8989,0.899385
2,0.1705,0.279483,0.9144,0.921598,0.914238,0.915782
3,0.1023,0.279301,0.918,0.921382,0.917996,0.917975
4,0.0638,0.252123,0.9297,0.930396,0.930031,0.92959
5,0.0386,0.25006,0.9355,0.936543,0.935632,0.935419
6,0.023,0.239233,0.9406,0.941969,0.940526,0.940832
7,0.0103,0.277796,0.9407,0.941039,0.941214,0.940192
8,0.0047,0.258947,0.9477,0.949343,0.94799,0.947778
9,0.0018,0.335601,0.937,0.941123,0.937059,0.937279
10,0.001,0.22269,0.9517,0.951948,0.951909,0.951716


[I 2025-03-28 01:39:31,609] Trial 49 finished with value: 0.9517160392685786 and parameters: {'learning_rate': 0.0005041696413811824, 'weight_decay': 0.01, 'warmup_steps': 6}. Best is trial 8 with value: 0.9546474828356917.


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7242,0.560525,0.8142,0.829011,0.813985,0.813607
2,0.4333,0.426316,0.8549,0.864311,0.855031,0.856378
3,0.3076,0.433153,0.8638,0.870704,0.864092,0.863471
4,0.2276,0.295936,0.9012,0.903032,0.901344,0.901332
5,0.1561,0.322146,0.9003,0.903968,0.900658,0.900142
6,0.0976,0.286979,0.9144,0.917294,0.914384,0.915026


[I 2025-03-28 01:46:57,066] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00047938636507923127, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4259,0.292482,0.9009,0.906983,0.900774,0.901525
2,0.1721,0.240586,0.9244,0.927871,0.924416,0.925165
3,0.0997,0.22202,0.933,0.933618,0.933145,0.932973
4,0.0588,0.233347,0.9355,0.938017,0.935787,0.935708
5,0.0366,0.246227,0.9393,0.939867,0.93949,0.939188
6,0.0204,0.248609,0.9402,0.941885,0.940217,0.940478
7,0.0102,0.268485,0.9392,0.940426,0.939703,0.939021
8,0.004,0.245614,0.9485,0.950484,0.948808,0.948641
9,0.0014,0.341054,0.9357,0.940961,0.935853,0.936119
10,0.0007,0.246933,0.9492,0.949769,0.94948,0.949084


[I 2025-03-28 01:59:27,554] Trial 51 finished with value: 0.9490836287159828 and parameters: {'learning_rate': 0.00047938636507923127, 'weight_decay': 0.01, 'warmup_steps': 10}. Best is trial 8 with value: 0.9546474828356917.


Trial 52 with params: {'learning_rate': 0.004803130612126116, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8769,0.730283,0.7608,0.777152,0.760746,0.759462
2,0.5231,0.475748,0.8401,0.84742,0.840516,0.841004
3,0.3696,0.458342,0.8494,0.852835,0.849749,0.848826
4,0.284,0.352555,0.8854,0.889816,0.88581,0.885255
5,0.2156,0.320966,0.8983,0.900859,0.898307,0.89851
6,0.1505,0.322197,0.9022,0.905942,0.902403,0.902808


[I 2025-03-28 02:06:55,130] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0005075987029606246, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4154,0.311169,0.8949,0.904549,0.894684,0.895354
2,0.1734,0.238843,0.9257,0.929115,0.925519,0.926227
3,0.1005,0.250297,0.9233,0.924791,0.92333,0.923301


[I 2025-03-28 02:10:37,649] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0002613389170600318, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4205,0.257925,0.9116,0.916183,0.911525,0.91194
2,0.1327,0.247806,0.9227,0.927449,0.922525,0.923619
3,0.0628,0.266089,0.9241,0.926187,0.924064,0.924053
4,0.0344,0.264485,0.9312,0.933134,0.931421,0.931391
5,0.0193,0.254358,0.9373,0.937494,0.937524,0.937179
6,0.0108,0.269958,0.939,0.940394,0.938962,0.939236
7,0.0046,0.31937,0.9329,0.934149,0.933453,0.932212
8,0.0025,0.286006,0.9407,0.943501,0.940802,0.941056
9,0.0012,0.417008,0.9274,0.935357,0.927357,0.928291
10,0.0006,0.258442,0.9452,0.945448,0.945351,0.945111


[I 2025-03-28 02:22:55,372] Trial 54 finished with value: 0.9451110255408812 and parameters: {'learning_rate': 0.0002613389170600318, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 8 with value: 0.9546474828356917.


Trial 55 with params: {'learning_rate': 0.0005906647069821633, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4165,0.298942,0.8992,0.903488,0.898912,0.899443
2,0.19,0.263134,0.915,0.922911,0.915157,0.916084
3,0.113,0.289504,0.9134,0.917144,0.913548,0.913387


[I 2025-03-28 02:26:41,285] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.004913837305728667, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9507,0.826651,0.735,0.763665,0.734478,0.735443
2,0.5654,0.543986,0.8168,0.830128,0.816792,0.818361
3,0.4137,0.4444,0.8539,0.857494,0.853892,0.853442
4,0.3276,0.39487,0.8693,0.872904,0.869767,0.868858
5,0.2502,0.354526,0.8861,0.887918,0.886186,0.88619
6,0.1751,0.350043,0.8884,0.893107,0.888551,0.889262


[I 2025-03-28 02:34:08,286] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0006625255247714729, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4455,0.333446,0.8875,0.897845,0.887219,0.888468
2,0.2013,0.282871,0.907,0.914206,0.907238,0.908033
3,0.1267,0.262506,0.921,0.922543,0.921184,0.920935


[I 2025-03-28 02:37:51,768] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0013407747347825606, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5382,0.467425,0.8417,0.858876,0.841536,0.841145
2,0.3002,0.309059,0.897,0.901243,0.897325,0.897774
3,0.2111,0.303026,0.9057,0.907535,0.905965,0.905568


[I 2025-03-28 02:41:36,882] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00045965018546025016, 'weight_decay': 0.004, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4094,0.281296,0.9043,0.911206,0.904329,0.904724
2,0.165,0.255804,0.917,0.922499,0.917186,0.917894
3,0.0963,0.233215,0.9287,0.929508,0.928949,0.928481
4,0.0602,0.247275,0.9312,0.933019,0.931555,0.931358
5,0.037,0.276475,0.931,0.933102,0.931516,0.930936
6,0.0208,0.268768,0.9384,0.940473,0.938487,0.938612
7,0.0096,0.268874,0.9413,0.942002,0.941723,0.941079
8,0.0046,0.282993,0.9445,0.947047,0.944877,0.944518
9,0.0018,0.349002,0.9348,0.940587,0.934832,0.935465
10,0.0006,0.231568,0.9501,0.95057,0.950426,0.950064


[I 2025-03-28 02:54:06,294] Trial 59 finished with value: 0.9500638357239997 and parameters: {'learning_rate': 0.00045965018546025016, 'weight_decay': 0.004, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 60 with params: {'learning_rate': 0.00032329711297850416, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4183,0.269743,0.9074,0.912619,0.907177,0.907661
2,0.1407,0.243619,0.9262,0.930659,0.926019,0.927261
3,0.0721,0.235738,0.93,0.931327,0.930048,0.929922
4,0.0432,0.251678,0.9328,0.933885,0.933131,0.932731
5,0.0254,0.249753,0.938,0.938626,0.938199,0.938021
6,0.0135,0.270566,0.9382,0.939904,0.938182,0.938479


[I 2025-03-28 03:01:32,033] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00038663626605069546, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4213,0.267042,0.909,0.9147,0.9089,0.909336
2,0.1498,0.244862,0.9241,0.928914,0.92393,0.925228
3,0.0829,0.263116,0.9238,0.925635,0.923973,0.923742
4,0.0484,0.241664,0.9351,0.935554,0.935484,0.935019
5,0.029,0.223617,0.94,0.940374,0.940191,0.939975
6,0.0169,0.245843,0.9411,0.942762,0.941152,0.941469
7,0.0085,0.345376,0.9301,0.932083,0.93056,0.929423
8,0.0034,0.268267,0.9466,0.948283,0.94677,0.946831
9,0.0015,0.359282,0.9344,0.939725,0.93451,0.934881
10,0.0007,0.253191,0.9478,0.94824,0.948052,0.947716


[I 2025-03-28 03:14:02,272] Trial 61 finished with value: 0.9477164992690307 and parameters: {'learning_rate': 0.00038663626605069546, 'weight_decay': 0.01, 'warmup_steps': 10}. Best is trial 8 with value: 0.9546474828356917.


Trial 62 with params: {'learning_rate': 0.000698982088964504, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4314,0.339748,0.8891,0.897608,0.889068,0.889476
2,0.2136,0.303396,0.8982,0.907847,0.898348,0.89929
3,0.1327,0.24552,0.9247,0.925598,0.924834,0.924697
4,0.0843,0.256577,0.9252,0.928207,0.925506,0.925366
5,0.0528,0.253206,0.9318,0.932783,0.932088,0.93179
6,0.0318,0.254586,0.9349,0.936884,0.934905,0.935195


[I 2025-03-28 03:21:27,895] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.00042548008864523183, 'weight_decay': 0.002, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4113,0.293761,0.9009,0.906808,0.900782,0.901031
2,0.1571,0.269556,0.9142,0.922255,0.914039,0.915851
3,0.0856,0.238396,0.9282,0.92947,0.928296,0.928137
4,0.053,0.209885,0.9399,0.941501,0.940044,0.940217
5,0.0335,0.252027,0.938,0.939043,0.938214,0.93799
6,0.0171,0.280962,0.9354,0.937535,0.935403,0.935822


[I 2025-03-28 03:28:53,224] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.000532191392845721, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3984,0.298725,0.8956,0.902155,0.895672,0.896075
2,0.1789,0.294724,0.9088,0.917703,0.908961,0.910241
3,0.1042,0.270499,0.9199,0.921699,0.919848,0.919974


[I 2025-03-28 03:32:35,849] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0002186572661415552, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.412,0.257454,0.9103,0.915224,0.910161,0.910645
2,0.1226,0.229788,0.9246,0.928383,0.924413,0.925343
3,0.055,0.237747,0.9296,0.931484,0.929545,0.92977
4,0.0284,0.232921,0.9377,0.938417,0.937741,0.937886
5,0.0147,0.259381,0.9378,0.938255,0.93798,0.937805
6,0.0087,0.261216,0.9394,0.940989,0.93932,0.939579
7,0.004,0.312472,0.9332,0.935358,0.933617,0.93305
8,0.0019,0.28667,0.9401,0.943666,0.940282,0.94042
9,0.0009,0.391599,0.9285,0.935241,0.928598,0.928957
10,0.0007,0.266663,0.9444,0.945286,0.944525,0.944392


[I 2025-03-28 03:45:05,944] Trial 65 finished with value: 0.944392200302943 and parameters: {'learning_rate': 0.0002186572661415552, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 8 with value: 0.9546474828356917.


Trial 66 with params: {'learning_rate': 0.0004720110031406646, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4092,0.281476,0.9041,0.908891,0.904057,0.904442
2,0.1662,0.259336,0.9185,0.925309,0.918435,0.91995
3,0.0953,0.246199,0.9254,0.927373,0.925451,0.925557
4,0.0592,0.258483,0.9277,0.929188,0.927955,0.927745
5,0.0362,0.241818,0.9389,0.939463,0.939092,0.938785
6,0.02,0.232437,0.9428,0.943829,0.942879,0.942942
7,0.0105,0.275002,0.9373,0.938245,0.93777,0.936964
8,0.0043,0.25531,0.9475,0.949758,0.947653,0.947659
9,0.0016,0.351905,0.9357,0.941204,0.935814,0.935974
10,0.0006,0.221041,0.9503,0.950566,0.950596,0.950217


[I 2025-03-28 03:57:34,607] Trial 66 finished with value: 0.950216503178407 and parameters: {'learning_rate': 0.0004720110031406646, 'weight_decay': 0.01, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 67 with params: {'learning_rate': 0.0004510800085055002, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4041,0.305849,0.8971,0.907321,0.897108,0.897978
2,0.1608,0.255371,0.9188,0.924329,0.918754,0.919982
3,0.0923,0.248759,0.9229,0.924955,0.923045,0.923031
4,0.0576,0.229292,0.9333,0.93444,0.9335,0.933247
5,0.0338,0.236323,0.9399,0.940187,0.940132,0.939812
6,0.0199,0.236912,0.9453,0.946508,0.945389,0.94556
7,0.0108,0.328328,0.9297,0.931806,0.930282,0.929154
8,0.0051,0.242911,0.9482,0.949493,0.948437,0.948315
9,0.0015,0.345735,0.9341,0.940327,0.934174,0.934569
10,0.0006,0.228079,0.952,0.952223,0.952178,0.952006


[I 2025-03-28 04:10:00,855] Trial 67 finished with value: 0.9520060538939962 and parameters: {'learning_rate': 0.0004510800085055002, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 8 with value: 0.9546474828356917.


Trial 68 with params: {'learning_rate': 0.0003068727358667909, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3882,0.26739,0.9098,0.916787,0.909597,0.910524
2,0.1332,0.218852,0.9271,0.929877,0.927088,0.927832
3,0.0671,0.247983,0.927,0.929053,0.92707,0.927002
4,0.0407,0.265548,0.9292,0.93154,0.929463,0.9295
5,0.0218,0.249144,0.9393,0.939961,0.939455,0.939284
6,0.0107,0.239628,0.945,0.946209,0.944931,0.945131
7,0.0041,0.317581,0.934,0.936326,0.93448,0.933929
8,0.0026,0.280379,0.945,0.947904,0.945083,0.945363
9,0.0008,0.369625,0.9288,0.935028,0.928887,0.929258
10,0.0005,0.257877,0.9463,0.946908,0.946546,0.946265


[I 2025-03-28 04:22:23,593] Trial 68 finished with value: 0.9462654433887879 and parameters: {'learning_rate': 0.0003068727358667909, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 69 with params: {'learning_rate': 0.0002840585395951591, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.42,0.277043,0.9044,0.912933,0.904024,0.904849
2,0.1344,0.22505,0.9274,0.930902,0.927446,0.928096
3,0.0651,0.249193,0.928,0.930049,0.928085,0.927954
4,0.0377,0.248798,0.935,0.936656,0.93519,0.935197
5,0.0235,0.255885,0.9375,0.938149,0.937789,0.937385
6,0.0127,0.277772,0.9407,0.943218,0.940612,0.941142
7,0.0046,0.293507,0.9382,0.938868,0.938682,0.937814
8,0.0026,0.259312,0.9457,0.947662,0.945953,0.945901
9,0.0013,0.339438,0.9365,0.940759,0.936677,0.936931
10,0.0007,0.258334,0.948,0.948517,0.948236,0.947935


[I 2025-03-28 04:34:49,788] Trial 69 finished with value: 0.9479349928273019 and parameters: {'learning_rate': 0.0002840585395951591, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 8 with value: 0.9546474828356917.


Trial 70 with params: {'learning_rate': 0.00010398848472142685, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5409,0.270998,0.9094,0.913324,0.90946,0.909474
2,0.1531,0.246886,0.9176,0.921817,0.917465,0.91835
3,0.0668,0.228465,0.926,0.9273,0.926046,0.926084
4,0.0258,0.25153,0.9268,0.927998,0.926972,0.926887
5,0.0111,0.267533,0.9333,0.934101,0.933555,0.933302
6,0.0057,0.275783,0.9336,0.936015,0.933435,0.93395


[I 2025-03-28 04:42:18,182] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0005677759712789497, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4131,0.304843,0.8983,0.903375,0.898177,0.898474
2,0.1848,0.367049,0.8924,0.907638,0.892379,0.894539
3,0.1124,0.269308,0.9233,0.92481,0.923339,0.923283
4,0.0694,0.234625,0.9306,0.933048,0.930855,0.93086
5,0.0443,0.245134,0.9326,0.932687,0.932904,0.932092
6,0.0255,0.282764,0.9343,0.937532,0.934229,0.934835


[I 2025-03-28 04:49:42,273] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0004537454948759931, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4059,0.26654,0.9073,0.912175,0.907328,0.90746
2,0.161,0.260568,0.9142,0.920995,0.914077,0.915589
3,0.0924,0.24866,0.9245,0.926009,0.924506,0.924494


[I 2025-03-28 04:53:23,566] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7508,0.32935,0.8851,0.889089,0.885228,0.884958
2,0.2155,0.263539,0.9112,0.916138,0.910959,0.912179
3,0.1162,0.24898,0.915,0.917063,0.914974,0.915085
4,0.0593,0.240266,0.922,0.923435,0.92212,0.922042
5,0.0297,0.26112,0.9214,0.922723,0.921733,0.92133
6,0.0147,0.26996,0.9251,0.928173,0.92504,0.925549


[I 2025-03-28 05:00:49,099] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0016326330184682103, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5848,0.470254,0.8387,0.854075,0.838162,0.838888
2,0.3347,0.377669,0.873,0.882232,0.87356,0.872735
3,0.2394,0.338,0.8896,0.892825,0.889614,0.889134


[I 2025-03-28 05:04:34,543] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0007349383673701311, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4386,0.32793,0.8907,0.897796,0.890367,0.89111
2,0.218,0.304751,0.904,0.912028,0.904012,0.90494
3,0.1388,0.298848,0.909,0.911576,0.90905,0.908651
4,0.0894,0.236248,0.929,0.930355,0.929299,0.929092
5,0.0586,0.212486,0.9385,0.938934,0.938757,0.938297
6,0.0326,0.242519,0.9369,0.938586,0.936848,0.93719


[I 2025-03-28 05:12:04,443] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0002955686606542878, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4055,0.271481,0.9097,0.916286,0.909714,0.910379
2,0.129,0.235715,0.926,0.929465,0.926052,0.926799
3,0.0656,0.240253,0.9313,0.932412,0.931422,0.931168
4,0.0386,0.248689,0.9348,0.936172,0.935108,0.934877
5,0.0206,0.254936,0.9366,0.937349,0.936832,0.93643
6,0.0133,0.244298,0.9407,0.942453,0.940628,0.940936
7,0.0055,0.296082,0.9375,0.938918,0.937875,0.93722
8,0.003,0.253946,0.9472,0.949458,0.947347,0.947426
9,0.0008,0.343117,0.9339,0.939474,0.933975,0.934538
10,0.0006,0.241428,0.9488,0.949203,0.948991,0.94876


[I 2025-03-28 05:24:31,762] Trial 76 finished with value: 0.9487603354263638 and parameters: {'learning_rate': 0.0002955686606542878, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 8 with value: 0.9546474828356917.


Trial 77 with params: {'learning_rate': 0.0005663297775827886, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4425,0.320884,0.8927,0.900823,0.892616,0.893134
2,0.1866,0.281384,0.9085,0.917198,0.908501,0.910281
3,0.1114,0.263621,0.9209,0.922309,0.921048,0.920793
4,0.0719,0.251176,0.927,0.929094,0.927308,0.927039
5,0.0424,0.244535,0.9333,0.934588,0.933455,0.933422
6,0.0244,0.262195,0.9372,0.940323,0.937073,0.937673


[I 2025-03-28 05:31:58,779] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.000818924692261964, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4613,0.392593,0.8654,0.88012,0.865005,0.86561
2,0.2327,0.331052,0.8977,0.909903,0.897746,0.89914
3,0.1474,0.254241,0.9199,0.921235,0.920239,0.919799
4,0.0974,0.223133,0.9318,0.933372,0.932076,0.931942
5,0.0617,0.221527,0.9366,0.937259,0.936785,0.936824
6,0.0359,0.250953,0.9347,0.935936,0.934869,0.93489


[I 2025-03-28 05:39:22,316] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0014880457110926018, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5696,0.509152,0.8309,0.849455,0.830466,0.830521
2,0.3236,0.335149,0.8905,0.898551,0.890716,0.891909
3,0.2256,0.307982,0.9007,0.903997,0.900955,0.900804


[I 2025-03-28 05:43:03,070] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0005072281775179684, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4081,0.314271,0.8947,0.903024,0.894522,0.895304
2,0.1742,0.277089,0.9117,0.919213,0.91164,0.913227
3,0.0989,0.250868,0.9256,0.926557,0.925703,0.925625
4,0.0634,0.255965,0.9305,0.932602,0.930607,0.930516
5,0.0391,0.263521,0.9316,0.933536,0.93157,0.931417
6,0.0231,0.249455,0.9425,0.943745,0.942657,0.942758
7,0.0111,0.308719,0.9344,0.935817,0.934939,0.933853
8,0.0042,0.253796,0.9477,0.948539,0.94801,0.947754
9,0.0015,0.344046,0.9349,0.940778,0.934933,0.935515
10,0.0007,0.239981,0.9489,0.94965,0.949125,0.949059


[I 2025-03-28 05:55:25,001] Trial 80 finished with value: 0.9490591790909921 and parameters: {'learning_rate': 0.0005072281775179684, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 8 with value: 0.9546474828356917.


Trial 81 with params: {'learning_rate': 0.0003300145821096509, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4157,0.275497,0.9025,0.90929,0.902469,0.902962
2,0.1373,0.228801,0.9273,0.931052,0.927195,0.928302
3,0.0707,0.224967,0.9347,0.935954,0.934623,0.934778
4,0.0423,0.226377,0.9396,0.940193,0.939847,0.939585
5,0.0229,0.253911,0.9355,0.936498,0.935765,0.935553
6,0.0143,0.249152,0.9421,0.94392,0.942138,0.94249
7,0.0068,0.309039,0.9334,0.935317,0.933991,0.932847
8,0.0029,0.273853,0.9462,0.948307,0.946333,0.946446
9,0.0014,0.35434,0.9364,0.941388,0.936467,0.936968
10,0.0005,0.246918,0.9468,0.947561,0.947011,0.946783


[I 2025-03-28 06:07:50,536] Trial 81 finished with value: 0.9467832340946073 and parameters: {'learning_rate': 0.0003300145821096509, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}. Best is trial 8 with value: 0.9546474828356917.


Trial 82 with params: {'learning_rate': 0.0012315261197628753, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5571,0.441436,0.8531,0.86754,0.8531,0.853326
2,0.2942,0.342661,0.8876,0.896864,0.888054,0.888226
3,0.1993,0.276045,0.9085,0.909613,0.908728,0.908343
4,0.1343,0.267157,0.916,0.919068,0.916382,0.916111
5,0.0886,0.288902,0.9189,0.921639,0.919113,0.918996
6,0.0519,0.245869,0.9345,0.935421,0.934588,0.934664


[I 2025-03-28 06:15:16,408] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0005994336145534409, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4357,0.319818,0.8912,0.89951,0.891067,0.891839
2,0.1905,0.3134,0.903,0.916833,0.903059,0.905463
3,0.1166,0.25042,0.9239,0.925186,0.924128,0.923835


[I 2025-03-28 06:18:56,831] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0007635396886371493, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4602,0.344316,0.8842,0.894205,0.883819,0.884976
2,0.2221,0.309274,0.9033,0.912927,0.903357,0.905133
3,0.1432,0.272238,0.9167,0.918539,0.917041,0.916576
4,0.0906,0.266989,0.9233,0.926877,0.923364,0.923733
5,0.0576,0.239244,0.9345,0.935433,0.934758,0.934792
6,0.0348,0.262714,0.9345,0.936858,0.93453,0.934914


[I 2025-03-28 06:26:16,125] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00047971575176549596, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4184,0.301816,0.8961,0.906376,0.896172,0.897108
2,0.1715,0.259747,0.9184,0.923479,0.918367,0.919435
3,0.0977,0.221357,0.9345,0.935077,0.934595,0.934388
4,0.0587,0.211627,0.9387,0.93905,0.938962,0.938679
5,0.0338,0.26114,0.9362,0.936683,0.936461,0.935935
6,0.0232,0.231946,0.9437,0.945184,0.943822,0.944012
7,0.0105,0.305204,0.9352,0.936626,0.935672,0.934896
8,0.0039,0.257058,0.9478,0.949038,0.947877,0.947872
9,0.0013,0.400356,0.9279,0.934786,0.927989,0.928474
10,0.0007,0.249514,0.9483,0.948825,0.948445,0.948287


[I 2025-03-28 06:38:43,723] Trial 85 finished with value: 0.9482872091726524 and parameters: {'learning_rate': 0.00047971575176549596, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 8 with value: 0.9546474828356917.


Trial 86 with params: {'learning_rate': 0.0002956732090215189, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.403,0.280718,0.9048,0.912152,0.904677,0.905176
2,0.1326,0.241247,0.9221,0.926905,0.922019,0.923258
3,0.0647,0.259542,0.9286,0.930527,0.928577,0.928791
4,0.0356,0.232718,0.9398,0.940428,0.93994,0.939871
5,0.0229,0.258197,0.9362,0.937668,0.936295,0.936191
6,0.0137,0.254934,0.9413,0.942721,0.941355,0.941561
7,0.0051,0.298849,0.9348,0.936439,0.935224,0.934452
8,0.0028,0.27895,0.9439,0.947236,0.944049,0.944203
9,0.001,0.385317,0.9282,0.935416,0.928385,0.928658
10,0.0007,0.246853,0.9469,0.947258,0.947152,0.946781


[I 2025-03-28 06:51:08,714] Trial 86 finished with value: 0.9467805757151316 and parameters: {'learning_rate': 0.0002956732090215189, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 8 with value: 0.9546474828356917.


Trial 87 with params: {'learning_rate': 0.00011687896909748753, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5002,0.260287,0.9119,0.915123,0.911961,0.911974
2,0.1427,0.236223,0.9226,0.926118,0.922458,0.923214
3,0.0622,0.229446,0.9258,0.927067,0.925823,0.925828
4,0.0248,0.248525,0.9314,0.932243,0.931537,0.9314
5,0.0107,0.27407,0.9291,0.930744,0.9293,0.929
6,0.006,0.259064,0.9342,0.935867,0.934179,0.934456


[I 2025-03-28 06:58:36,373] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005470236184570548, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4281,0.308587,0.8968,0.904895,0.89698,0.89731
2,0.1796,0.281284,0.9128,0.920413,0.912852,0.914449
3,0.1071,0.249512,0.9261,0.927333,0.926238,0.926007
4,0.0679,0.240382,0.9319,0.933393,0.932089,0.93201
5,0.0433,0.242369,0.937,0.9382,0.937245,0.936787
6,0.0247,0.259291,0.9396,0.941523,0.939546,0.939882


[I 2025-03-28 07:05:59,782] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.00034363408190808676, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3867,0.273094,0.9083,0.914244,0.908037,0.90862
2,0.1409,0.240842,0.9217,0.925363,0.92179,0.922698
3,0.073,0.243945,0.9278,0.929305,0.927959,0.927839
4,0.0455,0.248079,0.934,0.936084,0.934089,0.934263
5,0.0266,0.259326,0.9356,0.937392,0.935903,0.935275
6,0.0138,0.257952,0.9389,0.940633,0.938852,0.939181
7,0.0056,0.306761,0.9358,0.936974,0.936274,0.935348
8,0.0031,0.276829,0.9436,0.945055,0.943925,0.94374
9,0.0014,0.350287,0.9355,0.940426,0.935674,0.935835
10,0.0006,0.244915,0.9487,0.948955,0.948906,0.948648


[I 2025-03-28 07:18:15,374] Trial 89 finished with value: 0.9486477248993561 and parameters: {'learning_rate': 0.00034363408190808676, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 90 with params: {'learning_rate': 0.0005524120019078961, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4283,0.317581,0.8912,0.896553,0.891454,0.89132
2,0.1856,0.258377,0.9162,0.922275,0.916095,0.91769
3,0.1105,0.260323,0.9204,0.922613,0.920513,0.920377
4,0.0685,0.233739,0.9309,0.932324,0.931092,0.931119
5,0.0428,0.25742,0.933,0.933902,0.933162,0.932999
6,0.0221,0.238565,0.9408,0.941892,0.940934,0.941088
7,0.0138,0.280079,0.9391,0.939814,0.939527,0.938817
8,0.0054,0.24603,0.9489,0.950006,0.949164,0.949019
9,0.0017,0.342865,0.9342,0.939153,0.934371,0.934465
10,0.0007,0.230816,0.9512,0.951223,0.951487,0.951205


[I 2025-03-28 07:30:40,296] Trial 90 finished with value: 0.9512051775600648 and parameters: {'learning_rate': 0.0005524120019078961, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.9546474828356917.


Trial 91 with params: {'learning_rate': 0.0008226731444616614, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.458,0.359599,0.8776,0.888291,0.877282,0.87769
2,0.228,0.305534,0.9003,0.911365,0.900381,0.902006
3,0.1457,0.263335,0.9191,0.920453,0.919044,0.919072


[I 2025-03-28 07:34:22,960] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.00031401753697238267, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4223,0.257505,0.9116,0.917204,0.911441,0.912175
2,0.139,0.253284,0.9194,0.92551,0.919177,0.920924
3,0.0687,0.25406,0.9277,0.929578,0.927811,0.927859
4,0.0418,0.24588,0.9383,0.938949,0.938598,0.938236
5,0.0245,0.26666,0.9359,0.936955,0.936125,0.935659
6,0.0127,0.276979,0.9365,0.938519,0.936514,0.936711


[I 2025-03-28 07:41:48,778] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00047315886562439855, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4291,0.310177,0.8949,0.903653,0.894843,0.895403
2,0.1685,0.252009,0.9173,0.924436,0.917217,0.918743
3,0.0986,0.230499,0.9294,0.930737,0.929624,0.929443
4,0.0613,0.235668,0.9323,0.933672,0.932609,0.93252
5,0.0378,0.264229,0.9313,0.932822,0.931643,0.931382
6,0.0218,0.266071,0.9355,0.937936,0.935704,0.935738


[I 2025-03-28 07:49:15,229] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0005416619637952765, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4178,0.297272,0.8968,0.905075,0.896679,0.897585
2,0.1817,0.250517,0.9195,0.924594,0.91947,0.920692
3,0.1053,0.248095,0.924,0.92539,0.924169,0.923842
4,0.0671,0.249291,0.9301,0.932258,0.930478,0.930197
5,0.0398,0.232025,0.9407,0.941513,0.940823,0.940672
6,0.0227,0.265065,0.9393,0.941157,0.939312,0.939716
7,0.0128,0.280757,0.9388,0.939646,0.939177,0.938587
8,0.0059,0.246322,0.946,0.946762,0.946244,0.946155
9,0.0026,0.351212,0.9345,0.939812,0.934645,0.935083
10,0.0008,0.237209,0.9495,0.949911,0.949669,0.949581


[I 2025-03-28 08:02:37,654] Trial 94 finished with value: 0.949581293290626 and parameters: {'learning_rate': 0.0005416619637952765, 'weight_decay': 0.008, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 95 with params: {'learning_rate': 0.00044936256641181483, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4032,0.315327,0.8961,0.906776,0.895983,0.896496
2,0.1585,0.252405,0.9239,0.928652,0.923781,0.92489
3,0.0943,0.250442,0.928,0.929455,0.928191,0.928011
4,0.0585,0.244489,0.9343,0.935987,0.934616,0.934415
5,0.0344,0.261075,0.9325,0.934199,0.932654,0.932555
6,0.0192,0.287639,0.9355,0.938711,0.935648,0.935935


[I 2025-03-28 08:10:03,715] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0004121826156730455, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4098,0.281157,0.9042,0.909804,0.904144,0.90477
2,0.155,0.277061,0.9131,0.920435,0.912857,0.914857
3,0.0881,0.268314,0.92,0.921921,0.920116,0.919888
4,0.0544,0.243045,0.9347,0.936938,0.934892,0.93509
5,0.0286,0.256826,0.9356,0.9365,0.935597,0.935539
6,0.0189,0.264034,0.9409,0.942668,0.940818,0.941114
7,0.0077,0.319964,0.9335,0.934626,0.933964,0.933034
8,0.0049,0.296819,0.9431,0.946221,0.943395,0.943347
9,0.0016,0.35555,0.9319,0.937668,0.932114,0.932296
10,0.0008,0.238782,0.9498,0.950175,0.949992,0.949714


[I 2025-03-28 08:22:27,079] Trial 96 finished with value: 0.9497139427593508 and parameters: {'learning_rate': 0.0004121826156730455, 'weight_decay': 0.008, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 97 with params: {'learning_rate': 0.0005398797188270255, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3982,0.283404,0.9062,0.912645,0.906263,0.906834
2,0.1761,0.260688,0.9206,0.925743,0.920646,0.921655
3,0.1041,0.259049,0.924,0.925859,0.924193,0.924073


[I 2025-03-28 08:26:08,577] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00029436420127103367, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4162,0.260541,0.9123,0.918238,0.912179,0.912735
2,0.1314,0.245668,0.9222,0.92766,0.922187,0.923317
3,0.0683,0.224886,0.9334,0.934085,0.933593,0.933339
4,0.0372,0.241034,0.9362,0.937514,0.936357,0.936427
5,0.0236,0.245157,0.9393,0.939895,0.939474,0.939027
6,0.0118,0.268342,0.9399,0.942078,0.939626,0.940146


[I 2025-03-28 08:33:38,456] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0002482043130426969, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4123,0.261268,0.9107,0.916295,0.910342,0.911036
2,0.1263,0.251256,0.9221,0.927424,0.922117,0.923115
3,0.0588,0.252214,0.925,0.928024,0.924827,0.925331
4,0.0338,0.244984,0.9332,0.934396,0.933311,0.933175
5,0.0187,0.257968,0.9377,0.938853,0.937873,0.9376
6,0.0091,0.278547,0.9374,0.939651,0.93751,0.937714


[I 2025-03-28 08:41:06,703] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0009034285520172361, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4576,0.355535,0.8781,0.885527,0.877853,0.878684
2,0.2411,0.269603,0.9117,0.917427,0.911862,0.912596
3,0.1578,0.254759,0.9185,0.919989,0.918586,0.918469
4,0.1034,0.247607,0.9269,0.929954,0.927151,0.92734
5,0.0669,0.24073,0.9309,0.931871,0.931011,0.930897
6,0.0377,0.269524,0.9317,0.93334,0.931897,0.931919


[I 2025-03-28 08:48:33,295] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0005674904090383423, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4224,0.278056,0.9052,0.90814,0.905152,0.905385
2,0.1878,0.310234,0.9046,0.91794,0.90441,0.906723
3,0.1109,0.253524,0.9221,0.924017,0.922167,0.922019


[I 2025-03-28 08:52:15,786] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 7.584519112116413e-05, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6201,0.297741,0.8982,0.902187,0.898317,0.898205
2,0.181,0.253117,0.9136,0.9182,0.913363,0.914504
3,0.0893,0.239652,0.9228,0.924388,0.922776,0.922856
4,0.0385,0.239468,0.9266,0.927614,0.926791,0.926632
5,0.0179,0.265846,0.928,0.928887,0.928309,0.927874
6,0.0085,0.277483,0.9283,0.931203,0.928244,0.92869


[I 2025-03-28 08:59:38,376] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0006619558328804626, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.44,0.362324,0.8759,0.891253,0.875695,0.877312
2,0.203,0.37611,0.8865,0.904772,0.886498,0.888881
3,0.1253,0.277315,0.9178,0.919483,0.917889,0.917812


[I 2025-03-28 09:03:23,181] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0004514231156702277, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3935,0.290301,0.9019,0.907282,0.901775,0.902422
2,0.1623,0.246417,0.9204,0.924233,0.920387,0.921081
3,0.0926,0.240934,0.927,0.929064,0.927124,0.927028
4,0.0565,0.221657,0.94,0.941181,0.940256,0.940119
5,0.0336,0.242315,0.9384,0.938872,0.938643,0.938003
6,0.0223,0.226419,0.943,0.943802,0.943112,0.943181
7,0.0096,0.292338,0.9379,0.939179,0.938373,0.937384
8,0.0047,0.241795,0.9492,0.950495,0.949409,0.949372
9,0.0014,0.405643,0.9241,0.932733,0.924151,0.9246
10,0.001,0.220588,0.9505,0.9512,0.950719,0.95048


[I 2025-03-28 09:15:49,379] Trial 104 finished with value: 0.9504796368825916 and parameters: {'learning_rate': 0.0004514231156702277, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 105 with params: {'learning_rate': 0.0002613516671689113, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3941,0.247946,0.9129,0.917984,0.912778,0.913352
2,0.128,0.232931,0.9263,0.929945,0.926334,0.927204
3,0.063,0.222861,0.9338,0.934962,0.93368,0.933767
4,0.0321,0.234089,0.9361,0.937827,0.936296,0.936255
5,0.0197,0.256773,0.9401,0.94123,0.940258,0.940053
6,0.0105,0.255715,0.9419,0.943781,0.941783,0.942169
7,0.0045,0.304773,0.934,0.93548,0.93448,0.933732
8,0.0025,0.27014,0.9443,0.946171,0.944487,0.944596
9,0.0012,0.396038,0.9271,0.935164,0.92722,0.927672
10,0.0007,0.249403,0.9475,0.947702,0.947749,0.947451


[I 2025-03-28 09:28:14,236] Trial 105 finished with value: 0.9474514404073258 and parameters: {'learning_rate': 0.0002613516671689113, 'weight_decay': 0.008, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 106 with params: {'learning_rate': 0.00040777430073543397, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3854,0.26439,0.9106,0.916225,0.910534,0.911064
2,0.1576,0.217904,0.929,0.931689,0.929035,0.929437
3,0.0878,0.247254,0.9244,0.926253,0.924542,0.924284
4,0.0504,0.234718,0.9356,0.936894,0.93581,0.935471
5,0.0312,0.259489,0.9351,0.936271,0.935319,0.935003
6,0.0192,0.243824,0.9427,0.943934,0.942796,0.942923
7,0.0081,0.297346,0.9351,0.936443,0.935502,0.934727
8,0.0036,0.254413,0.9469,0.948864,0.947107,0.947128
9,0.0017,0.371681,0.9323,0.938771,0.932436,0.932823
10,0.0005,0.232166,0.9527,0.952918,0.952878,0.952655


[I 2025-03-28 09:40:42,418] Trial 106 finished with value: 0.9526547274543983 and parameters: {'learning_rate': 0.00040777430073543397, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 107 with params: {'learning_rate': 0.0009422366688453169, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4709,0.369236,0.8686,0.880419,0.868502,0.868551
2,0.2466,0.365091,0.8791,0.893893,0.879465,0.88091
3,0.1641,0.316954,0.9056,0.90922,0.905841,0.905287


[I 2025-03-28 09:44:26,607] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0003369673052242361, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3857,0.284224,0.9026,0.908463,0.902495,0.902982
2,0.1392,0.205426,0.9343,0.935806,0.934472,0.934755
3,0.0737,0.233159,0.9296,0.931098,0.929789,0.929675
4,0.0411,0.23415,0.9386,0.940137,0.938805,0.938813
5,0.0267,0.256444,0.9393,0.939956,0.939481,0.939237
6,0.0152,0.263938,0.9403,0.942464,0.94015,0.940721
7,0.0045,0.291406,0.939,0.939868,0.939411,0.93873
8,0.0032,0.266042,0.9479,0.950199,0.948149,0.948234
9,0.0014,0.357403,0.9353,0.940782,0.935523,0.935733
10,0.0006,0.252701,0.9477,0.948089,0.947894,0.947622


[I 2025-03-28 09:56:50,140] Trial 108 finished with value: 0.9476220560853014 and parameters: {'learning_rate': 0.0003369673052242361, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 109 with params: {'learning_rate': 0.0003561596816898761, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3878,0.283246,0.9072,0.913704,0.907378,0.907666
2,0.1431,0.22138,0.9297,0.931817,0.929788,0.930264
3,0.0763,0.264029,0.9248,0.927335,0.924917,0.925071
4,0.0415,0.251178,0.9366,0.937736,0.936912,0.936486
5,0.0264,0.240383,0.9386,0.939481,0.938884,0.93845
6,0.0169,0.252042,0.9411,0.942254,0.941291,0.941296
7,0.0084,0.310563,0.9325,0.934609,0.932997,0.932299
8,0.0028,0.270204,0.9475,0.949362,0.94775,0.947764
9,0.0017,0.367933,0.9328,0.938088,0.933061,0.933084
10,0.0006,0.246434,0.9476,0.947937,0.947857,0.947536


[I 2025-03-28 10:09:16,091] Trial 109 finished with value: 0.9475357528228979 and parameters: {'learning_rate': 0.0003561596816898761, 'weight_decay': 0.008, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 110 with params: {'learning_rate': 0.0004080677098091249, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.395,0.294115,0.9023,0.906802,0.902052,0.902552
2,0.1579,0.279363,0.9117,0.920417,0.911455,0.91363
3,0.0872,0.258491,0.921,0.923282,0.921161,0.921008
4,0.0501,0.246273,0.9318,0.93325,0.931945,0.931959
5,0.0283,0.237022,0.942,0.94209,0.942287,0.941837
6,0.0192,0.278452,0.9363,0.938997,0.936372,0.936582


[I 2025-03-28 10:16:43,806] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0006784793877739134, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4242,0.354179,0.8807,0.887269,0.880398,0.880412
2,0.2088,0.311835,0.902,0.914538,0.90207,0.904097
3,0.1248,0.266512,0.9186,0.92041,0.91895,0.918473


[I 2025-03-28 10:20:27,565] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0003773973444637998, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4021,0.27845,0.9047,0.91028,0.904634,0.905147
2,0.1468,0.233396,0.9247,0.929624,0.924624,0.925796
3,0.0839,0.243522,0.9269,0.929156,0.926913,0.927086
4,0.0492,0.23615,0.9337,0.935332,0.933879,0.93376
5,0.0264,0.249054,0.942,0.942783,0.942212,0.942115
6,0.0147,0.24525,0.9468,0.947817,0.946891,0.947037
7,0.0066,0.303909,0.9362,0.938102,0.936634,0.936068
8,0.0028,0.275689,0.9476,0.950123,0.947805,0.947849
9,0.0015,0.370118,0.9294,0.935709,0.929509,0.930081
10,0.0006,0.243361,0.9493,0.949793,0.949468,0.949359


[I 2025-03-28 10:32:45,768] Trial 112 finished with value: 0.9493590579209494 and parameters: {'learning_rate': 0.0003773973444637998, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 8 with value: 0.9546474828356917.


Trial 113 with params: {'learning_rate': 0.000765509061297872, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4478,0.361318,0.877,0.886406,0.876756,0.877655
2,0.2239,0.282151,0.9064,0.912966,0.906509,0.907798
3,0.1407,0.273405,0.9152,0.916667,0.915472,0.914932


[I 2025-03-28 10:36:29,548] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00022439378401905357, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4249,0.252593,0.9116,0.916225,0.911318,0.911808
2,0.1238,0.246024,0.9216,0.926515,0.921546,0.922786
3,0.0606,0.255803,0.9267,0.928613,0.926647,0.926682
4,0.0285,0.24144,0.9358,0.936538,0.935929,0.93592
5,0.017,0.258933,0.938,0.938765,0.938225,0.937886
6,0.0089,0.274109,0.9377,0.93925,0.937759,0.937988


[I 2025-03-28 10:44:01,109] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00048124028952622035, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4085,0.300917,0.8982,0.90519,0.89807,0.898649
2,0.171,0.291032,0.9083,0.916721,0.908355,0.909931
3,0.0983,0.227824,0.9306,0.931861,0.930724,0.930395
4,0.0601,0.224077,0.9362,0.937055,0.936401,0.936069
5,0.0355,0.24663,0.9388,0.939849,0.938854,0.938777
6,0.0209,0.256776,0.9396,0.941423,0.939507,0.939777


[I 2025-03-28 10:51:29,767] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00042678786196532727, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3903,0.253285,0.9139,0.917987,0.913801,0.914353
2,0.1589,0.223884,0.9252,0.929191,0.925121,0.926142
3,0.089,0.243178,0.9279,0.929084,0.928064,0.927763
4,0.0531,0.256714,0.9321,0.934193,0.932313,0.932134
5,0.0356,0.251217,0.9345,0.935715,0.934798,0.934373
6,0.0185,0.25842,0.9412,0.942604,0.941221,0.941352
7,0.0094,0.281018,0.9407,0.941347,0.941145,0.940292
8,0.004,0.251676,0.9489,0.950382,0.949156,0.949028
9,0.0016,0.335423,0.9367,0.942044,0.936745,0.937254
10,0.0006,0.229756,0.9504,0.950975,0.950623,0.950395


[I 2025-03-28 11:03:58,906] Trial 116 finished with value: 0.9503946180149893 and parameters: {'learning_rate': 0.00042678786196532727, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 117 with params: {'learning_rate': 0.00012486032116326294, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5584,0.265491,0.9087,0.912984,0.908691,0.908679
2,0.143,0.234425,0.9244,0.927199,0.924381,0.924896
3,0.0595,0.236417,0.927,0.928943,0.926981,0.927085
4,0.0231,0.243118,0.9359,0.936858,0.936089,0.935992
5,0.0102,0.283238,0.9319,0.933149,0.932156,0.931894
6,0.0054,0.276929,0.9371,0.939135,0.937085,0.937432


[I 2025-03-28 11:11:26,698] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00048560476558865383, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3957,0.289496,0.907,0.91329,0.906634,0.907622
2,0.1713,0.270868,0.9177,0.925396,0.917606,0.919497
3,0.1017,0.230446,0.9288,0.930692,0.928809,0.929037
4,0.0638,0.231143,0.9341,0.936265,0.934296,0.934338
5,0.0379,0.225619,0.94,0.940627,0.940218,0.939868
6,0.0196,0.236173,0.9416,0.942708,0.941732,0.941884
7,0.0107,0.318352,0.9328,0.934665,0.933348,0.932331
8,0.0054,0.252729,0.9466,0.948289,0.946967,0.94675
9,0.0017,0.358522,0.9335,0.939793,0.933566,0.933911
10,0.0008,0.217416,0.9527,0.953099,0.952859,0.952689


[I 2025-03-28 11:23:51,746] Trial 118 finished with value: 0.9526890591442477 and parameters: {'learning_rate': 0.00048560476558865383, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 119 with params: {'learning_rate': 0.00032920314969963775, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3876,0.269757,0.9058,0.909941,0.905765,0.906299
2,0.1371,0.227334,0.9292,0.931638,0.929268,0.929659
3,0.0709,0.226704,0.9343,0.935395,0.934336,0.934426
4,0.042,0.271209,0.9307,0.932211,0.931011,0.930623
5,0.027,0.259127,0.9377,0.938878,0.937928,0.937705
6,0.0133,0.249798,0.9431,0.944239,0.943097,0.943313
7,0.0063,0.283106,0.9386,0.939494,0.939057,0.938303
8,0.0031,0.251162,0.9467,0.948252,0.946952,0.946949
9,0.0009,0.345345,0.9346,0.940083,0.934653,0.935095
10,0.0005,0.237961,0.9484,0.948785,0.948691,0.948336


[I 2025-03-28 11:36:15,766] Trial 119 finished with value: 0.9483362935379673 and parameters: {'learning_rate': 0.00032920314969963775, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 120 with params: {'learning_rate': 0.00033445029397652734, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4714,0.280221,0.9058,0.914915,0.905861,0.906307
2,0.1468,0.234411,0.9245,0.927857,0.924453,0.925197
3,0.0774,0.235777,0.9303,0.931788,0.930343,0.930244
4,0.0438,0.253087,0.9361,0.938159,0.936455,0.936141
5,0.0281,0.240334,0.9398,0.9406,0.939984,0.939705
6,0.0146,0.263726,0.9394,0.941727,0.939303,0.939718


[I 2025-03-28 11:43:43,270] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0007830430907999397, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.443,0.370083,0.8704,0.883978,0.870083,0.870442
2,0.2211,0.301167,0.9049,0.91284,0.904972,0.906362
3,0.1423,0.264035,0.9186,0.920288,0.918687,0.918468
4,0.0935,0.255329,0.9223,0.9252,0.922405,0.922339
5,0.0592,0.262389,0.9304,0.931339,0.930745,0.930372
6,0.0342,0.290195,0.9295,0.932477,0.929508,0.92995


[I 2025-03-28 11:51:06,524] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0006147559095289613, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4122,0.30962,0.8924,0.899723,0.89244,0.892809
2,0.1958,0.268804,0.9131,0.920212,0.913305,0.914266
3,0.1171,0.284213,0.9123,0.914246,0.912359,0.912112


[I 2025-03-28 11:54:50,566] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0005311042791442586, 'weight_decay': 0.002, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4677,0.331657,0.8865,0.894721,0.886475,0.886803
2,0.1843,0.289869,0.9102,0.917774,0.910258,0.911776
3,0.1085,0.237648,0.9266,0.927313,0.926778,0.926345
4,0.0658,0.234566,0.9323,0.933417,0.932608,0.932257
5,0.0421,0.25225,0.9334,0.934253,0.933557,0.933277
6,0.0253,0.254328,0.9368,0.938555,0.936838,0.937026


[I 2025-03-28 12:02:28,586] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0004349948597964992, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3968,0.289329,0.9004,0.906951,0.900467,0.900504
2,0.1596,0.264587,0.9169,0.924133,0.917059,0.918042
3,0.0887,0.229851,0.9289,0.930131,0.928956,0.928944
4,0.052,0.234396,0.9353,0.936688,0.935618,0.935332
5,0.0319,0.233206,0.9408,0.941293,0.941126,0.940652
6,0.0193,0.244968,0.9432,0.944271,0.943349,0.943426
7,0.0072,0.311218,0.9351,0.936536,0.935554,0.934702
8,0.0051,0.263977,0.9482,0.949838,0.948535,0.948259
9,0.0014,0.344145,0.935,0.940685,0.935062,0.935508
10,0.0007,0.227239,0.9508,0.950925,0.95106,0.950724


[I 2025-03-28 12:15:09,754] Trial 124 finished with value: 0.9507241677058257 and parameters: {'learning_rate': 0.0004349948597964992, 'weight_decay': 0.01, 'warmup_steps': 1}. Best is trial 8 with value: 0.9546474828356917.


Trial 125 with params: {'learning_rate': 0.00047912077976957495, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3925,0.292447,0.9003,0.908556,0.900087,0.901087
2,0.1692,0.257748,0.9158,0.923202,0.915802,0.91726
3,0.096,0.239203,0.9253,0.92631,0.925438,0.925194
4,0.0569,0.2558,0.9292,0.931146,0.929458,0.929268
5,0.0361,0.231817,0.9399,0.940624,0.940107,0.939927
6,0.0215,0.242902,0.9409,0.942077,0.940894,0.941008
7,0.0093,0.318952,0.9336,0.935219,0.934126,0.933232
8,0.0047,0.274294,0.9456,0.947691,0.946011,0.945479
9,0.0019,0.358313,0.9339,0.939485,0.934009,0.934395
10,0.0008,0.230244,0.9512,0.95174,0.951423,0.951248


[I 2025-03-28 12:27:45,190] Trial 125 finished with value: 0.9512480842376982 and parameters: {'learning_rate': 0.00047912077976957495, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 126 with params: {'learning_rate': 0.0004886610399796205, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3966,0.306646,0.8972,0.905551,0.89699,0.89797
2,0.1688,0.274624,0.9142,0.921681,0.914192,0.915779
3,0.1002,0.235626,0.9271,0.928499,0.927374,0.926964
4,0.0604,0.248955,0.932,0.933962,0.93227,0.932192
5,0.0352,0.257167,0.9333,0.934905,0.933704,0.933109
6,0.0226,0.252788,0.9376,0.93945,0.937815,0.937883


[I 2025-03-28 12:35:21,795] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.001066035230717899, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4999,0.389535,0.8679,0.88129,0.867608,0.867987
2,0.2677,0.349695,0.8838,0.899808,0.88402,0.886048
3,0.1814,0.286683,0.9083,0.910343,0.908427,0.908286
4,0.1216,0.266325,0.9161,0.919802,0.916457,0.916504
5,0.0746,0.280963,0.9232,0.925028,0.923347,0.923365
6,0.0469,0.275922,0.9293,0.932263,0.929266,0.929762


[I 2025-03-28 12:42:46,935] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0007432659869659249, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4356,0.349797,0.8839,0.894868,0.884023,0.884358
2,0.2123,0.301722,0.9039,0.91144,0.904116,0.905301
3,0.1343,0.263252,0.9195,0.92029,0.919785,0.919262
4,0.0881,0.236384,0.9299,0.932122,0.930248,0.930201
5,0.0534,0.260358,0.9305,0.932102,0.930865,0.930643
6,0.032,0.25524,0.9361,0.938339,0.93615,0.936402


[I 2025-03-28 12:50:14,192] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0005204397126923052, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4005,0.292977,0.8995,0.905246,0.899377,0.899696
2,0.1729,0.274036,0.9128,0.919501,0.912701,0.913886
3,0.1013,0.269964,0.919,0.922902,0.918994,0.919193


[I 2025-03-28 12:54:05,679] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0003944003961500213, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.388,0.286552,0.9046,0.913591,0.904134,0.905074
2,0.1498,0.262912,0.9197,0.926777,0.919491,0.921272
3,0.0833,0.257864,0.9223,0.923753,0.922512,0.92215


[I 2025-03-28 12:57:51,369] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0003063519129353552, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4053,0.2587,0.9127,0.917596,0.912556,0.913241
2,0.1338,0.240585,0.9242,0.929025,0.924045,0.925081
3,0.0675,0.239927,0.9308,0.931931,0.930933,0.930906
4,0.0399,0.223335,0.9387,0.938913,0.939038,0.938615
5,0.0212,0.255549,0.9376,0.938158,0.937869,0.937459
6,0.0126,0.272317,0.9406,0.941974,0.940668,0.940805
7,0.0052,0.329524,0.9325,0.934406,0.933029,0.931959
8,0.0033,0.285558,0.9436,0.946421,0.943819,0.943949
9,0.0013,0.342207,0.9359,0.940164,0.936153,0.936139
10,0.0009,0.255087,0.9481,0.948361,0.948331,0.947957


[I 2025-03-28 13:10:23,475] Trial 131 finished with value: 0.94795673946316 and parameters: {'learning_rate': 0.0003063519129353552, 'weight_decay': 0.01, 'warmup_steps': 4}. Best is trial 8 with value: 0.9546474828356917.


Trial 132 with params: {'learning_rate': 0.0001526105139122782, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4482,0.259612,0.9101,0.914695,0.909979,0.910165
2,0.1285,0.232921,0.9273,0.930502,0.927297,0.927941
3,0.0549,0.239875,0.9255,0.927104,0.925499,0.925582
4,0.0234,0.246693,0.9345,0.935164,0.934758,0.934462
5,0.0118,0.273242,0.9332,0.934347,0.933341,0.933199
6,0.006,0.283252,0.9337,0.935801,0.933596,0.933999


[I 2025-03-28 13:17:56,595] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0004990431092387819, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4147,0.287858,0.9021,0.907727,0.901991,0.902545
2,0.1731,0.260705,0.922,0.926553,0.921921,0.923136
3,0.1012,0.24885,0.9239,0.924968,0.924004,0.923731
4,0.0633,0.248563,0.9307,0.932834,0.930856,0.931168
5,0.0369,0.259171,0.9323,0.933159,0.932577,0.931859
6,0.0226,0.263085,0.9364,0.938133,0.936438,0.936724


[I 2025-03-28 13:25:29,128] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.00022033065905423778, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4023,0.250024,0.9133,0.918136,0.913185,0.913639
2,0.1247,0.245173,0.9223,0.927783,0.922258,0.923138
3,0.0568,0.248713,0.9267,0.929362,0.926568,0.926926
4,0.0298,0.244397,0.9359,0.936983,0.936047,0.93586
5,0.0152,0.298304,0.9298,0.932059,0.930021,0.929776
6,0.0084,0.264695,0.9393,0.940985,0.939315,0.939572


[I 2025-03-28 13:32:55,155] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0004925770425825767, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4162,0.292679,0.9,0.908088,0.899753,0.900672
2,0.1711,0.289943,0.9113,0.920448,0.911464,0.912371
3,0.1016,0.262138,0.9201,0.922604,0.92028,0.920008
4,0.0615,0.244757,0.9322,0.933477,0.932426,0.932186
5,0.0358,0.267102,0.9361,0.93645,0.936495,0.935828
6,0.0218,0.259964,0.9362,0.937985,0.936323,0.936564


[I 2025-03-28 13:40:18,914] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 8.251692766362866e-05, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.626,0.290832,0.8987,0.902948,0.898854,0.898766
2,0.1731,0.241204,0.9188,0.922559,0.918611,0.919581
3,0.0818,0.242028,0.9215,0.923505,0.921471,0.921578


[I 2025-03-28 13:44:07,542] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.00041703598885841286, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3869,0.279312,0.9053,0.912132,0.90511,0.905869
2,0.1539,0.234862,0.9255,0.929952,0.925493,0.926504
3,0.0856,0.239024,0.9284,0.929602,0.92855,0.92836
4,0.0524,0.248154,0.9308,0.932726,0.931106,0.930888
5,0.0306,0.241417,0.9398,0.940008,0.940077,0.939626
6,0.0203,0.237412,0.944,0.94515,0.944022,0.944299
7,0.009,0.306062,0.9353,0.936366,0.935785,0.934748
8,0.0039,0.267167,0.9477,0.95108,0.947799,0.948306
9,0.0016,0.327928,0.939,0.942853,0.939056,0.939183
10,0.0007,0.24275,0.949,0.949541,0.949211,0.94894


[I 2025-03-28 13:56:43,068] Trial 137 finished with value: 0.9489399168029395 and parameters: {'learning_rate': 0.00041703598885841286, 'weight_decay': 0.008, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 138 with params: {'learning_rate': 0.0003157968800566083, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3892,0.263295,0.9084,0.913551,0.908405,0.908939
2,0.1359,0.232115,0.9269,0.930061,0.927039,0.927372
3,0.0672,0.273038,0.9203,0.923418,0.920413,0.920345
4,0.0394,0.248426,0.9359,0.937155,0.936181,0.936006
5,0.0223,0.29392,0.9313,0.933541,0.931568,0.930718
6,0.0121,0.248015,0.9431,0.944774,0.943171,0.943298
7,0.0069,0.281968,0.9386,0.939283,0.939031,0.938201
8,0.0033,0.272804,0.9469,0.949454,0.947176,0.947123
9,0.0013,0.333666,0.9365,0.941257,0.936624,0.936824
10,0.0006,0.238608,0.9497,0.950336,0.949907,0.949672


[I 2025-03-28 14:09:14,905] Trial 138 finished with value: 0.9496723021703272 and parameters: {'learning_rate': 0.0003157968800566083, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 139 with params: {'learning_rate': 0.0004033478838309081, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3865,0.284433,0.9032,0.910042,0.903087,0.903601
2,0.1512,0.24894,0.9217,0.927785,0.921657,0.923006
3,0.0849,0.24499,0.9265,0.928426,0.926622,0.92648
4,0.0508,0.225575,0.9355,0.936997,0.935743,0.935593
5,0.0292,0.278222,0.9344,0.934908,0.934808,0.934136
6,0.0179,0.24716,0.942,0.943204,0.942017,0.942126
7,0.0075,0.313272,0.9347,0.936517,0.935204,0.934558
8,0.003,0.280215,0.9466,0.949953,0.946837,0.946957
9,0.0016,0.358249,0.935,0.940067,0.935092,0.935516
10,0.0006,0.238778,0.9498,0.950187,0.949987,0.9498


[I 2025-03-28 14:21:47,273] Trial 139 finished with value: 0.9497997228441001 and parameters: {'learning_rate': 0.0004033478838309081, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 140 with params: {'learning_rate': 0.00037319873190025876, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4102,0.291525,0.9017,0.908475,0.901507,0.901803
2,0.147,0.231113,0.925,0.928942,0.924984,0.925947
3,0.0785,0.24757,0.9275,0.929111,0.927553,0.927489
4,0.0463,0.241575,0.9367,0.937282,0.936975,0.936673
5,0.0276,0.22793,0.9398,0.940244,0.939918,0.939818
6,0.0148,0.257038,0.9401,0.940893,0.940224,0.94021
7,0.0073,0.333142,0.9307,0.931842,0.931211,0.930079
8,0.0033,0.277166,0.9419,0.944042,0.942192,0.942181
9,0.0011,0.355269,0.9329,0.937872,0.933034,0.933371
10,0.0007,0.269516,0.9437,0.94445,0.944029,0.943639


[I 2025-03-28 14:34:21,080] Trial 140 finished with value: 0.9436392207527732 and parameters: {'learning_rate': 0.00037319873190025876, 'weight_decay': 0.01, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 141 with params: {'learning_rate': 0.0004347256118730771, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3922,0.290502,0.9029,0.910311,0.902741,0.903776
2,0.1579,0.263059,0.9159,0.922324,0.915783,0.917196
3,0.0896,0.261669,0.9218,0.923793,0.922042,0.921784


[I 2025-03-28 14:38:07,743] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.00022755806663338316, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4348,0.26503,0.9068,0.912499,0.906679,0.906867
2,0.1291,0.251459,0.921,0.926041,0.920845,0.921971
3,0.0593,0.236195,0.931,0.932156,0.931087,0.930978
4,0.0302,0.236362,0.937,0.937721,0.937177,0.937003
5,0.0163,0.28606,0.9328,0.934524,0.93294,0.932895
6,0.0095,0.260057,0.9389,0.940404,0.938842,0.939186
7,0.0033,0.297091,0.9363,0.937167,0.936823,0.93593
8,0.0019,0.282769,0.9425,0.944661,0.942683,0.942682
9,0.001,0.443479,0.9217,0.932004,0.921784,0.922543
10,0.0007,0.262328,0.9458,0.946473,0.946006,0.945756


[I 2025-03-28 14:50:43,110] Trial 142 finished with value: 0.94575564726977 and parameters: {'learning_rate': 0.00022755806663338316, 'weight_decay': 0.01, 'warmup_steps': 5}. Best is trial 8 with value: 0.9546474828356917.


Trial 143 with params: {'learning_rate': 0.0007237356110417293, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4284,0.367269,0.8752,0.892268,0.875013,0.87656
2,0.2168,0.260247,0.9149,0.921531,0.915003,0.915804
3,0.1374,0.260915,0.9207,0.92209,0.920861,0.920644


[I 2025-03-28 14:54:25,739] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.000620958459159803, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4196,0.324379,0.8905,0.899117,0.890296,0.890877
2,0.1937,0.270981,0.9109,0.917376,0.91102,0.912137
3,0.1202,0.263333,0.9173,0.918968,0.917579,0.917155
4,0.075,0.226562,0.932,0.932977,0.932436,0.932022
5,0.0476,0.239766,0.9351,0.935792,0.935216,0.935224
6,0.0274,0.254757,0.9357,0.937174,0.935834,0.935922


[I 2025-03-28 15:01:56,107] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.00039374295130310046, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3961,0.279207,0.9039,0.910039,0.903856,0.904427
2,0.1513,0.225737,0.9299,0.932798,0.929944,0.930827
3,0.0815,0.22895,0.9294,0.930765,0.929386,0.929224
4,0.0475,0.258181,0.9312,0.932618,0.931457,0.931328
5,0.0321,0.246873,0.9376,0.937913,0.937794,0.937414
6,0.0164,0.263958,0.9391,0.940386,0.939101,0.939248


[I 2025-03-28 15:09:28,242] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0003083290851549294, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3872,0.270419,0.9094,0.913951,0.909353,0.909933
2,0.1349,0.229168,0.9268,0.930154,0.926821,0.92761
3,0.0713,0.258372,0.927,0.928322,0.926996,0.927054
4,0.0387,0.248204,0.9333,0.935051,0.933544,0.933436
5,0.0242,0.255249,0.9385,0.939648,0.938641,0.9386
6,0.0127,0.252151,0.9402,0.942214,0.94026,0.940618
7,0.0053,0.291685,0.9347,0.935375,0.935251,0.934187
8,0.0028,0.25377,0.9462,0.948776,0.946361,0.946561
9,0.0012,0.356813,0.9323,0.938796,0.932423,0.932727
10,0.0007,0.228157,0.9506,0.950996,0.950757,0.950609


[I 2025-03-28 15:21:55,135] Trial 146 finished with value: 0.9506088931094145 and parameters: {'learning_rate': 0.0003083290851549294, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


Trial 147 with params: {'learning_rate': 0.0004544446364862843, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3924,0.251436,0.9155,0.919396,0.91521,0.915917
2,0.1629,0.271535,0.914,0.921118,0.913935,0.91547
3,0.0931,0.265248,0.9193,0.922324,0.919295,0.91935


[I 2025-03-28 15:25:39,053] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00029121261295695196, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4143,0.265083,0.9111,0.917484,0.910908,0.911401
2,0.1377,0.234567,0.9229,0.926557,0.922982,0.923564
3,0.0696,0.238316,0.9294,0.931321,0.929404,0.929504
4,0.0399,0.24536,0.9363,0.93716,0.936575,0.936407
5,0.0234,0.281287,0.9326,0.934035,0.932914,0.932195
6,0.0115,0.24356,0.9427,0.944244,0.942627,0.942886
7,0.0055,0.28815,0.938,0.938962,0.938394,0.937605
8,0.0027,0.281722,0.9418,0.945409,0.942048,0.942251
9,0.0011,0.365279,0.9348,0.940302,0.93487,0.935369
10,0.0006,0.247187,0.9475,0.948062,0.947748,0.947483


[I 2025-03-28 15:38:04,314] Trial 148 finished with value: 0.9474828711537885 and parameters: {'learning_rate': 0.00029121261295695196, 'weight_decay': 0.01, 'warmup_steps': 6}. Best is trial 8 with value: 0.9546474828356917.


Trial 149 with params: {'learning_rate': 0.0002267720244026558, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4085,0.23599,0.9181,0.921483,0.917981,0.918525
2,0.1266,0.214763,0.9291,0.931925,0.929065,0.929614
3,0.0591,0.229177,0.9296,0.931048,0.929611,0.92972
4,0.032,0.235309,0.935,0.936381,0.935039,0.935149
5,0.0173,0.269431,0.9338,0.935966,0.933962,0.933891
6,0.0075,0.251777,0.9425,0.944039,0.942462,0.942724
7,0.0036,0.305077,0.9345,0.936185,0.934911,0.934131
8,0.0021,0.285053,0.9427,0.945562,0.942895,0.943079
9,0.0009,0.362798,0.9323,0.937638,0.932388,0.932737
10,0.0007,0.246884,0.9477,0.948363,0.947828,0.947719


[I 2025-03-28 15:50:30,732] Trial 149 finished with value: 0.947718598761725 and parameters: {'learning_rate': 0.0002267720244026558, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.


In [43]:
print(best_base_pretrained)

BestRun(run_id='8', objective=0.9546474828356917, hyperparameters={'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}, run_summary=None)
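
The `BestRun` above reports only the winning trial. To compare every trial that ran to completion, the Optuna log lines printed during the search can be parsed after the fact. The following is a minimal sketch using only the standard library; the helper name `completed_trials` and the regular expression are illustrative assumptions, not part of the `base` module, and the sample text is copied from the log above.

```python
import ast
import re

# Matches Optuna's "Trial N finished with value: V and parameters: {...}." lines.
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)


def completed_trials(log_text):
    """Return (trial_number, objective, params) for trials that were not pruned."""
    rows = []
    for number, value, params in FINISHED.findall(log_text):
        # The parameter dict is printed as a Python literal, so literal_eval is enough.
        rows.append((int(number), float(value), ast.literal_eval(params)))
    # Highest objective (F1) first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample_log = """
[I 2025-03-28 15:21:55,135] Trial 146 finished with value: 0.9506088931094145 and parameters: {'learning_rate': 0.0003083290851549294, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 8 with value: 0.9546474828356917.
[I 2025-03-28 15:25:39,053] Trial 147 pruned.
[I 2025-03-28 15:38:04,314] Trial 148 finished with value: 0.9474828711537885 and parameters: {'learning_rate': 0.00029121261295695196, 'weight_decay': 0.01, 'warmup_steps': 6}. Best is trial 8 with value: 0.9546474828356917.
"""

for number, f1, params in completed_trials(sample_log):
    print(number, round(f1, 4), params)
```

Pruned trials are skipped on purpose, because their last reported F1 comes from a shorter training budget and is not directly comparable.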


In [44]:
base.reset_seed()

## Hyperparameter search with distillation and fine-tuning of the pretrained model
Configuration of the individual training runs.

In [45]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [46]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
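
How `lambda_param` and `temperature` enter the loss is defined inside `base.DistilTrainer`, which is not shown here. The sketch below illustrates one common formulation of knowledge distillation in plain Python, under two assumptions: `lambda_param` weights the soft (teacher) term while `1 - lambda_param` weights the ordinary cross-entropy, and the soft term is a KL divergence scaled by the squared temperature. The function names are hypothetical and the actual trainer may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Blend of hard-label cross-entropy and softened teacher/student KL divergence."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions,
    # multiplied by T^2 so its gradient scale stays comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft *= temperature ** 2
    # Assumption: lambda_param is the weight of the distillation (soft) term.
    return (1.0 - lambda_param) * hard + lambda_param * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
print(distillation_loss(student, teacher, label=0, lambda_param=0.5, temperature=4.0))
```

Under this convention, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's softened outputs alone; both extremes are inside the search range above.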

Optuna configuration.

In [47]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
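
To make the pruning pattern in the logs easier to read, the sketch below computes the steps at which a Hyperband-style pruner compares trials. It follows the rule described in Optuna's documentation as far as I understand it: the number of brackets is `floor(log(max_resource / min_resource, reduction_factor)) + 1`, and bracket `b` checks trials at `min_resource * reduction_factor ** (b + rung)`. The values of `min_r` and `max_r` are not shown in this section, so `3` and `10` are assumptions chosen only because trials in the logs are pruned after epoch 3 or 6 and complete runs last 10 epochs; the sketch also assumes the reported step is the epoch number. The helper `hyperband_rungs` is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """List, per bracket, the steps within budget at which trials may be pruned."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        steps = []
        rung = 0
        while True:
            # Later brackets start at a higher rung, so they prune less aggressively.
            step = min_resource * reduction_factor ** (bracket_id + rung)
            if step > max_resource:
                break
            steps.append(step)
            rung += 1
        brackets.append(steps)
    return brackets


# Assumed values, not taken from the notebook: min_r=3, max_r=10.
print(hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2))
```

With these assumed values there are two brackets: one that can stop a trial after step 3 or 6, and one that can stop it only after step 6.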



Configuration of the distillation trainer for the individual training runs.

In [48]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_mobilenet(10),
)

Search setup.

In [49]:
best_distil_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-28 15:50:31,280] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3962,0.271806,0.9077,0.916947,0.907262,0.908197
2,0.1973,0.231288,0.9301,0.93412,0.930146,0.931004
3,0.1566,0.225188,0.9327,0.934921,0.932737,0.932686


[I 2025-03-28 15:54:16,137] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4945,0.29338,0.9007,0.909806,0.900536,0.901258
2,0.2215,0.246409,0.926,0.929773,0.925879,0.92689
3,0.1719,0.236334,0.9275,0.929411,0.927523,0.927662
4,0.1474,0.224286,0.9303,0.932597,0.930485,0.930401
5,0.1351,0.210354,0.9379,0.938884,0.938128,0.938068
6,0.1284,0.206919,0.9389,0.941188,0.938831,0.939326


[I 2025-03-28 16:01:44,684] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5928,0.336511,0.8798,0.887536,0.879711,0.879793
2,0.2682,0.276449,0.9126,0.916556,0.912407,0.913386
3,0.2111,0.259909,0.9161,0.917975,0.916151,0.916202


[I 2025-03-28 16:05:30,275] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4589,0.287778,0.9034,0.912047,0.903081,0.903899
2,0.2147,0.241793,0.9274,0.931297,0.927296,0.928276
3,0.1669,0.229483,0.9301,0.931851,0.930118,0.930242
4,0.1432,0.215557,0.936,0.937892,0.936184,0.936056
5,0.1326,0.206035,0.9388,0.939793,0.93898,0.938935
6,0.1265,0.201217,0.9427,0.944857,0.94259,0.943072


[I 2025-03-28 16:12:58,143] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3806,0.357684,0.8701,0.885121,0.869951,0.8703
2,0.2402,0.25604,0.9183,0.923394,0.918401,0.918918
3,0.1893,0.240348,0.9275,0.929608,0.927636,0.927794


[I 2025-03-28 16:16:41,538] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4646,0.418103,0.8317,0.851362,0.831987,0.831112
2,0.3007,0.334308,0.8833,0.897902,0.883319,0.88544
3,0.2363,0.276306,0.9036,0.905984,0.903857,0.903723
4,0.1987,0.242873,0.9198,0.922387,0.919949,0.920072
5,0.1664,0.231949,0.9255,0.927186,0.925723,0.925561
6,0.1457,0.21241,0.9361,0.937041,0.93629,0.93626


[I 2025-03-28 16:24:08,147] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3716,0.340276,0.8759,0.891779,0.875578,0.876437
2,0.2348,0.264305,0.9141,0.922081,0.9143,0.915427
3,0.1881,0.236104,0.9279,0.929824,0.928095,0.928061
4,0.1589,0.215435,0.936,0.938923,0.936207,0.93619
5,0.1388,0.210604,0.9383,0.939963,0.938294,0.938589
6,0.1294,0.190636,0.9475,0.948773,0.94758,0.947775


[I 2025-03-28 16:31:36,854] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4707,0.433092,0.8253,0.845241,0.825221,0.825529
2,0.3067,0.324493,0.8887,0.897578,0.888923,0.88958
3,0.2446,0.331391,0.8761,0.885491,0.876226,0.876161
4,0.2048,0.245977,0.9197,0.921681,0.919874,0.919618
5,0.1731,0.239969,0.9212,0.923025,0.921342,0.92131
6,0.1502,0.221744,0.9301,0.932596,0.930339,0.930496


[I 2025-03-28 16:39:04,157] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4642,0.296477,0.9005,0.908005,0.900349,0.900762
2,0.2284,0.254184,0.9214,0.92549,0.921259,0.922237
3,0.1781,0.238266,0.9263,0.927871,0.926351,0.926417
4,0.152,0.224743,0.9311,0.933083,0.931232,0.931178
5,0.1386,0.216387,0.9357,0.936947,0.935933,0.935877
6,0.1312,0.211271,0.9362,0.93887,0.936145,0.936641


[I 2025-03-28 16:46:27,448] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4017,0.367094,0.8689,0.884206,0.86916,0.868335
2,0.2545,0.299755,0.898,0.909194,0.897952,0.900085
3,0.2021,0.250359,0.9178,0.919635,0.918023,0.917972


[I 2025-03-28 16:50:16,123] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3509,0.296824,0.897,0.908862,0.896837,0.898265
2,0.2036,0.239252,0.928,0.932227,0.927927,0.929052
3,0.1641,0.224521,0.9324,0.933799,0.932554,0.93241
4,0.1427,0.202798,0.9432,0.945437,0.943381,0.943528
5,0.1311,0.184446,0.9517,0.952378,0.951845,0.951767
6,0.1229,0.185121,0.9499,0.951113,0.949959,0.950133
7,0.1192,0.189006,0.9506,0.951069,0.950931,0.950477
8,0.1171,0.174392,0.9536,0.954759,0.953865,0.953777
9,0.1155,0.191439,0.9458,0.948219,0.945938,0.9461
10,0.1144,0.174702,0.9543,0.954815,0.954484,0.954369


[I 2025-03-28 17:02:43,559] Trial 10 finished with value: 0.9543691515263601 and parameters: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 10 with value: 0.9543691515263601.


Trial 11 with params: {'learning_rate': 0.0014321301966915287, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4224,0.387658,0.8456,0.86133,0.845785,0.84556
2,0.2795,0.300816,0.8943,0.902268,0.894373,0.8961
3,0.2201,0.271475,0.9064,0.909337,0.906654,0.906576


[I 2025-03-28 17:06:27,466] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 9.686152689152715e-05, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4608,0.292435,0.9002,0.908352,0.900057,0.900654
2,0.2233,0.248243,0.924,0.927756,0.923787,0.924786
3,0.1734,0.236965,0.9265,0.928345,0.926514,0.926624
4,0.1488,0.222985,0.9306,0.932795,0.930749,0.930726
5,0.1363,0.213485,0.9372,0.938483,0.937439,0.937386
6,0.1293,0.208222,0.9385,0.940739,0.938469,0.938899


[I 2025-03-28 17:13:55,276] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0004052254440503788, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3746,0.276852,0.9086,0.91586,0.908387,0.909164
2,0.2035,0.24433,0.9233,0.929176,0.923188,0.924605
3,0.1625,0.219905,0.9351,0.936557,0.935269,0.93514
4,0.142,0.200723,0.9455,0.947098,0.945723,0.945747
5,0.1303,0.190953,0.9493,0.949891,0.949581,0.949406
6,0.1232,0.185497,0.9498,0.951046,0.949908,0.950073


[I 2025-03-28 17:21:22,734] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3753,0.279092,0.9071,0.918224,0.906882,0.907679
2,0.196,0.224621,0.9329,0.936122,0.932868,0.933723
3,0.1564,0.220684,0.9352,0.936646,0.935286,0.935209
4,0.1376,0.196959,0.9446,0.945998,0.944755,0.944805
5,0.128,0.188508,0.949,0.949758,0.949089,0.949116
6,0.1229,0.188823,0.948,0.949525,0.947985,0.948265
7,0.1192,0.199367,0.9445,0.945342,0.944882,0.944287
8,0.1173,0.176785,0.9529,0.954243,0.953059,0.953087
9,0.1158,0.200913,0.941,0.944133,0.941038,0.94126
10,0.1148,0.178491,0.9535,0.953859,0.953639,0.953502


[I 2025-03-28 17:33:51,396] Trial 14 finished with value: 0.9535019334769033 and parameters: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 10 with value: 0.9543691515263601.


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3914,0.34536,0.8725,0.88561,0.8725,0.873134
2,0.2461,0.274946,0.9101,0.917358,0.910429,0.910938
3,0.1961,0.257432,0.9165,0.919,0.916602,0.91638


[I 2025-03-28 17:37:36,152] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3681,0.278208,0.9053,0.915322,0.905004,0.905934
2,0.1958,0.233827,0.9307,0.934499,0.930697,0.931547
3,0.155,0.217972,0.9352,0.937602,0.935293,0.935336
4,0.1369,0.204401,0.9404,0.942077,0.94068,0.940465
5,0.1274,0.188543,0.9489,0.949616,0.949049,0.949077
6,0.1225,0.188896,0.9484,0.950161,0.948404,0.948743
7,0.1195,0.2009,0.9433,0.944251,0.943732,0.943105
8,0.1177,0.183226,0.9484,0.951221,0.948563,0.948775
9,0.1165,0.209651,0.9397,0.943333,0.939778,0.940195
10,0.1153,0.184378,0.9488,0.949708,0.948969,0.948893


[I 2025-03-28 17:50:05,083] Trial 16 finished with value: 0.9488926820510981 and parameters: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 10 with value: 0.9543691515263601.


Trial 17 with params: {'learning_rate': 0.0006412609358779237, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3747,0.31124,0.8892,0.898941,0.889163,0.889693
2,0.2201,0.245969,0.9267,0.932144,0.926772,0.927779
3,0.1785,0.227592,0.9316,0.932462,0.931804,0.931589
4,0.1517,0.208091,0.9381,0.941202,0.9384,0.938475
5,0.135,0.198101,0.9458,0.946871,0.945915,0.945844
6,0.1258,0.1892,0.9456,0.947253,0.945783,0.945881


[I 2025-03-28 17:57:33,343] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 5.957853392927128e-05, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5656,0.326376,0.8865,0.893914,0.886406,0.886563
2,0.2599,0.269449,0.9152,0.919248,0.915046,0.916089
3,0.2042,0.255346,0.9199,0.922016,0.919937,0.920111


[I 2025-03-28 18:01:18,771] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3596,0.293515,0.8986,0.911171,0.898427,0.89982
2,0.2051,0.245396,0.9228,0.928433,0.922651,0.924079
3,0.1641,0.221927,0.9315,0.932767,0.931808,0.931509
4,0.1444,0.201427,0.9414,0.943084,0.941709,0.941657
5,0.1307,0.183625,0.9506,0.951305,0.950732,0.950726
6,0.1232,0.187926,0.9479,0.95002,0.947908,0.948242
7,0.1192,0.190191,0.9498,0.950082,0.950159,0.949549
8,0.1173,0.176563,0.9542,0.955573,0.954428,0.954324
9,0.1157,0.194301,0.9459,0.948839,0.945887,0.946228
10,0.1147,0.173459,0.9556,0.956201,0.955735,0.955745


[I 2025-03-28 18:13:47,023] Trial 19 finished with value: 0.9557450765155089 and parameters: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3702,0.273573,0.9099,0.915809,0.909773,0.910289
2,0.2038,0.22966,0.9317,0.935437,0.931641,0.932475
3,0.1627,0.224629,0.9329,0.934398,0.933078,0.932919
4,0.142,0.208214,0.9381,0.941895,0.93838,0.938707
5,0.1302,0.187502,0.9496,0.950491,0.949725,0.949767
6,0.1226,0.183989,0.9505,0.95182,0.95061,0.950728
7,0.1195,0.196314,0.9453,0.946299,0.945634,0.945095
8,0.1171,0.174226,0.9542,0.955388,0.954399,0.954406
9,0.1155,0.191514,0.9488,0.951511,0.94889,0.949009
10,0.1144,0.17302,0.9553,0.955858,0.955484,0.955395


[I 2025-03-28 18:26:13,488] Trial 20 finished with value: 0.9553946151927425 and parameters: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 21 with params: {'learning_rate': 0.00017048302356543796, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4315,0.276208,0.9099,0.918525,0.909633,0.910699
2,0.2014,0.230699,0.9318,0.935172,0.93172,0.932559
3,0.1579,0.22747,0.9313,0.93321,0.931376,0.931434
4,0.1377,0.213146,0.9357,0.938157,0.935933,0.93589
5,0.1286,0.196475,0.9431,0.944205,0.943265,0.943325
6,0.1232,0.19642,0.9444,0.946289,0.944417,0.944679
7,0.1206,0.204666,0.9407,0.941459,0.94109,0.940472
8,0.1187,0.19029,0.9457,0.948694,0.945896,0.946117
9,0.1175,0.217421,0.9374,0.941163,0.937421,0.937852
10,0.1163,0.190462,0.9459,0.946663,0.946025,0.945926


[I 2025-03-28 18:38:42,402] Trial 21 finished with value: 0.945925982949173 and parameters: {'learning_rate': 0.00017048302356543796, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 19 with value: 0.9557450765155089.


Trial 22 with params: {'learning_rate': 0.000588906057047636, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3663,0.297311,0.897,0.905646,0.896949,0.897749
2,0.2168,0.247866,0.9248,0.929618,0.924851,0.925959
3,0.1726,0.235092,0.9289,0.930597,0.928911,0.928893
4,0.1494,0.209864,0.9386,0.942046,0.938709,0.938997
5,0.1337,0.194141,0.9471,0.947863,0.947279,0.947253
6,0.1258,0.186538,0.9496,0.951524,0.949649,0.950022


[I 2025-03-28 18:46:09,461] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0006562519709440268, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3884,0.316929,0.8911,0.901493,0.891155,0.891671
2,0.2252,0.252424,0.9226,0.92705,0.92278,0.923528
3,0.1798,0.23596,0.9288,0.929979,0.928956,0.928837
4,0.1532,0.206301,0.9391,0.941652,0.939298,0.939465
5,0.1365,0.186257,0.9491,0.94959,0.949268,0.949257
6,0.1263,0.187264,0.948,0.94934,0.948097,0.948299
7,0.1208,0.193472,0.944,0.944521,0.94445,0.943752
8,0.1183,0.173713,0.9558,0.956948,0.956073,0.955949
9,0.1161,0.195172,0.945,0.948217,0.945135,0.945372
10,0.1149,0.17405,0.9554,0.95576,0.955586,0.955424


[I 2025-03-28 18:58:33,083] Trial 23 finished with value: 0.9554239353792248 and parameters: {'learning_rate': 0.0006562519709440268, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 24 with params: {'learning_rate': 0.0018344853039920228, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4745,0.43521,0.8358,0.858715,0.835389,0.834958
2,0.2999,0.298048,0.8923,0.898355,0.892503,0.893578
3,0.2348,0.271679,0.9091,0.911582,0.909361,0.909275


[I 2025-03-28 19:02:15,236] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0006034350725371133, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.389,0.300908,0.8934,0.904278,0.893366,0.893778
2,0.2206,0.262606,0.9162,0.92419,0.916367,0.917958
3,0.1757,0.224256,0.9311,0.932432,0.931323,0.931121
4,0.1507,0.205963,0.9403,0.942289,0.940541,0.94044
5,0.1337,0.191984,0.9449,0.945996,0.94511,0.945048
6,0.1248,0.184988,0.9507,0.951472,0.95077,0.95082
7,0.1202,0.196783,0.9454,0.94613,0.945839,0.945099
8,0.1176,0.176448,0.9538,0.955257,0.954014,0.953987
9,0.1157,0.196074,0.9442,0.947515,0.944244,0.944558
10,0.1145,0.175052,0.9537,0.954248,0.953898,0.953719


[I 2025-03-28 19:14:42,128] Trial 25 finished with value: 0.9537193367427335 and parameters: {'learning_rate': 0.0006034350725371133, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 19 with value: 0.9557450765155089.


Trial 26 with params: {'learning_rate': 0.0006998557000547919, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3898,0.334318,0.8787,0.895134,0.878639,0.880068
2,0.2289,0.2511,0.9223,0.92732,0.922201,0.923242
3,0.1817,0.232196,0.93,0.931771,0.930017,0.929921
4,0.1559,0.218551,0.9352,0.93757,0.935307,0.93539
5,0.1369,0.198726,0.9445,0.945731,0.944617,0.944697
6,0.1263,0.183462,0.9502,0.951298,0.950207,0.950472
7,0.1212,0.199355,0.9457,0.946536,0.946164,0.94554
8,0.1179,0.177441,0.9532,0.954839,0.953525,0.953375
9,0.1161,0.196044,0.9455,0.94864,0.945585,0.945923
10,0.1147,0.176626,0.9545,0.955373,0.954695,0.954654


[I 2025-03-28 19:27:09,318] Trial 26 finished with value: 0.9546541248743063 and parameters: {'learning_rate': 0.0006998557000547919, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 27 with params: {'learning_rate': 0.00021213614241312118, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4275,0.277532,0.9099,0.919729,0.909565,0.910645
2,0.1972,0.233694,0.9315,0.935108,0.9315,0.932223
3,0.1558,0.221741,0.9354,0.936981,0.935404,0.93547
4,0.1376,0.203724,0.9406,0.942414,0.940817,0.940819
5,0.1284,0.189171,0.9483,0.949,0.948417,0.948381
6,0.1226,0.19313,0.9467,0.948555,0.946718,0.946996
7,0.1196,0.202546,0.9444,0.94538,0.944808,0.944246
8,0.1181,0.183628,0.9493,0.951681,0.949446,0.949604
9,0.1168,0.20808,0.9406,0.944008,0.940682,0.941004
10,0.1156,0.183398,0.9503,0.95099,0.950368,0.950343


[I 2025-03-28 19:39:39,266] Trial 27 finished with value: 0.9503428339538644 and parameters: {'learning_rate': 0.00021213614241312118, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 28 with params: {'learning_rate': 0.0011600806468685743, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.412,0.349999,0.8702,0.884518,0.870339,0.870037
2,0.2641,0.28313,0.9046,0.911786,0.904961,0.905706
3,0.2074,0.251762,0.918,0.920137,0.918138,0.918005
4,0.1749,0.221707,0.9326,0.934454,0.93286,0.932729
5,0.1477,0.209832,0.9372,0.938333,0.937441,0.937225
6,0.1336,0.196271,0.9458,0.947115,0.945947,0.946001


[I 2025-03-28 19:47:05,832] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0027055971463531107, 'weight_decay': 0.007, 'warmup_steps': 20, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5121,0.4614,0.8156,0.841504,0.815886,0.814351
2,0.3292,0.319953,0.8869,0.894003,0.886774,0.888581
3,0.262,0.292447,0.8955,0.899584,0.895814,0.895936


[I 2025-03-28 19:50:48,229] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0007243732057988554, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3966,0.355613,0.8682,0.884122,0.868466,0.868191
2,0.2304,0.26554,0.9135,0.920391,0.913601,0.914627
3,0.1841,0.246313,0.9216,0.923506,0.921816,0.921717


[I 2025-03-28 19:54:30,809] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0002607932226172979, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4039,0.267528,0.9108,0.918436,0.910498,0.911348
2,0.1961,0.228395,0.9296,0.934302,0.929605,0.930625
3,0.1556,0.21564,0.9368,0.938011,0.93693,0.93685
4,0.1367,0.20032,0.9439,0.945545,0.94406,0.944013
5,0.1278,0.188519,0.9489,0.949588,0.949036,0.949061
6,0.1221,0.186527,0.9487,0.950101,0.948714,0.948974
7,0.1194,0.197576,0.9451,0.945964,0.945427,0.944982
8,0.1175,0.17946,0.9528,0.954712,0.95288,0.953079
9,0.116,0.202072,0.9426,0.945782,0.942648,0.942976
10,0.1151,0.179416,0.9519,0.952356,0.952059,0.951952


[I 2025-03-28 20:06:55,111] Trial 31 finished with value: 0.9519516942559431 and parameters: {'learning_rate': 0.0002607932226172979, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 19 with value: 0.9557450765155089.


Trial 32 with params: {'learning_rate': 0.000380061063067411, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3876,0.278052,0.9095,0.917697,0.909369,0.910204
2,0.2006,0.243935,0.926,0.931647,0.925939,0.927375
3,0.1614,0.21914,0.9347,0.936656,0.934965,0.934867
4,0.1422,0.207167,0.9394,0.94236,0.939662,0.939676
5,0.1295,0.184551,0.9506,0.951443,0.95072,0.950749
6,0.1226,0.180974,0.9513,0.952765,0.951357,0.951641
7,0.1192,0.189959,0.949,0.949618,0.949392,0.948852
8,0.1174,0.176209,0.9546,0.956128,0.954872,0.954721
9,0.1157,0.197311,0.9452,0.948155,0.945303,0.945544
10,0.1146,0.173914,0.9557,0.956145,0.955906,0.955796


[I 2025-03-28 20:19:17,962] Trial 32 finished with value: 0.9557958553332699 and parameters: {'learning_rate': 0.000380061063067411, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 32 with value: 0.9557958553332699.


Trial 33 with params: {'learning_rate': 0.0002329700851528144, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4016,0.281076,0.9062,0.917393,0.905729,0.907142
2,0.1956,0.23581,0.9284,0.933441,0.928357,0.929421
3,0.155,0.21891,0.9348,0.936468,0.934912,0.935013
4,0.1362,0.203254,0.9423,0.944302,0.942453,0.942616
5,0.1273,0.192853,0.9462,0.946969,0.946442,0.946357
6,0.1222,0.191588,0.9453,0.946768,0.945307,0.945546


[I 2025-03-28 20:26:43,473] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.00048798063753357655, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3873,0.29825,0.8981,0.907229,0.898052,0.898409
2,0.2104,0.245867,0.9236,0.92888,0.923513,0.924862
3,0.1676,0.223967,0.9324,0.933222,0.932677,0.932225
4,0.1467,0.208209,0.9405,0.943377,0.940797,0.940734
5,0.1323,0.193066,0.9462,0.947087,0.94636,0.94627
6,0.1237,0.181797,0.9521,0.953683,0.95216,0.952462
7,0.1196,0.193945,0.9482,0.948971,0.948624,0.947998
8,0.1176,0.173168,0.9561,0.956977,0.956317,0.956188
9,0.1157,0.190237,0.9469,0.949641,0.947001,0.947217
10,0.1146,0.172956,0.9551,0.95575,0.955283,0.955138


[I 2025-03-28 20:39:06,315] Trial 34 finished with value: 0.9551382430735874 and parameters: {'learning_rate': 0.00048798063753357655, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 32 with value: 0.9557958553332699.


Trial 35 with params: {'learning_rate': 0.0005239408289563699, 'weight_decay': 0.006, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3905,0.312792,0.8892,0.902076,0.888921,0.890036
2,0.2137,0.254093,0.9212,0.926146,0.921327,0.922352
3,0.172,0.219866,0.9344,0.935499,0.934613,0.93458
4,0.1465,0.202504,0.9424,0.944266,0.942631,0.942657
5,0.1321,0.189961,0.9474,0.948007,0.947615,0.947501
6,0.1238,0.185286,0.9492,0.950442,0.949258,0.949502
7,0.1196,0.193038,0.9479,0.948459,0.948332,0.947696
8,0.1173,0.174534,0.9546,0.955981,0.954838,0.954843
9,0.1158,0.195233,0.9456,0.948207,0.945738,0.945838
10,0.1146,0.17368,0.9557,0.956415,0.95593,0.955812


[I 2025-03-28 20:51:25,914] Trial 35 finished with value: 0.9558118642927973 and parameters: {'learning_rate': 0.0005239408289563699, 'weight_decay': 0.006, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 35 with value: 0.9558118642927973.


Trial 36 with params: {'learning_rate': 0.0017469579212181603, 'weight_decay': 0.008, 'warmup_steps': 19, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4585,0.440793,0.8224,0.848337,0.822053,0.822722
2,0.2982,0.318225,0.8943,0.903217,0.894465,0.895782
3,0.2311,0.254171,0.9163,0.917304,0.916345,0.916281


[I 2025-03-28 20:55:10,445] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00044417576151721354, 'weight_decay': 0.007, 'warmup_steps': 30, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3947,0.285411,0.9008,0.912609,0.900597,0.901671
2,0.2071,0.233133,0.9289,0.93347,0.929031,0.929686
3,0.1654,0.213712,0.9382,0.93865,0.938439,0.93814
4,0.1436,0.203902,0.9419,0.943988,0.942054,0.94214
5,0.132,0.194639,0.9468,0.947595,0.947007,0.946802
6,0.124,0.18388,0.9506,0.952027,0.950638,0.950816
7,0.1198,0.195287,0.9462,0.946671,0.946597,0.945916
8,0.1178,0.17522,0.9517,0.952872,0.951987,0.951794
9,0.116,0.196663,0.9437,0.947327,0.943734,0.944054
10,0.1148,0.172995,0.9557,0.956086,0.955893,0.955757


[I 2025-03-28 21:07:37,554] Trial 37 finished with value: 0.9557568200457676 and parameters: {'learning_rate': 0.00044417576151721354, 'weight_decay': 0.007, 'warmup_steps': 30, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 35 with value: 0.9558118642927973.


Trial 38 with params: {'learning_rate': 0.0004524739088663337, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.394,0.291352,0.9009,0.910471,0.900667,0.901431
2,0.2065,0.23286,0.9332,0.936095,0.933305,0.933771
3,0.1648,0.224731,0.935,0.935898,0.935161,0.934888
4,0.1457,0.202598,0.9412,0.943471,0.941373,0.941476
5,0.1314,0.184022,0.9524,0.953009,0.952458,0.952449
6,0.1235,0.183519,0.9515,0.952677,0.951583,0.951769
7,0.1191,0.188031,0.9493,0.949744,0.949703,0.949091
8,0.1171,0.174518,0.9565,0.958002,0.956723,0.95665
9,0.1155,0.188954,0.9492,0.951357,0.949314,0.949456
10,0.1145,0.173902,0.955,0.955316,0.955198,0.955066


[I 2025-03-28 21:20:06,543] Trial 38 finished with value: 0.9550663069051714 and parameters: {'learning_rate': 0.0004524739088663337, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 3.0}. Best is trial 35 with value: 0.9558118642927973.


Trial 39 with params: {'learning_rate': 0.0007885136787012324, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3954,0.321831,0.8888,0.896946,0.888819,0.888946
2,0.2332,0.310171,0.8974,0.910636,0.897467,0.899576
3,0.187,0.237787,0.9267,0.928139,0.926958,0.926713


[I 2025-03-28 21:23:49,946] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00046440688974124867, 'weight_decay': 0.008, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3821,0.303454,0.8941,0.907535,0.893706,0.894744
2,0.2099,0.24889,0.9221,0.928512,0.922085,0.92351
3,0.1662,0.230366,0.9307,0.932295,0.930768,0.930625
4,0.1449,0.200545,0.9446,0.947287,0.94475,0.945034
5,0.1313,0.193497,0.948,0.948774,0.948239,0.948058
6,0.1236,0.184583,0.9501,0.951353,0.950043,0.950352
7,0.1194,0.192586,0.9455,0.946297,0.945895,0.945362
8,0.117,0.175407,0.9525,0.954499,0.952726,0.952771
9,0.1154,0.201013,0.9426,0.946529,0.942648,0.942975
10,0.1144,0.173779,0.9544,0.955052,0.954547,0.954514


[I 2025-03-28 21:36:20,169] Trial 40 finished with value: 0.9545136636695835 and parameters: {'learning_rate': 0.00046440688974124867, 'weight_decay': 0.008, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}. Best is trial 35 with value: 0.9558118642927973.


Trial 41 with params: {'learning_rate': 0.001143197736906318, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4192,0.377614,0.8531,0.874845,0.853126,0.852843
2,0.2628,0.304195,0.8976,0.907832,0.897835,0.899147
3,0.2071,0.2586,0.9132,0.915216,0.913309,0.913006


[I 2025-03-28 21:40:04,223] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.00017587884752628572, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4258,0.275884,0.9083,0.917343,0.90794,0.909034
2,0.1994,0.226769,0.9298,0.932966,0.929812,0.930489
3,0.1562,0.218292,0.9367,0.937941,0.93673,0.936767
4,0.1371,0.208885,0.9386,0.941394,0.938807,0.938836
5,0.1283,0.198468,0.9438,0.944852,0.943935,0.943997
6,0.1232,0.192899,0.9469,0.948485,0.946905,0.947163


[I 2025-03-28 21:47:32,978] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.002533360612643033, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5174,0.461922,0.8044,0.835767,0.80417,0.804273
2,0.3292,0.32004,0.8856,0.890558,0.885729,0.886216
3,0.2589,0.294518,0.8954,0.899689,0.895758,0.895819


[I 2025-03-28 21:51:18,849] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0004014407821893915, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3903,0.286616,0.9032,0.913436,0.902869,0.903843
2,0.2029,0.238004,0.9273,0.933846,0.927214,0.928678
3,0.1617,0.230895,0.9307,0.932281,0.930915,0.930558
4,0.1418,0.198702,0.9438,0.945914,0.94397,0.944305
5,0.1298,0.187211,0.9497,0.950547,0.949761,0.949954
6,0.1227,0.18221,0.9513,0.952412,0.951312,0.951515
7,0.1191,0.19891,0.9464,0.946942,0.946804,0.946165
8,0.1172,0.177592,0.9527,0.954268,0.95286,0.95293
9,0.1156,0.199417,0.9445,0.94749,0.944547,0.944793
10,0.1145,0.175763,0.9543,0.954666,0.954467,0.954334


[I 2025-03-28 22:03:48,459] Trial 44 finished with value: 0.9543343502029199 and parameters: {'learning_rate': 0.0004014407821893915, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 35 with value: 0.9558118642927973.


Trial 45 with params: {'learning_rate': 0.00012958400718673065, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.474,0.282984,0.9034,0.911356,0.903132,0.90393
2,0.2116,0.237267,0.9281,0.931654,0.928031,0.92894
3,0.1637,0.227338,0.9308,0.932336,0.93084,0.930902
4,0.1416,0.217245,0.9324,0.93461,0.932614,0.932432
5,0.1311,0.204431,0.9415,0.942565,0.941714,0.941693
6,0.1255,0.199425,0.9422,0.944018,0.942226,0.942492


[I 2025-03-28 22:11:19,763] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 6.605280237872519e-05, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5661,0.321345,0.8882,0.89679,0.888094,0.888454
2,0.2528,0.264431,0.9155,0.919494,0.915346,0.916403
3,0.1978,0.25061,0.9204,0.922311,0.920491,0.920532
4,0.1681,0.233441,0.9265,0.929269,0.926662,0.926601
5,0.1507,0.225315,0.9328,0.933943,0.933008,0.932899
6,0.1405,0.221417,0.9314,0.93429,0.931361,0.931867


[I 2025-03-28 22:18:47,312] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0001968180418092724, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3737,0.278117,0.9078,0.917549,0.907487,0.908462
2,0.1949,0.242518,0.9275,0.93242,0.927601,0.928286
3,0.1558,0.218465,0.9354,0.936982,0.935511,0.935542
4,0.1364,0.206198,0.9383,0.940121,0.93853,0.938405
5,0.1277,0.193026,0.9466,0.947239,0.946785,0.946649
6,0.1228,0.195352,0.9453,0.94715,0.945277,0.945568


[I 2025-03-28 22:26:12,177] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0005166998649607866, 'weight_decay': 0.01, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3927,0.299063,0.8958,0.905232,0.895749,0.896476
2,0.2136,0.23666,0.9287,0.9332,0.928802,0.929672
3,0.1698,0.227837,0.9338,0.935339,0.933959,0.933844
4,0.1466,0.205609,0.9406,0.943804,0.940717,0.941091
5,0.1324,0.186675,0.9532,0.953496,0.953438,0.953244
6,0.1246,0.189049,0.9475,0.948936,0.947589,0.947762


[I 2025-03-28 22:33:37,496] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0002562393166563217, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3957,0.284292,0.8989,0.910943,0.898492,0.900021
2,0.1972,0.233124,0.9309,0.935108,0.93098,0.931784
3,0.1558,0.211129,0.941,0.942152,0.941109,0.941212
4,0.1374,0.201632,0.942,0.944027,0.942228,0.942262
5,0.1276,0.189032,0.9484,0.949349,0.948516,0.948572
6,0.1221,0.186295,0.9484,0.950076,0.948431,0.948742
7,0.1191,0.194068,0.9465,0.947172,0.946856,0.946414
8,0.1174,0.180166,0.9523,0.954009,0.952443,0.952582
9,0.1161,0.2012,0.9421,0.945374,0.942172,0.942458
10,0.1149,0.180746,0.9512,0.951762,0.951375,0.95125


[I 2025-03-28 22:46:03,544] Trial 49 finished with value: 0.9512503544261401 and parameters: {'learning_rate': 0.0002562393166563217, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 35 with value: 0.9558118642927973.


Trial 50 with params: {'learning_rate': 0.0004006810309668447, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3927,0.272644,0.9117,0.919006,0.911468,0.912282
2,0.2046,0.239829,0.9283,0.93325,0.928314,0.929328
3,0.1645,0.211928,0.9409,0.941616,0.941059,0.940944
4,0.1432,0.207372,0.9405,0.943559,0.940781,0.940914
5,0.1303,0.184529,0.9509,0.951306,0.951085,0.951031
6,0.1226,0.180183,0.952,0.953327,0.952097,0.952294
7,0.1191,0.195651,0.9481,0.948857,0.948477,0.947899
8,0.1171,0.174959,0.9538,0.955498,0.954006,0.954002
9,0.1158,0.191203,0.9483,0.950253,0.948386,0.948481
10,0.1147,0.174245,0.9559,0.956403,0.956118,0.95598


[I 2025-03-28 22:58:28,491] Trial 50 finished with value: 0.9559801609681784 and parameters: {'learning_rate': 0.0004006810309668447, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 50 with value: 0.9559801609681784.


Trial 51 with params: {'learning_rate': 0.0008766798964134671, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4045,0.343178,0.8759,0.889256,0.875621,0.875532
2,0.2416,0.282917,0.9054,0.913038,0.9054,0.907119
3,0.1937,0.240724,0.9252,0.92675,0.925506,0.925014


[I 2025-03-28 23:02:11,148] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.000282148288039935, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4025,0.277062,0.9086,0.916228,0.908387,0.90919
2,0.1957,0.228266,0.9329,0.936536,0.93288,0.933648
3,0.156,0.214011,0.9369,0.937928,0.936994,0.936936
4,0.1381,0.204017,0.9406,0.943093,0.940771,0.940865
5,0.1278,0.189557,0.948,0.948828,0.948137,0.948196
6,0.1224,0.187116,0.9475,0.9489,0.947575,0.94778


[I 2025-03-28 23:09:39,101] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.000619592349371242, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.367,0.309608,0.8864,0.899045,0.886267,0.887007
2,0.2194,0.265234,0.9121,0.920457,0.912256,0.913785
3,0.1768,0.229271,0.9308,0.931732,0.930928,0.930581
4,0.1518,0.206356,0.9426,0.944335,0.942744,0.942784
5,0.1357,0.189884,0.9494,0.950054,0.949525,0.94946
6,0.1248,0.184794,0.9497,0.951075,0.949678,0.949951


[I 2025-03-28 23:17:04,840] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0027026130785766608, 'weight_decay': 0.01, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5231,0.463023,0.8099,0.834307,0.809985,0.808968
2,0.3338,0.357313,0.8673,0.88138,0.867531,0.869312
3,0.2664,0.291932,0.8981,0.901267,0.898315,0.898516
4,0.2258,0.253817,0.9148,0.916561,0.915007,0.914893
5,0.1883,0.239491,0.9232,0.924143,0.923522,0.923161
6,0.1619,0.226942,0.9287,0.931169,0.928854,0.929103


[I 2025-03-28 23:24:30,047] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.004251166826739927, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.586,0.499691,0.7873,0.816283,0.786989,0.78701
2,0.3801,0.380679,0.8566,0.867998,0.856819,0.858219
3,0.3075,0.344531,0.8704,0.87607,0.870493,0.87067


[I 2025-03-28 23:28:13,866] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0009610231888568146, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4019,0.364198,0.8685,0.883617,0.86857,0.869326
2,0.2485,0.252159,0.9193,0.923517,0.919553,0.920227
3,0.1991,0.245119,0.9213,0.923235,0.921525,0.921439
4,0.1651,0.214807,0.9373,0.939794,0.937598,0.93745
5,0.1411,0.202122,0.9419,0.94322,0.941962,0.942049
6,0.1306,0.196456,0.9459,0.947409,0.945943,0.946237


[I 2025-03-28 23:35:35,810] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00031086408829559674, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4021,0.274654,0.9122,0.920326,0.911957,0.912827
2,0.1985,0.23911,0.9244,0.930716,0.924474,0.92566
3,0.1585,0.216675,0.9357,0.937025,0.935794,0.935769
4,0.1391,0.210577,0.9388,0.941089,0.938896,0.938929
5,0.1285,0.185526,0.9493,0.949815,0.949454,0.949379
6,0.1231,0.183429,0.9506,0.952145,0.95072,0.950891
7,0.1191,0.19491,0.9463,0.947099,0.946669,0.946102
8,0.1173,0.179128,0.9524,0.954474,0.952541,0.952728
9,0.1159,0.199057,0.944,0.947073,0.944058,0.944314
10,0.1148,0.176645,0.9533,0.953786,0.953411,0.953407


[I 2025-03-28 23:47:59,827] Trial 57 finished with value: 0.9534072243298157 and parameters: {'learning_rate': 0.00031086408829559674, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 50 with value: 0.9559801609681784.


Trial 58 with params: {'learning_rate': 0.0004163405237607461, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3991,0.286715,0.9025,0.911791,0.902078,0.903396
2,0.206,0.230481,0.9313,0.935237,0.931346,0.932321
3,0.1634,0.223748,0.933,0.934193,0.933083,0.932956
4,0.1423,0.209476,0.9399,0.943422,0.940158,0.940254
5,0.1301,0.18826,0.9497,0.95078,0.949855,0.949909
6,0.1231,0.181262,0.9532,0.954133,0.953264,0.953332
7,0.1193,0.192852,0.9479,0.948589,0.948372,0.947689
8,0.1172,0.174778,0.9553,0.956845,0.955503,0.955567
9,0.1156,0.201947,0.9403,0.944061,0.940347,0.940672
10,0.1144,0.173924,0.9546,0.955014,0.954777,0.95464


[I 2025-03-29 00:00:25,995] Trial 58 finished with value: 0.9546396716516818 and parameters: {'learning_rate': 0.0004163405237607461, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 50 with value: 0.9559801609681784.


Trial 59 with params: {'learning_rate': 0.00015993691919571215, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4447,0.278551,0.9072,0.915741,0.906905,0.907811
2,0.2023,0.238153,0.9276,0.931561,0.927571,0.928509
3,0.1582,0.216969,0.9363,0.937422,0.936379,0.936376
4,0.1382,0.208969,0.9372,0.939309,0.937448,0.93737
5,0.1287,0.198302,0.944,0.944975,0.944142,0.944142
6,0.1234,0.195201,0.9435,0.945507,0.943481,0.943852


[I 2025-03-29 00:07:50,979] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0003176202208979516, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3628,0.268272,0.913,0.919904,0.912602,0.913397
2,0.1955,0.235919,0.9296,0.934282,0.929558,0.930558
3,0.156,0.221288,0.9323,0.933837,0.932391,0.932335
4,0.1387,0.199309,0.9453,0.94794,0.945473,0.945793
5,0.1288,0.188008,0.9487,0.949632,0.948834,0.948905
6,0.1222,0.188008,0.9471,0.948822,0.947104,0.947421


[I 2025-03-29 00:15:19,791] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0004877061719648659, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3704,0.297588,0.8987,0.907842,0.898409,0.899158
2,0.2105,0.238451,0.9281,0.931624,0.928324,0.928787
3,0.1673,0.232527,0.9268,0.928704,0.927005,0.926878
4,0.1458,0.206447,0.9413,0.943576,0.941482,0.941714
5,0.1334,0.187543,0.9499,0.950619,0.950012,0.950077
6,0.1242,0.18582,0.9502,0.951455,0.950162,0.950353
7,0.1195,0.193417,0.9485,0.949004,0.948858,0.948276
8,0.1175,0.17741,0.9532,0.954785,0.953471,0.953359
9,0.1157,0.197775,0.9439,0.947389,0.943911,0.944339
10,0.1145,0.175228,0.9554,0.955921,0.955594,0.955472


[I 2025-03-29 00:27:45,348] Trial 61 finished with value: 0.9554723391210445 and parameters: {'learning_rate': 0.0004877061719648659, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 50 with value: 0.9559801609681784.


Trial 62 with params: {'learning_rate': 0.00034701167721701265, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3986,0.275753,0.9099,0.916678,0.909639,0.910376
2,0.2001,0.240347,0.9251,0.929981,0.925135,0.926257
3,0.1608,0.223128,0.933,0.934528,0.933119,0.933055
4,0.1407,0.198441,0.945,0.946599,0.945115,0.945351
5,0.1293,0.183279,0.9493,0.94963,0.949438,0.949367
6,0.123,0.185041,0.9498,0.950988,0.949844,0.950044
7,0.1192,0.195746,0.9466,0.94716,0.946966,0.946381
8,0.1173,0.17611,0.9544,0.955371,0.954606,0.954634
9,0.1158,0.196593,0.9446,0.947287,0.9447,0.944941
10,0.1146,0.178836,0.9531,0.953473,0.953301,0.953166


[I 2025-03-29 00:40:07,960] Trial 62 finished with value: 0.9531662829483312 and parameters: {'learning_rate': 0.00034701167721701265, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 50 with value: 0.9559801609681784.


Trial 63 with params: {'learning_rate': 0.0006420820970475254, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3707,0.329872,0.8843,0.895434,0.884324,0.884853
2,0.2221,0.260622,0.917,0.923297,0.917033,0.918338
3,0.1784,0.23484,0.9301,0.932342,0.930288,0.930328
4,0.1532,0.204527,0.9404,0.94176,0.940644,0.940553
5,0.1366,0.193635,0.9459,0.946759,0.946057,0.946014
6,0.1261,0.188505,0.9477,0.949197,0.947721,0.94797
7,0.1201,0.198533,0.9439,0.944517,0.94436,0.943592
8,0.1177,0.175566,0.9542,0.955482,0.95444,0.954422
9,0.116,0.194385,0.9463,0.949311,0.946324,0.946671
10,0.1147,0.173923,0.956,0.956577,0.95619,0.956102


[I 2025-03-29 00:52:32,857] Trial 63 finished with value: 0.9561022303412703 and parameters: {'learning_rate': 0.0006420820970475254, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 63 with value: 0.9561022303412703.


Trial 64 with params: {'learning_rate': 0.0005498706897887974, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3512,0.297502,0.9015,0.912141,0.901415,0.902346
2,0.2113,0.235018,0.9273,0.931121,0.9275,0.927932
3,0.17,0.222352,0.9335,0.934365,0.933757,0.933509
4,0.1491,0.206739,0.9409,0.943635,0.940994,0.941135
5,0.1328,0.18989,0.9488,0.949624,0.948941,0.948994
6,0.1243,0.182608,0.951,0.952224,0.951072,0.951219
7,0.1194,0.191628,0.9482,0.948941,0.948558,0.948104
8,0.1173,0.175576,0.9529,0.954419,0.95312,0.953079
9,0.1158,0.191638,0.9471,0.949458,0.947159,0.947382
10,0.1145,0.173745,0.9567,0.957069,0.956895,0.956751


[I 2025-03-29 01:04:58,777] Trial 64 finished with value: 0.9567513619864837 and parameters: {'learning_rate': 0.0005498706897887974, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 64 with value: 0.9567513619864837.


Trial 65 with params: {'learning_rate': 0.001611726830248623, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4427,0.424068,0.838,0.860793,0.837522,0.837944
2,0.2919,0.31124,0.8914,0.90046,0.891516,0.893291
3,0.2278,0.280767,0.902,0.905418,0.902276,0.902291
4,0.1903,0.231043,0.9265,0.928231,0.926716,0.926752
5,0.1618,0.220601,0.9341,0.935226,0.934194,0.934255
6,0.1419,0.216607,0.934,0.936045,0.934179,0.934359


[I 2025-03-29 01:12:23,884] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0025417615613742037, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5117,0.480079,0.7988,0.8295,0.798509,0.798057
2,0.3331,0.332817,0.878,0.884746,0.877999,0.879454
3,0.2661,0.317987,0.8825,0.888259,0.88265,0.882692


[I 2025-03-29 01:16:04,387] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0004962875069899694, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3596,0.316223,0.887,0.901786,0.886617,0.888156
2,0.2088,0.245456,0.9266,0.93126,0.926815,0.927351
3,0.17,0.23623,0.93,0.931821,0.930284,0.929892
4,0.1468,0.204273,0.9411,0.943192,0.941182,0.941433
5,0.1321,0.189113,0.9483,0.949014,0.948321,0.948397
6,0.1235,0.182551,0.9499,0.951356,0.949989,0.950155
7,0.12,0.192277,0.9472,0.94762,0.947571,0.947003
8,0.1172,0.177881,0.9526,0.95494,0.952847,0.952871
9,0.1157,0.193305,0.9485,0.951542,0.948504,0.948871
10,0.1146,0.174667,0.954,0.95452,0.954158,0.954069


[I 2025-03-29 01:28:24,100] Trial 67 finished with value: 0.9540694535366605 and parameters: {'learning_rate': 0.0004962875069899694, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 64 with value: 0.9567513619864837.


Trial 68 with params: {'learning_rate': 0.0005484552761799348, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3529,0.308843,0.888,0.901181,0.888121,0.888885
2,0.2141,0.258449,0.9167,0.922309,0.916783,0.917827
3,0.174,0.232601,0.9282,0.92923,0.928452,0.928183
4,0.1483,0.208996,0.9399,0.94177,0.940199,0.940109
5,0.1343,0.19151,0.9478,0.948556,0.947886,0.947928
6,0.125,0.188639,0.9488,0.950557,0.948886,0.949126
7,0.1199,0.196579,0.9455,0.946211,0.94596,0.945287
8,0.1177,0.176491,0.9542,0.955391,0.954412,0.954338
9,0.116,0.19398,0.9442,0.947131,0.944232,0.944565
10,0.1147,0.17528,0.9548,0.955112,0.954944,0.954833


[I 2025-03-29 01:40:43,832] Trial 68 finished with value: 0.9548332675027238 and parameters: {'learning_rate': 0.0005484552761799348, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 64 with value: 0.9567513619864837.


Trial 69 with params: {'learning_rate': 0.0003165922229062012, 'weight_decay': 0.0, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3735,0.272022,0.9109,0.919482,0.910627,0.911474
2,0.1967,0.240346,0.9249,0.930335,0.924756,0.92613
3,0.1573,0.218954,0.9363,0.937739,0.936492,0.936403
4,0.1389,0.208771,0.9391,0.941297,0.939271,0.93925
5,0.1283,0.189422,0.9493,0.950243,0.949419,0.949473
6,0.1221,0.185645,0.9486,0.949921,0.948614,0.948889


[I 2025-03-29 01:48:11,596] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0004579009378921728, 'weight_decay': 0.006, 'warmup_steps': 20, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3813,0.293276,0.8985,0.908456,0.898562,0.899492
2,0.209,0.242632,0.9255,0.931494,0.925409,0.926594
3,0.1652,0.212035,0.9373,0.938072,0.937542,0.937306
4,0.1452,0.205545,0.9442,0.945999,0.944369,0.944529
5,0.1309,0.180883,0.9521,0.952674,0.952185,0.952248
6,0.1231,0.186981,0.9483,0.949891,0.948283,0.948579


[I 2025-03-29 01:55:35,961] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0007470325347726769, 'weight_decay': 0.004, 'warmup_steps': 16, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.383,0.332571,0.8787,0.89254,0.878778,0.878856
2,0.2301,0.271912,0.9116,0.921518,0.911732,0.913339
3,0.1848,0.231623,0.9284,0.929559,0.928709,0.928236
4,0.1565,0.211361,0.94,0.941635,0.94022,0.940214
5,0.138,0.192358,0.946,0.94645,0.946174,0.94608
6,0.1265,0.192001,0.9448,0.946234,0.94498,0.945035


[I 2025-03-29 02:03:00,105] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0006305013768235046, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3723,0.304092,0.8944,0.904385,0.894313,0.895429
2,0.2183,0.267095,0.9154,0.924231,0.915488,0.916796
3,0.1779,0.226737,0.9326,0.933684,0.932834,0.9326
4,0.1518,0.216177,0.9354,0.938436,0.935764,0.935551
5,0.1345,0.190879,0.9466,0.947902,0.946862,0.946861
6,0.1251,0.189605,0.9479,0.949284,0.948032,0.948101


[I 2025-03-29 02:10:25,622] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0004506830404570648, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3448,0.296414,0.8952,0.907313,0.895012,0.896282
2,0.2049,0.232078,0.93,0.934124,0.930004,0.930915
3,0.1644,0.216164,0.9369,0.938212,0.937006,0.936982
4,0.1438,0.209669,0.9405,0.942975,0.940819,0.940671
5,0.1301,0.188055,0.9493,0.950243,0.949322,0.949404
6,0.1234,0.186094,0.9507,0.952165,0.950744,0.950952
7,0.1192,0.193164,0.9471,0.948005,0.947495,0.947006
8,0.1171,0.175548,0.9544,0.9559,0.954673,0.954621
9,0.1155,0.194522,0.9454,0.948884,0.945465,0.945879
10,0.1144,0.172648,0.9564,0.956874,0.956603,0.95646


[I 2025-03-29 02:22:44,616] Trial 73 finished with value: 0.9564598504293231 and parameters: {'learning_rate': 0.0004506830404570648, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 64 with value: 0.9567513619864837.


Trial 74 with params: {'learning_rate': 0.0010717258505902054, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3891,0.357568,0.8703,0.882363,0.869997,0.87113
2,0.2561,0.275742,0.9092,0.914025,0.909569,0.909932
3,0.2046,0.24655,0.9204,0.92147,0.920575,0.920232


[I 2025-03-29 02:26:27,094] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00047441555586286496, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3626,0.29637,0.9008,0.911282,0.900686,0.90193
2,0.2072,0.257016,0.9179,0.923877,0.917955,0.91905
3,0.1673,0.219837,0.9343,0.935552,0.934471,0.934399
4,0.1451,0.21098,0.9396,0.94252,0.939801,0.939927
5,0.1319,0.190419,0.9486,0.949244,0.948759,0.948662
6,0.1235,0.180528,0.9528,0.953901,0.952836,0.953106
7,0.1196,0.197141,0.9446,0.94564,0.945086,0.944401
8,0.1174,0.174279,0.9544,0.95612,0.954585,0.954669
9,0.1157,0.19491,0.9454,0.947994,0.945436,0.94569
10,0.1145,0.173393,0.9574,0.957777,0.95757,0.957494


[I 2025-03-29 02:38:53,836] Trial 75 finished with value: 0.9574944708821433 and parameters: {'learning_rate': 0.00047441555586286496, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 75 with value: 0.9574944708821433.


Trial 76 with params: {'learning_rate': 0.00019487045103705824, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3818,0.273885,0.91,0.918732,0.909531,0.910465
2,0.1966,0.229416,0.9324,0.93651,0.932374,0.933344
3,0.155,0.215598,0.9358,0.936897,0.935923,0.935917
4,0.1364,0.204911,0.9396,0.941435,0.939755,0.939853
5,0.1277,0.195838,0.9448,0.945912,0.944864,0.944956
6,0.1227,0.194163,0.9463,0.947793,0.94634,0.946566


[I 2025-03-29 02:46:21,602] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00041679053444085474, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3474,0.281949,0.9106,0.918285,0.910393,0.911122
2,0.2012,0.236933,0.931,0.935218,0.930939,0.932041
3,0.1642,0.222744,0.9328,0.934386,0.933104,0.93289
4,0.143,0.200093,0.9437,0.945639,0.943777,0.943956
5,0.1308,0.185876,0.9501,0.950971,0.950215,0.950187
6,0.1232,0.186334,0.9494,0.951089,0.949385,0.949675
7,0.1195,0.193004,0.9472,0.947925,0.947591,0.946972
8,0.1174,0.173792,0.9559,0.95775,0.956092,0.95616
9,0.1159,0.196324,0.9452,0.949077,0.94527,0.945613
10,0.1147,0.170893,0.9583,0.95867,0.958445,0.958376


[I 2025-03-29 02:58:48,643] Trial 77 finished with value: 0.958376301709421 and parameters: {'learning_rate': 0.00041679053444085474, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 77 with value: 0.958376301709421.


Trial 78 with params: {'learning_rate': 0.0004937558138383237, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.362,0.3166,0.8906,0.903573,0.890246,0.891574
2,0.2091,0.260386,0.9179,0.927005,0.917757,0.919627
3,0.1688,0.241887,0.9225,0.925153,0.922528,0.92271
4,0.146,0.203861,0.9391,0.941873,0.939332,0.939422
5,0.1312,0.186719,0.9503,0.951191,0.950487,0.95048
6,0.1233,0.184307,0.9504,0.951785,0.950463,0.950708
7,0.1192,0.194372,0.9463,0.947233,0.946648,0.946232
8,0.1171,0.171494,0.9564,0.95718,0.956663,0.956549
9,0.1156,0.193625,0.9466,0.949635,0.946614,0.946916
10,0.1144,0.172607,0.956,0.956597,0.956164,0.956124


[I 2025-03-29 03:11:14,885] Trial 78 finished with value: 0.9561238935412023 and parameters: {'learning_rate': 0.0004937558138383237, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 77 with value: 0.958376301709421.


Trial 79 with params: {'learning_rate': 0.00010967129602388196, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4449,0.285365,0.9055,0.913348,0.905312,0.906
2,0.2158,0.239045,0.9277,0.930811,0.927528,0.928333
3,0.1677,0.2294,0.9311,0.932612,0.931138,0.931206
4,0.1444,0.218787,0.9335,0.935675,0.933574,0.933666
5,0.1333,0.208144,0.9396,0.940921,0.939787,0.939849
6,0.1273,0.203999,0.9397,0.941838,0.939708,0.940078


[I 2025-03-29 03:18:44,586] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.00014244143045233322, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4063,0.277997,0.9069,0.914753,0.906531,0.90721
2,0.2042,0.238495,0.928,0.931843,0.927926,0.928888
3,0.1601,0.221905,0.936,0.937307,0.935984,0.936126
4,0.1394,0.211958,0.9369,0.938605,0.937102,0.937045
5,0.1298,0.204012,0.941,0.942234,0.941131,0.941184
6,0.1245,0.199927,0.9414,0.943108,0.941454,0.941712


[I 2025-03-29 03:26:08,561] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00041875519106209365, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3585,0.279664,0.9076,0.917249,0.907385,0.908731
2,0.2024,0.245265,0.9246,0.930375,0.924533,0.925901
3,0.1612,0.220043,0.9348,0.936178,0.934857,0.934885
4,0.1435,0.20095,0.944,0.94655,0.944308,0.944341
5,0.1304,0.187469,0.9496,0.950213,0.949723,0.949704
6,0.1232,0.181957,0.9511,0.952144,0.951182,0.951348
7,0.1192,0.192008,0.9481,0.948736,0.948464,0.947946
8,0.1171,0.177196,0.9531,0.955007,0.953389,0.953334
9,0.1157,0.194322,0.9471,0.949848,0.947172,0.947396
10,0.1146,0.174553,0.9556,0.95607,0.955775,0.955689


[I 2025-03-29 03:38:31,321] Trial 81 finished with value: 0.9556891342519327 and parameters: {'learning_rate': 0.00041875519106209365, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 77 with value: 0.958376301709421.


Trial 82 with params: {'learning_rate': 0.0001746455687351895, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4053,0.277846,0.9067,0.915232,0.906506,0.907489
2,0.1991,0.240556,0.9277,0.932775,0.927737,0.928698
3,0.1566,0.221793,0.9353,0.936922,0.935348,0.935396
4,0.1378,0.208429,0.9385,0.94072,0.938677,0.938799
5,0.1284,0.196124,0.9456,0.94659,0.945781,0.945789
6,0.1231,0.192866,0.9468,0.948313,0.946825,0.947076


[I 2025-03-29 03:45:56,811] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00042894602520748414, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3551,0.292257,0.897,0.907294,0.896893,0.89798
2,0.2042,0.244359,0.9254,0.931959,0.925167,0.926892
3,0.1624,0.220006,0.9369,0.937825,0.936966,0.936833
4,0.1424,0.204232,0.9415,0.943434,0.941652,0.941633
5,0.13,0.182405,0.9534,0.953961,0.953622,0.953541
6,0.1229,0.182731,0.9512,0.952257,0.95128,0.951392
7,0.1193,0.189135,0.9481,0.94869,0.948472,0.947943
8,0.1172,0.175399,0.9549,0.956443,0.955166,0.955034
9,0.1157,0.19507,0.943,0.946424,0.943082,0.94339
10,0.1145,0.172386,0.9565,0.957051,0.956673,0.956579


[I 2025-03-29 03:58:13,812] Trial 83 finished with value: 0.9565792214431041 and parameters: {'learning_rate': 0.00042894602520748414, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 77 with value: 0.958376301709421.


Trial 84 with params: {'learning_rate': 0.0006096513920254087, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3559,0.344117,0.8746,0.889288,0.87461,0.875209
2,0.2201,0.247489,0.9227,0.927097,0.922944,0.923363
3,0.1748,0.221933,0.9368,0.937402,0.937042,0.936872
4,0.151,0.206669,0.9402,0.942589,0.940371,0.940545
5,0.1352,0.192447,0.9475,0.948363,0.947674,0.947569
6,0.1254,0.184461,0.9504,0.951731,0.950468,0.950661
7,0.1202,0.191753,0.9481,0.94874,0.94851,0.947892
8,0.1178,0.175247,0.9539,0.955366,0.954059,0.954155
9,0.116,0.193646,0.945,0.948044,0.94502,0.945174
10,0.1149,0.172772,0.9564,0.956756,0.956588,0.956431


[I 2025-03-29 04:10:34,917] Trial 84 finished with value: 0.9564313797790355 and parameters: {'learning_rate': 0.0006096513920254087, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 77 with value: 0.958376301709421.


Trial 85 with params: {'learning_rate': 0.0006379904246575134, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3657,0.322568,0.8882,0.900423,0.887865,0.888736
2,0.2203,0.246859,0.9242,0.929657,0.924277,0.925033
3,0.1778,0.226666,0.9306,0.93166,0.930832,0.930651


[I 2025-03-29 04:14:17,747] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0006266910709117714, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3656,0.295796,0.8948,0.903946,0.894895,0.895067
2,0.2191,0.259749,0.9161,0.923978,0.916037,0.917769
3,0.1768,0.225423,0.9327,0.93383,0.933037,0.932766
4,0.1508,0.222778,0.9315,0.934475,0.931695,0.931885
5,0.1339,0.197165,0.9436,0.9451,0.943694,0.943814
6,0.125,0.190766,0.9484,0.950153,0.948459,0.948684
7,0.1203,0.191419,0.9465,0.946956,0.946839,0.946341
8,0.1178,0.179909,0.9517,0.953814,0.951995,0.951915
9,0.1161,0.19437,0.9422,0.9453,0.942289,0.942477
10,0.1148,0.175532,0.9539,0.954525,0.954069,0.953908


[I 2025-03-29 04:26:41,645] Trial 86 finished with value: 0.9539078871368435 and parameters: {'learning_rate': 0.0006266910709117714, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 77 with value: 0.958376301709421.


Trial 87 with params: {'learning_rate': 0.0004061937171051526, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3481,0.288208,0.9031,0.913085,0.902855,0.904001
2,0.2013,0.240916,0.9241,0.928738,0.924092,0.925106
3,0.1621,0.223192,0.9341,0.935967,0.934215,0.934195
4,0.1433,0.210984,0.9381,0.942743,0.938244,0.938867
5,0.1301,0.187666,0.947,0.947846,0.947112,0.94716
6,0.1225,0.188376,0.9475,0.949283,0.94747,0.947744


[I 2025-03-29 04:34:12,104] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0009011295512349717, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3752,0.360837,0.8679,0.885379,0.867913,0.868371
2,0.2409,0.259203,0.9155,0.921469,0.915709,0.916176
3,0.195,0.242143,0.9246,0.927017,0.925055,0.924838


[I 2025-03-29 04:37:56,805] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.00044498409036288346, 'weight_decay': 0.005, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3645,0.285222,0.906,0.915372,0.905868,0.906838
2,0.2047,0.231322,0.9302,0.933548,0.930343,0.930931
3,0.1645,0.223121,0.9331,0.934275,0.933377,0.932936
4,0.1444,0.196875,0.9453,0.948063,0.945457,0.945704
5,0.1307,0.183294,0.9518,0.95212,0.951993,0.951862
6,0.1226,0.183939,0.9518,0.952804,0.951797,0.951938
7,0.1192,0.187987,0.9504,0.951253,0.950706,0.950287
8,0.117,0.173103,0.9561,0.95762,0.956295,0.956261
9,0.1154,0.187638,0.95,0.952321,0.950052,0.950229
10,0.1144,0.171788,0.9581,0.958773,0.958254,0.958165


[I 2025-03-29 04:50:21,250] Trial 89 finished with value: 0.9581649911811109 and parameters: {'learning_rate': 0.00044498409036288346, 'weight_decay': 0.005, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 77 with value: 0.958376301709421.


Trial 90 with params: {'learning_rate': 0.0004239611382778768, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3592,0.286674,0.904,0.913065,0.903922,0.904352
2,0.2047,0.229205,0.932,0.935053,0.932002,0.932664
3,0.1623,0.22266,0.9326,0.934325,0.932677,0.932725
4,0.1442,0.201277,0.9431,0.945063,0.943312,0.943363
5,0.1308,0.188867,0.9474,0.947887,0.947619,0.947452
6,0.1228,0.187801,0.9487,0.949977,0.948703,0.948958
7,0.1191,0.196618,0.9463,0.946739,0.946723,0.946056
8,0.1172,0.179515,0.9519,0.953694,0.952237,0.952017
9,0.1157,0.197154,0.9449,0.948027,0.944989,0.945189
10,0.1144,0.175529,0.9557,0.956012,0.95593,0.955758


[I 2025-03-29 05:02:46,808] Trial 90 finished with value: 0.9557580442690918 and parameters: {'learning_rate': 0.0004239611382778768, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 77 with value: 0.958376301709421.


Trial 91 with params: {'learning_rate': 0.0003736754954906599, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3562,0.282536,0.9067,0.916096,0.9064,0.907507
2,0.1986,0.23237,0.933,0.936571,0.932963,0.933737
3,0.1596,0.219483,0.9368,0.937802,0.936909,0.936819
4,0.1419,0.19675,0.9463,0.948222,0.946534,0.946516
5,0.1289,0.188266,0.9498,0.950595,0.949981,0.949863
6,0.1222,0.181787,0.9513,0.952593,0.951424,0.951682
7,0.1191,0.194628,0.9472,0.947694,0.947625,0.946903
8,0.1169,0.175867,0.9526,0.953991,0.952878,0.952778
9,0.1157,0.197394,0.9445,0.947991,0.944593,0.944802
10,0.1146,0.175627,0.9547,0.955293,0.9549,0.954734


[I 2025-03-29 05:15:14,011] Trial 91 finished with value: 0.9547336408122362 and parameters: {'learning_rate': 0.0003736754954906599, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 77 with value: 0.958376301709421.


Trial 92 with params: {'learning_rate': 0.0008788164058632974, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3729,0.350917,0.8733,0.888822,0.873327,0.873708
2,0.2396,0.298723,0.8929,0.903682,0.893264,0.894612
3,0.1897,0.239947,0.9241,0.925855,0.924501,0.924185
4,0.1598,0.220048,0.9305,0.934095,0.930739,0.930956
5,0.1407,0.202404,0.9403,0.941609,0.940441,0.940353
6,0.1289,0.193228,0.945,0.946897,0.945075,0.9454


[I 2025-03-29 05:22:39,491] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0018954709857489603, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4616,0.428361,0.8307,0.849343,0.831044,0.830208
2,0.307,0.323697,0.8806,0.891862,0.880706,0.882358
3,0.2385,0.267805,0.9072,0.909152,0.907377,0.90716


[I 2025-03-29 05:26:20,680] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0007240533290201113, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3641,0.310586,0.8865,0.896097,0.886484,0.887009
2,0.229,0.251728,0.9203,0.92474,0.920503,0.921216
3,0.1836,0.242308,0.9248,0.926029,0.924998,0.924578
4,0.1557,0.22095,0.9342,0.936958,0.934561,0.934485
5,0.1363,0.191227,0.9474,0.94878,0.94757,0.947658
6,0.126,0.189758,0.9477,0.949347,0.947774,0.947979


[I 2025-03-29 05:33:42,647] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0006203618811695965, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3707,0.30277,0.8936,0.906405,0.893466,0.894503
2,0.2203,0.273458,0.9091,0.920179,0.909013,0.911311
3,0.1772,0.229976,0.9314,0.932399,0.931652,0.931497
4,0.1514,0.20767,0.9393,0.940927,0.939549,0.939487
5,0.1347,0.187201,0.9494,0.950408,0.949554,0.949579
6,0.1259,0.189298,0.9478,0.949397,0.947759,0.948141


[I 2025-03-29 05:41:07,591] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.00010813685437626896, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4476,0.288929,0.9021,0.910387,0.901895,0.902506
2,0.217,0.240042,0.9282,0.931089,0.928087,0.928819
3,0.169,0.231968,0.9298,0.931571,0.929802,0.929895
4,0.1454,0.217976,0.9336,0.935899,0.933795,0.933819
5,0.1339,0.209104,0.9381,0.939353,0.938272,0.938278
6,0.1276,0.202774,0.9411,0.943183,0.941097,0.941478


[I 2025-03-29 05:48:33,675] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00044863843369383124, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3542,0.299794,0.8969,0.907893,0.896645,0.897515
2,0.2068,0.247017,0.9239,0.929833,0.923953,0.925266
3,0.165,0.227194,0.9339,0.935263,0.934077,0.933865
4,0.1431,0.197733,0.944,0.945821,0.944243,0.944335
5,0.1308,0.189768,0.9477,0.948658,0.947847,0.947854
6,0.1232,0.180578,0.9524,0.953563,0.95247,0.952647
7,0.1196,0.190071,0.9474,0.947518,0.947793,0.947082
8,0.1174,0.173697,0.9535,0.955122,0.953733,0.953708
9,0.1157,0.194105,0.946,0.949141,0.946058,0.946322
10,0.1146,0.173244,0.9534,0.954009,0.953561,0.953504


[I 2025-03-29 06:00:54,395] Trial 97 finished with value: 0.9535040978116139 and parameters: {'learning_rate': 0.00044863843369383124, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 77 with value: 0.958376301709421.


Trial 98 with params: {'learning_rate': 0.000270831029075884, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3609,0.281725,0.9018,0.913625,0.901534,0.902889
2,0.1938,0.230361,0.9325,0.937452,0.932609,0.933321
3,0.1549,0.22009,0.9351,0.93688,0.935191,0.935165
4,0.1365,0.19464,0.9469,0.947675,0.947095,0.947042
5,0.128,0.190559,0.9487,0.949544,0.948807,0.948879
6,0.1221,0.185506,0.9495,0.950873,0.949593,0.949749


[I 2025-03-29 06:08:15,578] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 8.710007471084877e-05, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4975,0.297085,0.9006,0.908229,0.900456,0.90091
2,0.2308,0.252233,0.9218,0.925718,0.921645,0.922687
3,0.1796,0.237497,0.9263,0.928107,0.926356,0.926448
4,0.1533,0.224805,0.9309,0.933286,0.93112,0.931051
5,0.1395,0.21525,0.9365,0.937613,0.936729,0.93665
6,0.1315,0.212343,0.9364,0.938913,0.936281,0.936786


[I 2025-03-29 06:15:41,837] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4085,0.276715,0.9067,0.915649,0.906367,0.907411
2,0.1964,0.231616,0.9294,0.933486,0.92947,0.930267
3,0.156,0.221055,0.9331,0.934907,0.933284,0.933328
4,0.138,0.207828,0.9407,0.942831,0.940923,0.940708
5,0.1271,0.189552,0.9478,0.94834,0.947939,0.947812
6,0.1221,0.192187,0.9452,0.946914,0.945187,0.945524


[I 2025-03-29 06:23:06,275] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0007470908691221411, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3967,0.332701,0.881,0.8949,0.881004,0.88171
2,0.2296,0.252418,0.9187,0.924911,0.919007,0.919473
3,0.1843,0.228278,0.9325,0.933417,0.932641,0.932696
4,0.1597,0.207914,0.9377,0.939401,0.937916,0.937836
5,0.1382,0.19731,0.9461,0.94698,0.946214,0.946296
6,0.1278,0.18781,0.9466,0.9484,0.946674,0.946914


[I 2025-03-29 06:30:34,054] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0007266047466918563, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3731,0.313523,0.8897,0.902089,0.889803,0.890715
2,0.228,0.280297,0.9054,0.915008,0.905678,0.906461
3,0.1839,0.230217,0.9305,0.931205,0.930681,0.9303
4,0.1562,0.207256,0.9372,0.939374,0.937376,0.937528
5,0.1382,0.197104,0.9432,0.944142,0.94328,0.943333
6,0.127,0.186949,0.9476,0.948933,0.947702,0.947902


[I 2025-03-29 06:38:02,152] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0007126667777994006, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3722,0.338245,0.8821,0.892699,0.882138,0.8824
2,0.2275,0.257868,0.9151,0.920972,0.915256,0.916433
3,0.1807,0.235807,0.926,0.927755,0.926149,0.926078


[I 2025-03-29 06:41:44,965] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00029155137296285415, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3627,0.274149,0.9104,0.919385,0.91017,0.9112
2,0.1957,0.224035,0.9314,0.934796,0.93133,0.932247
3,0.1562,0.221648,0.9341,0.93578,0.93417,0.934107
4,0.1377,0.200861,0.9436,0.946469,0.943749,0.943967
5,0.1277,0.188331,0.9478,0.94893,0.947868,0.948096
6,0.1221,0.184586,0.9512,0.952412,0.951236,0.95137
7,0.119,0.194436,0.9476,0.948007,0.948018,0.947392
8,0.1175,0.178161,0.9527,0.954401,0.952949,0.952916
9,0.1159,0.201745,0.9416,0.944903,0.941597,0.94192
10,0.1149,0.179065,0.9523,0.952819,0.952418,0.952344


[I 2025-03-29 06:54:07,927] Trial 104 finished with value: 0.95234385983005 and parameters: {'learning_rate': 0.00029155137296285415, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 77 with value: 0.958376301709421.


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4353,0.409402,0.8365,0.859307,0.836012,0.836343
2,0.2804,0.26723,0.9126,0.914861,0.912722,0.913136
3,0.2199,0.261404,0.9135,0.915624,0.913622,0.913571


[I 2025-03-29 06:57:54,084] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0006395370258033928, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3579,0.306284,0.8907,0.900666,0.890744,0.891231
2,0.2211,0.249122,0.9213,0.926846,0.921333,0.922458
3,0.179,0.223513,0.9334,0.934742,0.93357,0.933519
4,0.1511,0.201856,0.9434,0.944273,0.943656,0.943604
5,0.1339,0.188753,0.949,0.949292,0.949137,0.949045
6,0.1254,0.189695,0.9483,0.949802,0.948317,0.948489
7,0.1207,0.195214,0.9482,0.948947,0.948629,0.948046
8,0.1177,0.178726,0.9518,0.954016,0.952116,0.951887
9,0.116,0.191692,0.9461,0.948134,0.946275,0.946349
10,0.1147,0.174723,0.9551,0.955627,0.955325,0.955201


[I 2025-03-29 07:10:21,161] Trial 106 finished with value: 0.9552006090697642 and parameters: {'learning_rate': 0.0006395370258033928, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 77 with value: 0.958376301709421.


Trial 107 with params: {'learning_rate': 0.0005978281944902294, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3933,0.313865,0.8881,0.899543,0.888051,0.888482
2,0.22,0.250325,0.9217,0.926672,0.921784,0.922597
3,0.1781,0.217761,0.9373,0.937952,0.937542,0.937344
4,0.1505,0.205605,0.9395,0.942506,0.939756,0.939871
5,0.1351,0.186823,0.9504,0.95102,0.950514,0.950505
6,0.1245,0.180942,0.9537,0.954728,0.953803,0.953941
7,0.12,0.191408,0.9484,0.949268,0.948821,0.948263
8,0.1175,0.174208,0.9548,0.956525,0.955105,0.955041
9,0.1158,0.195437,0.9452,0.948365,0.945298,0.94551
10,0.1146,0.171333,0.9575,0.95819,0.957738,0.957566


[I 2025-03-29 07:22:50,247] Trial 107 finished with value: 0.9575655593981087 and parameters: {'learning_rate': 0.0005978281944902294, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 77 with value: 0.958376301709421.


Trial 108 with params: {'learning_rate': 0.0002972913741240456, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3533,0.284878,0.9033,0.913474,0.903001,0.904216
2,0.1952,0.231647,0.9299,0.934041,0.929972,0.93076
3,0.1573,0.21764,0.9373,0.938798,0.937437,0.937398
4,0.1388,0.202763,0.9437,0.945032,0.943837,0.943814
5,0.1281,0.187568,0.9493,0.950145,0.949465,0.949396
6,0.1227,0.184283,0.9506,0.952144,0.950599,0.95087
7,0.1188,0.193327,0.9494,0.950068,0.949753,0.949259
8,0.1174,0.17831,0.952,0.953674,0.952257,0.952147
9,0.1159,0.196081,0.9458,0.948423,0.945923,0.946164
10,0.1148,0.17715,0.9546,0.955283,0.954733,0.954692


[I 2025-03-29 07:35:24,512] Trial 108 finished with value: 0.9546915735318633 and parameters: {'learning_rate': 0.0002972913741240456, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 77 with value: 0.958376301709421.


Trial 109 with params: {'learning_rate': 0.0002333108575416201, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3928,0.278353,0.9084,0.917571,0.907993,0.909094
2,0.1964,0.230272,0.9314,0.93484,0.931506,0.932048
3,0.1565,0.219665,0.9362,0.938225,0.936363,0.936311
4,0.1373,0.205555,0.9412,0.944288,0.941492,0.941564
5,0.1272,0.190753,0.9469,0.947744,0.946984,0.947018
6,0.1222,0.188218,0.9487,0.950234,0.948682,0.948953
7,0.1194,0.202285,0.9437,0.944504,0.944091,0.943562
8,0.1177,0.182378,0.9488,0.951032,0.948969,0.949111
9,0.1165,0.20853,0.9373,0.941122,0.937408,0.937654
10,0.1152,0.183209,0.9515,0.95221,0.951625,0.951566


[I 2025-03-29 07:47:51,917] Trial 109 finished with value: 0.9515655904344177 and parameters: {'learning_rate': 0.0002333108575416201, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 77 with value: 0.958376301709421.


Trial 110 with params: {'learning_rate': 0.0009406379931133675, 'weight_decay': 0.007, 'warmup_steps': 26, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4058,0.360855,0.8693,0.886198,0.869388,0.870614
2,0.2502,0.276366,0.9106,0.917881,0.910773,0.911657
3,0.1965,0.233984,0.9284,0.929704,0.928603,0.928421
4,0.1668,0.226966,0.9281,0.930186,0.928446,0.928237
5,0.1442,0.197103,0.9465,0.946819,0.946623,0.946493
6,0.1304,0.19225,0.9469,0.94885,0.947064,0.947265


[I 2025-03-29 07:55:18,142] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.00037592472806980185, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3942,0.282773,0.9063,0.917004,0.905848,0.906949
2,0.2016,0.231759,0.9293,0.93356,0.929306,0.930082
3,0.1618,0.219761,0.9366,0.93733,0.93677,0.93653
4,0.1411,0.202204,0.9423,0.944807,0.942551,0.942521
5,0.1297,0.182663,0.9534,0.953843,0.95355,0.953507
6,0.1231,0.180594,0.9532,0.954014,0.953322,0.953379
7,0.1193,0.194704,0.9455,0.946207,0.945887,0.945265
8,0.1172,0.177076,0.9542,0.956011,0.9544,0.954482
9,0.1158,0.19794,0.9444,0.947466,0.944437,0.944635
10,0.1146,0.174118,0.9565,0.956935,0.956674,0.956519


[I 2025-03-29 08:07:42,892] Trial 111 finished with value: 0.9565190131356138 and parameters: {'learning_rate': 0.00037592472806980185, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 77 with value: 0.958376301709421.


Trial 112 with params: {'learning_rate': 0.0003946079794552991, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3963,0.273435,0.9102,0.917934,0.909891,0.910627
2,0.2025,0.235459,0.9282,0.93301,0.928193,0.929273
3,0.1617,0.22167,0.9355,0.936703,0.935647,0.935546
4,0.1419,0.205493,0.9418,0.943984,0.942014,0.941871
5,0.1296,0.187031,0.9508,0.951232,0.950903,0.950873
6,0.1229,0.184069,0.9515,0.952547,0.951548,0.951731
7,0.1194,0.195626,0.9471,0.947727,0.94744,0.946884
8,0.1173,0.176072,0.9531,0.954737,0.953298,0.953336
9,0.1157,0.193962,0.9465,0.948889,0.946583,0.946759
10,0.1145,0.178229,0.954,0.954428,0.954137,0.954047


[I 2025-03-29 08:20:04,263] Trial 112 finished with value: 0.9540472157845524 and parameters: {'learning_rate': 0.0003946079794552991, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 77 with value: 0.958376301709421.


Trial 113 with params: {'learning_rate': 0.00011072753842218547, 'weight_decay': 0.01, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.478,0.289426,0.9016,0.910031,0.901298,0.901967
2,0.2171,0.242275,0.9261,0.929876,0.926006,0.926992
3,0.1686,0.230232,0.9329,0.9343,0.932993,0.933039
4,0.1447,0.220414,0.9337,0.935635,0.933878,0.933807
5,0.1332,0.208513,0.9396,0.940786,0.939803,0.939798
6,0.1273,0.204034,0.9422,0.944329,0.942166,0.942572


[I 2025-03-29 08:27:32,076] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0003584330726500191, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3891,0.286036,0.9029,0.913348,0.902707,0.903982
2,0.2005,0.231547,0.9311,0.935182,0.93114,0.932087
3,0.1607,0.21816,0.9334,0.934726,0.933525,0.933523
4,0.1404,0.203461,0.9394,0.942429,0.939632,0.939738
5,0.1289,0.183705,0.9524,0.953041,0.952517,0.952617
6,0.1227,0.182773,0.9499,0.95118,0.949939,0.95015
7,0.1193,0.200082,0.9443,0.945339,0.944714,0.944175
8,0.1174,0.176242,0.9541,0.955388,0.95437,0.95431
9,0.1158,0.194238,0.9473,0.949679,0.947351,0.947591
10,0.1147,0.174941,0.954,0.954436,0.954183,0.954058


[I 2025-03-29 08:39:53,717] Trial 114 finished with value: 0.9540579117223922 and parameters: {'learning_rate': 0.0003584330726500191, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 77 with value: 0.958376301709421.


Trial 115 with params: {'learning_rate': 8.275178977772585e-05, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5335,0.307496,0.8965,0.905067,0.896424,0.896865
2,0.2367,0.255241,0.9208,0.924681,0.920736,0.921781
3,0.1836,0.241112,0.9243,0.926291,0.92435,0.924527


[I 2025-03-29 08:43:35,797] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0007425116889248565, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3991,0.344042,0.8784,0.893512,0.878278,0.879193
2,0.233,0.261595,0.9186,0.924744,0.918796,0.919849
3,0.1828,0.243465,0.9231,0.925091,0.923345,0.923117
4,0.1562,0.211659,0.9375,0.938732,0.937797,0.937544
5,0.1378,0.191538,0.9482,0.949483,0.948263,0.94852
6,0.127,0.189493,0.9494,0.950547,0.949452,0.949629
7,0.1214,0.206673,0.9406,0.941358,0.941083,0.940243
8,0.1181,0.178896,0.9529,0.954334,0.953188,0.953115
9,0.1163,0.193665,0.9459,0.948694,0.945981,0.946197
10,0.1149,0.177008,0.9527,0.953071,0.952902,0.952748


[I 2025-03-29 08:56:02,043] Trial 116 finished with value: 0.9527483776488799 and parameters: {'learning_rate': 0.0007425116889248565, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}. Best is trial 77 with value: 0.958376301709421.


Trial 117 with params: {'learning_rate': 0.0007562412080858317, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3771,0.340531,0.8795,0.892708,0.879648,0.879945
2,0.2309,0.262415,0.9164,0.922219,0.916367,0.917453
3,0.1858,0.240958,0.9211,0.923034,0.921212,0.921364


[I 2025-03-29 08:59:45,949] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00033494688314966297, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3638,0.278735,0.911,0.918867,0.910641,0.911493
2,0.1973,0.239196,0.9268,0.932271,0.926757,0.927864
3,0.1583,0.2206,0.9337,0.935236,0.933785,0.933769
4,0.139,0.216605,0.9329,0.935202,0.9332,0.932969
5,0.1291,0.188658,0.949,0.949828,0.949006,0.949154
6,0.1227,0.186211,0.947,0.948358,0.947021,0.94719


[I 2025-03-29 09:07:14,232] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00027947153099807146, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3652,0.27631,0.9098,0.91882,0.909424,0.910502
2,0.1933,0.236183,0.9277,0.934243,0.927656,0.929045
3,0.1543,0.221571,0.9325,0.934221,0.93263,0.932512
4,0.1371,0.204042,0.9413,0.943376,0.94152,0.941517
5,0.1275,0.188483,0.9486,0.94938,0.94882,0.948727
6,0.1219,0.186307,0.9489,0.950382,0.948903,0.949122
7,0.1188,0.197517,0.9439,0.94436,0.944331,0.943597
8,0.1171,0.179204,0.9517,0.953568,0.951883,0.951966
9,0.1158,0.196634,0.9425,0.945635,0.942578,0.942845
10,0.1148,0.178791,0.952,0.952471,0.952168,0.952009


[I 2025-03-29 09:19:43,484] Trial 119 finished with value: 0.9520090620549686 and parameters: {'learning_rate': 0.00027947153099807146, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 77 with value: 0.958376301709421.


Trial 120 with params: {'learning_rate': 0.00035449532397097307, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3658,0.28359,0.9044,0.915775,0.90416,0.905601
2,0.198,0.227676,0.9331,0.936524,0.933141,0.933999
3,0.1583,0.217002,0.9365,0.937491,0.936678,0.936542
4,0.1414,0.206042,0.9408,0.942932,0.940977,0.941021
5,0.1294,0.18424,0.9509,0.951579,0.951048,0.951079
6,0.1226,0.18565,0.9503,0.951781,0.950395,0.950645
7,0.119,0.194623,0.948,0.948852,0.948426,0.947823
8,0.117,0.176579,0.9526,0.954626,0.952876,0.952822
9,0.1156,0.195652,0.9453,0.947945,0.945383,0.945583
10,0.1146,0.175281,0.9546,0.955088,0.954769,0.954678


[I 2025-03-29 09:32:12,516] Trial 120 finished with value: 0.9546783765112808 and parameters: {'learning_rate': 0.00035449532397097307, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 77 with value: 0.958376301709421.


Trial 121 with params: {'learning_rate': 0.0005476751340124934, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3902,0.29647,0.9016,0.908107,0.901685,0.901807
2,0.2149,0.258696,0.9189,0.926427,0.918942,0.920275
3,0.1713,0.215968,0.9349,0.935566,0.935135,0.934933
4,0.1482,0.20438,0.9407,0.942803,0.940728,0.940867
5,0.134,0.186614,0.9498,0.950361,0.949989,0.949942
6,0.1245,0.183828,0.9523,0.953722,0.952438,0.95259
7,0.1198,0.193975,0.947,0.947443,0.947434,0.946799
8,0.1176,0.174163,0.9563,0.957666,0.956543,0.95645
9,0.1156,0.190427,0.9472,0.949977,0.947306,0.947526
10,0.1146,0.173692,0.958,0.958831,0.9582,0.958105


[I 2025-03-29 09:44:41,406] Trial 121 finished with value: 0.958105087068556 and parameters: {'learning_rate': 0.0005476751340124934, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 77 with value: 0.958376301709421.


Trial 122 with params: {'learning_rate': 0.001012295692031754, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4162,0.364503,0.8598,0.878138,0.859925,0.85977
2,0.2567,0.295144,0.8973,0.907695,0.897396,0.899459
3,0.2028,0.255298,0.9162,0.918095,0.916463,0.916151


[I 2025-03-29 09:48:25,966] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0002038060977002188, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4237,0.281765,0.9065,0.916112,0.906027,0.90725
2,0.1983,0.235488,0.9302,0.934713,0.93015,0.931052
3,0.1553,0.218083,0.9376,0.938792,0.937739,0.937631
4,0.1369,0.202644,0.9419,0.943593,0.942038,0.942118
5,0.1278,0.191158,0.9479,0.948877,0.948014,0.94808
6,0.1225,0.191563,0.9476,0.949181,0.947648,0.947821


[I 2025-03-29 09:55:55,106] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0005784825928334047, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3966,0.302493,0.894,0.905284,0.893852,0.895076
2,0.2175,0.258941,0.918,0.924627,0.917935,0.919514
3,0.1743,0.209035,0.9405,0.940754,0.940692,0.940475
4,0.1502,0.20212,0.9434,0.944816,0.943459,0.943778
5,0.1341,0.189858,0.9488,0.949657,0.948954,0.948971
6,0.1251,0.182523,0.9521,0.953146,0.952086,0.952275
7,0.1204,0.191438,0.9497,0.95003,0.950107,0.949538
8,0.1176,0.178138,0.9533,0.954902,0.953586,0.953414
9,0.1157,0.196318,0.9434,0.947053,0.943439,0.943771
10,0.1145,0.172242,0.958,0.958259,0.958165,0.958047


[I 2025-03-29 10:08:25,749] Trial 124 finished with value: 0.9580472449550796 and parameters: {'learning_rate': 0.0005784825928334047, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 77 with value: 0.958376301709421.


Trial 125 with params: {'learning_rate': 0.000622847035190596, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3967,0.302856,0.8945,0.903665,0.894468,0.895176
2,0.2233,0.26767,0.9144,0.922738,0.91456,0.915889
3,0.1772,0.227745,0.9322,0.932966,0.932498,0.932027
4,0.152,0.211977,0.9376,0.939528,0.937965,0.937732
5,0.1335,0.198657,0.9421,0.943544,0.942383,0.942253
6,0.1252,0.1894,0.9478,0.949262,0.947965,0.947992


[I 2025-03-29 10:15:55,247] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0008945950030633548, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4044,0.357355,0.8644,0.881995,0.863975,0.864362
2,0.2467,0.264494,0.9139,0.920031,0.914021,0.915245
3,0.1946,0.244412,0.923,0.925076,0.923357,0.9228


[I 2025-03-29 10:19:38,354] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00038677462003669424, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3499,0.268771,0.9124,0.920122,0.912112,0.913048
2,0.2004,0.240391,0.9267,0.932754,0.926597,0.927923
3,0.1606,0.215769,0.939,0.940376,0.939095,0.939055
4,0.1424,0.20006,0.9445,0.946063,0.944642,0.944677
5,0.1295,0.188391,0.9492,0.95,0.949374,0.94931
6,0.1227,0.183159,0.9522,0.953527,0.952267,0.952487
7,0.1191,0.194452,0.9481,0.948547,0.948522,0.947842
8,0.1172,0.178104,0.9516,0.953538,0.951805,0.95189
9,0.1156,0.189498,0.9484,0.950588,0.948453,0.948654
10,0.1145,0.175074,0.955,0.955582,0.95511,0.955079


[I 2025-03-29 10:32:00,770] Trial 127 finished with value: 0.9550788533122525 and parameters: {'learning_rate': 0.00038677462003669424, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 77 with value: 0.958376301709421.


Trial 128 with params: {'learning_rate': 0.0006195089364284436, 'weight_decay': 0.006, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3637,0.313791,0.8885,0.901117,0.888422,0.889254
2,0.2199,0.252543,0.9176,0.923626,0.917667,0.918899
3,0.1772,0.226771,0.9305,0.931915,0.93078,0.930707
4,0.1515,0.217682,0.9351,0.938296,0.935321,0.935415
5,0.1338,0.192438,0.9478,0.948659,0.947902,0.947855
6,0.1251,0.189564,0.9477,0.949157,0.947651,0.947888


[I 2025-03-29 10:39:30,111] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.000128606050919097, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4394,0.282858,0.9062,0.914672,0.905906,0.906748
2,0.2092,0.235844,0.9296,0.932617,0.929491,0.930278
3,0.1627,0.229186,0.9324,0.934028,0.9324,0.932477
4,0.1416,0.213333,0.9356,0.937567,0.935772,0.935773
5,0.1309,0.201903,0.9423,0.943512,0.942472,0.942507
6,0.1253,0.200337,0.9434,0.945436,0.943377,0.943758


[I 2025-03-29 10:46:59,955] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0005247532500999952, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3925,0.284695,0.9039,0.913985,0.903538,0.904696
2,0.2143,0.248485,0.9225,0.928737,0.922367,0.924036
3,0.1732,0.216373,0.935,0.935862,0.935258,0.935109
4,0.1479,0.211016,0.9383,0.94132,0.938483,0.938769
5,0.1338,0.193586,0.9452,0.946238,0.945398,0.94543
6,0.1246,0.1864,0.9493,0.951492,0.949266,0.949697


[I 2025-03-29 10:54:25,746] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0002688056730444472, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3678,0.270019,0.9105,0.918658,0.910024,0.911017
2,0.1938,0.224204,0.9333,0.936817,0.933432,0.933983
3,0.1543,0.231042,0.9287,0.93095,0.92887,0.928684
4,0.1369,0.198292,0.9448,0.946411,0.944926,0.945084
5,0.1273,0.186892,0.9492,0.950224,0.949387,0.949356
6,0.1219,0.187308,0.9493,0.951391,0.949184,0.949674
7,0.1191,0.198826,0.945,0.945659,0.945409,0.944701
8,0.1174,0.180441,0.9523,0.954441,0.952453,0.952528
9,0.1161,0.200005,0.9438,0.947093,0.94385,0.944236
10,0.1149,0.178764,0.9529,0.953498,0.953024,0.952945


[I 2025-03-29 11:06:47,814] Trial 131 finished with value: 0.9529448298242341 and parameters: {'learning_rate': 0.0002688056730444472, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 77 with value: 0.958376301709421.


Trial 132 with params: {'learning_rate': 0.0007103661334718447, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3967,0.323139,0.8778,0.892358,0.877858,0.878165
2,0.2301,0.253078,0.9224,0.926693,0.92263,0.923115
3,0.1831,0.250298,0.9215,0.923967,0.921697,0.921588


[I 2025-03-29 11:10:33,125] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.00048218724740500606, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.359,0.284929,0.9057,0.913337,0.905644,0.906125
2,0.2081,0.248017,0.9266,0.933293,0.926646,0.927957
3,0.1669,0.215643,0.9393,0.940356,0.939541,0.939393
4,0.1462,0.200998,0.946,0.947343,0.946184,0.946136
5,0.1321,0.186497,0.9495,0.95028,0.949672,0.949667
6,0.124,0.182985,0.9516,0.952766,0.95163,0.951821
7,0.1197,0.193353,0.949,0.950074,0.949402,0.948976
8,0.1171,0.174745,0.9541,0.955877,0.954328,0.954339
9,0.1158,0.196269,0.9467,0.94965,0.946746,0.94701
10,0.1145,0.173001,0.9566,0.957175,0.956716,0.956681


[I 2025-03-29 11:23:02,409] Trial 133 finished with value: 0.9566807888373188 and parameters: {'learning_rate': 0.00048218724740500606, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 77 with value: 0.958376301709421.


Trial 134 with params: {'learning_rate': 0.00035742600554073794, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3656,0.26849,0.9118,0.918868,0.911433,0.912098
2,0.1987,0.247381,0.9219,0.92791,0.921813,0.923089
3,0.1584,0.21759,0.9359,0.93693,0.936001,0.935909
4,0.1398,0.203079,0.9425,0.944449,0.942687,0.942723
5,0.1295,0.192547,0.9464,0.94692,0.946625,0.946332
6,0.1225,0.184424,0.9491,0.950644,0.949057,0.949475


[I 2025-03-29 11:30:33,061] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0009115106846671904, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3802,0.336839,0.8772,0.887716,0.877434,0.877313
2,0.2441,0.264008,0.9127,0.917813,0.912839,0.913473
3,0.1946,0.23743,0.9248,0.926147,0.925108,0.924961
4,0.1636,0.2172,0.9332,0.935343,0.93341,0.933471
5,0.1416,0.195108,0.9475,0.948036,0.947654,0.947618
6,0.1288,0.191646,0.9459,0.947284,0.946065,0.946158


[I 2025-03-29 11:38:01,832] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0005586267671796526, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3542,0.309622,0.8926,0.903896,0.892536,0.893483
2,0.2133,0.241665,0.9238,0.928417,0.923937,0.924863
3,0.1702,0.238883,0.9234,0.925286,0.923676,0.923565


[I 2025-03-29 11:41:43,949] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0007162886139454726, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3738,0.319593,0.8871,0.897338,0.887224,0.887404
2,0.2292,0.246107,0.9226,0.924815,0.922743,0.922935
3,0.1819,0.227546,0.9293,0.930565,0.929438,0.929252
4,0.154,0.210051,0.9375,0.939378,0.937609,0.937678
5,0.1372,0.196965,0.9454,0.946507,0.94563,0.945608
6,0.1259,0.190086,0.9496,0.950549,0.949596,0.949742
7,0.1203,0.198361,0.9451,0.946059,0.945501,0.944914
8,0.118,0.177266,0.9538,0.955372,0.954009,0.953956
9,0.1162,0.195641,0.9449,0.948179,0.944948,0.945188
10,0.1148,0.175593,0.9541,0.954486,0.954275,0.954158


[I 2025-03-29 11:54:07,286] Trial 137 finished with value: 0.9541580109709882 and parameters: {'learning_rate': 0.0007162886139454726, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 77 with value: 0.958376301709421.


Trial 138 with params: {'learning_rate': 0.0007395027074918284, 'weight_decay': 0.001, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3798,0.325206,0.8802,0.894153,0.880294,0.880701
2,0.2301,0.254238,0.9196,0.924032,0.919589,0.920427
3,0.185,0.249835,0.9207,0.923147,0.920981,0.920755
4,0.1547,0.216478,0.934,0.936931,0.934236,0.934457
5,0.1382,0.190869,0.9484,0.949173,0.948505,0.948629
6,0.1271,0.192132,0.9447,0.946061,0.944832,0.944898


[I 2025-03-29 12:01:36,848] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.000664502063544883, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3644,0.31374,0.8916,0.903747,0.891479,0.892559
2,0.2225,0.256599,0.9217,0.925798,0.921818,0.922763
3,0.1794,0.232144,0.9282,0.929717,0.928423,0.928245


[I 2025-03-29 12:05:21,294] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0007553375538609445, 'weight_decay': 0.004, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3759,0.331926,0.882,0.895433,0.881864,0.882546
2,0.2298,0.246584,0.9266,0.932114,0.926521,0.927799
3,0.1844,0.23875,0.9249,0.926585,0.925307,0.924909
4,0.1569,0.217286,0.9333,0.935759,0.933455,0.933479
5,0.1382,0.198632,0.9447,0.945368,0.944828,0.944664
6,0.1262,0.184814,0.9504,0.951716,0.950451,0.950681
7,0.1209,0.198846,0.9436,0.944361,0.944026,0.943299
8,0.1184,0.177004,0.9535,0.955025,0.953754,0.953653
9,0.1163,0.193209,0.9449,0.94818,0.944998,0.945189
10,0.115,0.175366,0.9543,0.954992,0.954501,0.954322


[I 2025-03-29 12:17:52,238] Trial 140 finished with value: 0.9543223304120257 and parameters: {'learning_rate': 0.0007553375538609445, 'weight_decay': 0.004, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 77 with value: 0.958376301709421.


Trial 141 with params: {'learning_rate': 0.004192393259359305, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5784,0.463699,0.8093,0.826585,0.809066,0.808644
2,0.3743,0.363353,0.8639,0.873852,0.86378,0.865391
3,0.3001,0.351101,0.8645,0.873827,0.864546,0.864344


[I 2025-03-29 12:21:37,777] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0002440308302449549, 'weight_decay': 0.004, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4086,0.281962,0.9064,0.917145,0.906022,0.90731
2,0.198,0.231679,0.9294,0.934047,0.929445,0.930374
3,0.156,0.218708,0.935,0.936739,0.93511,0.935087
4,0.1372,0.206803,0.9391,0.941013,0.939407,0.939153
5,0.1282,0.185181,0.9503,0.950996,0.950437,0.950469
6,0.1226,0.188645,0.9474,0.949364,0.947417,0.947761


[I 2025-03-29 12:29:07,601] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.00034546596858026036, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.402,0.280114,0.9065,0.916348,0.906354,0.90695
2,0.2,0.234534,0.9285,0.932685,0.928557,0.929334
3,0.16,0.221731,0.9343,0.935726,0.934543,0.934247
4,0.1407,0.207655,0.9379,0.941705,0.938019,0.938385
5,0.1279,0.187358,0.9499,0.95086,0.950014,0.950139
6,0.1224,0.186325,0.9484,0.950078,0.948459,0.948685


[I 2025-03-29 12:36:37,503] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0012094596394122523, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3984,0.362841,0.8655,0.881162,0.865466,0.866201
2,0.2673,0.298824,0.8984,0.909775,0.898526,0.899894
3,0.2097,0.23874,0.9249,0.926086,0.925071,0.924998
4,0.1736,0.227973,0.9268,0.929419,0.927091,0.927046
5,0.1482,0.214435,0.9345,0.935874,0.934642,0.934656
6,0.1337,0.200108,0.9405,0.942293,0.940649,0.940837


[I 2025-03-29 12:44:05,331] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0003697085910572267, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3448,0.280583,0.905,0.914688,0.904716,0.905877
2,0.1994,0.241536,0.9246,0.931446,0.924659,0.925968
3,0.1583,0.223579,0.9331,0.934701,0.933338,0.933128
4,0.1411,0.208925,0.9396,0.942529,0.93984,0.93971
5,0.1288,0.184731,0.9502,0.951261,0.950393,0.950322
6,0.1233,0.187459,0.9493,0.951184,0.949301,0.94968


[I 2025-03-29 12:51:30,862] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0003813943101237863, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3899,0.27737,0.9103,0.917061,0.910307,0.91082
2,0.2032,0.235891,0.9303,0.935016,0.930307,0.931223
3,0.1612,0.217255,0.9359,0.937046,0.935886,0.935944
4,0.1412,0.205223,0.9435,0.945134,0.943647,0.943641
5,0.1294,0.184886,0.9498,0.95054,0.949815,0.949922
6,0.1229,0.18694,0.9471,0.948987,0.94708,0.947408


[I 2025-03-29 12:58:55,000] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0005310036448843375, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3543,0.280671,0.9073,0.914219,0.907316,0.90777
2,0.2101,0.243113,0.924,0.928868,0.924088,0.925234
3,0.1691,0.232512,0.9274,0.928679,0.927604,0.927386


[I 2025-03-29 13:02:40,007] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00020830385717754516, 'weight_decay': 0.002, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4254,0.275556,0.9087,0.917378,0.908306,0.909287
2,0.1985,0.233298,0.9305,0.93537,0.930506,0.931398
3,0.1568,0.222667,0.9338,0.935504,0.933864,0.933911
4,0.1369,0.2054,0.9407,0.942808,0.940845,0.940818
5,0.1282,0.190829,0.9493,0.949988,0.949493,0.949372
6,0.1228,0.193715,0.9481,0.950061,0.948048,0.948407


[I 2025-03-29 13:10:09,680] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0003074746287433143, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3676,0.284001,0.9062,0.917131,0.905908,0.907322
2,0.197,0.238982,0.9287,0.93422,0.928727,0.92979
3,0.157,0.220615,0.9349,0.936435,0.935003,0.934949
4,0.1389,0.200463,0.9422,0.943372,0.942405,0.942269
5,0.1285,0.186519,0.9498,0.950317,0.949866,0.949829
6,0.1226,0.185698,0.9503,0.951754,0.950239,0.950577
7,0.1192,0.199788,0.9461,0.946894,0.946465,0.945954
8,0.1172,0.179723,0.9511,0.953511,0.951299,0.951453
9,0.1159,0.198068,0.9418,0.94465,0.94183,0.942149
10,0.1147,0.177481,0.9529,0.953299,0.952979,0.952929


[I 2025-03-29 13:22:38,712] Trial 149 finished with value: 0.9529290857790734 and parameters: {'learning_rate': 0.0003074746287433143, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 77 with value: 0.958376301709421.


In [50]:
print(best_distil_pretrained)

BestRun(run_id='77', objective=0.958376301709421, hyperparameters={'learning_rate': 0.00041679053444085474, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}, run_summary=None)
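
The searched space includes `lambda_param` and `temperature`, but the logs above do not show how the training loss uses them. The following is a minimal sketch of one common distillation formulation, assuming `lambda_param` weights the soft-target (teacher) term and `temperature` softens both logit distributions. The function names and the weighting convention are illustrative assumptions, not code taken from this notebook or from `base`.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      lambda_param, temperature):
    """Blend hard-label cross-entropy with a softened teacher/student KL term.

    Assumed convention: lambda_param = 0.0 means pure cross-entropy,
    lambda_param = 1.0 means pure distillation.
    """
    # Hard-label cross-entropy on the unscaled student distribution.
    hard_loss = -math.log(softmax(student_logits)[label])
    # Soft-target term; the T^2 factor is the usual gradient-scale correction.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return (1.0 - lambda_param) * hard_loss + lambda_param * soft_loss


# Toy single-sample example with made-up logits and the best trial's settings.
loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0,
                         lambda_param=0.3, temperature=4.5)
print(f"combined loss: {loss:.4f}")
```

Under this assumed convention, a trial with `lambda_param` of 0.0 (such as trial 137) would train on hard labels only, so its `temperature` value would have no effect.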


In [None]:
print("Best random init training score: ", best_base_random)
print("Best random init distillation training score: ", best_distill_random)
print("Best pretrained (head only) training score: ", best_base_head)
print("Best pretrained distillation (head only) training score: ", best_distill_head)
print("Best pretrained training score: ", best_base_pretrained)
print("Best pretrained distillation training score: ", best_distil_pretrained)
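
Printing six `BestRun` objects one after another is hard to compare at a glance. The sketch below shows one way to tabulate them, sorted by objective (F1). It uses a local `namedtuple` stand-in with the same four fields as the printed `BestRun` so that it runs without `transformers`. The `summarize` helper and the dictionary key are hypothetical, and only the one result printed above is filled in.

```python
from collections import namedtuple

# Stand-in mirroring the fields of the BestRun object printed above.
BestRun = namedtuple(
    "BestRun", ["run_id", "objective", "hyperparameters", "run_summary"]
)


def summarize(best_runs):
    """Return a plain-text table of best runs, highest objective first."""
    rows = sorted(best_runs.items(), key=lambda item: item[1].objective,
                  reverse=True)
    lines = [f"{'variant':<40} {'trial':>5} {'F1':>8}"]
    for name, run in rows:
        lines.append(f"{name:<40} {run.run_id:>5} {run.objective:>8.4f}")
    return "\n".join(lines)


# Only the result shown in this notebook's output is included here.
example = {
    "pretrained, distillation": BestRun(
        run_id="77",
        objective=0.958376301709421,
        hyperparameters={
            "learning_rate": 0.00041679053444085474,
            "weight_decay": 0.002,
            "warmup_steps": 2,
            "lambda_param": 0.30000000000000004,
            "temperature": 4.5,
        },
        run_summary=None,
    ),
}
print(summarize(example))
```

In the notebook itself, the dictionary could instead map each label to the existing variables (`best_base_random`, `best_distill_random`, and so on), provided they expose the same `run_id` and `objective` attributes.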