# Hyperparameter search for the MobileNetV2 model on the CIFAR100 dataset

This notebook searches for optimal hyperparameters for the MobileNetV2 model on the CIFAR100 dataset. Hyperparameters are searched for all variants of the model: randomly initialized, pretrained with only the classification head fine-tuned, and pretrained with the whole model fine-tuned. For each variant, hyperparameters are searched for both standard training and training with distillation.

The search uses the Optuna library with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each model variant.
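The selection criterion can be illustrated with a small, self-contained sketch. The helper `base.compute_metrics` is not shown in this notebook, so the averaging mode is an assumption: the sketch uses support-weighted averaging, chosen because the Recall column in the logged tables equals the Accuracy column in every row, which is what weighted averaging produces (macro averaging over a class-balanced evaluation set would give the same effect). The function name `weighted_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 (assumed averaging mode)."""
    support = Counter(y_true)        # number of true samples per class
    predicted = Counter(y_pred)      # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for cls, sup in support.items():
        p = true_pos[cls] / predicted[cls] if predicted[cls] else 0.0
        r = true_pos[cls] / sup
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = sup / n             # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Toy labels, purely illustrative.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
precision, recall, f1 = weighted_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With weighted averaging the recall always equals the accuracy, so F1 differs from accuracy only through the per-class precision.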

## Library imports and basic setup

In [16]:
from transformers import Trainer
import optuna
import torch
import math
import base
import os



In [17]:
dataset_part = base.get_dataset_part()

Reset the random seed so that the results are reproducible.

In [18]:
base.reset_seed()
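The body of `base.reset_seed` is not part of this notebook. A minimal sketch of what such a helper typically does is given below. The name `reset_seed_sketch` and the default seed of 42 are assumptions, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def reset_seed_sketch(seed=42):
    """Hypothetical stand-in for base.reset_seed (seed value is an assumption)."""
    random.seed(seed)                # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)         # NumPy's global generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)      # seeds the generators on all devices
    except ImportError:
        pass


# Resetting the seed makes the following random draws repeat exactly.
reset_seed_sketch()
first = [random.random() for _ in range(3)]
reset_seed_sketch()
second = [random.random() for _ in range(3)]
print(first == second)
```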

Check whether a GPU is available.

In [19]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Load the dataset and apply the basic transformations.

In [20]:
DATASET = "cifar100"

In [21]:
transform = base.base_transforms()

train = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TRAIN, transform=transform, device="cpu")
eval = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.EVAL, transform=transform, device="cpu")
test = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TEST, transform=transform, device="cpu")

Basic training configuration used during the search. The Optuna pruner is configured in training steps rather than epochs, so the epoch counts are converted to steps below.

The minimum training length is three epochs and the maximum is ten. The upper bound on the number of warm-up steps is set to 10 % of the first epoch.

In [23]:
num_epochs = 10
batch_size = 128

In [24]:
data_length = len(train)
min_r = math.ceil(data_length/batch_size)*3
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
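The conversion above can be checked with concrete numbers. The sketch below repeats the same arithmetic for an assumed training-set size of 40,000 samples. The notebook does not print `len(train)`, so this value is an assumption, chosen because the sampled `warmup_steps` in the log never exceed 32. The function name `search_budget` is hypothetical.

```python
import math


def search_budget(data_length, batch_size, num_epochs, min_epochs=3):
    """Convert epoch-based limits into optimizer steps."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "min_r": steps_per_epoch * min_epochs,    # earliest pruning point
        "max_r": steps_per_epoch * num_epochs,    # full training length
        "warm_up": math.ceil(data_length / batch_size / 10),  # 10 % of an epoch
    }


# data_length=40_000 is an assumption, not a value taken from the notebook.
budget = search_budget(data_length=40_000, batch_size=128, num_epochs=10)
print(budget)
```

Under this assumption one epoch is 313 steps, so the pruner's minimum resource is 939 steps and its maximum is 3130 steps.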

## Search with standard training of the randomly initialized model
Definition of the searched hyperparameters and their ranges.

In [25]:
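def hp_space(trial):
    # Search space sampled by Optuna for every trial.
    params = {
        # Learning rate, sampled on a logarithmic scale.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Warm-up length, at most 10 % of the first epoch.
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params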
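To make the three ranges concrete, the sketch below draws values from the same distributions using only the standard library. It illustrates the distributions themselves, not the TPE sampler, which biases later draws towards promising regions. The function name `draw_params` and the default `warm_up=32` are assumptions.

```python
import math
import random


def draw_params(rng, warm_up=32):
    """Draw one configuration from the search space with plain random sampling."""
    # log=True: uniform in log space, so each decade is equally likely.
    log_low, log_high = math.log(5e-5), math.log(5e-3)
    learning_rate = math.exp(rng.uniform(log_low, log_high))
    # step=1e-3: one of the 11 grid values 0.000, 0.001, ..., 0.010.
    weight_decay = 1e-3 * rng.randint(0, 10)
    # Integer range, both ends inclusive.
    warmup_steps = rng.randint(0, warm_up)
    return {
        "learning_rate": learning_rate,
        "weight_decay": weight_decay,
        "warmup_steps": warmup_steps,
    }


rng = random.Random(42)
samples = [draw_params(rng) for _ in range(1000)]
print(samples[0])
```

Multiplying the step by an integer is also why some logged weight-decay values carry floating-point noise in the last digits.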

Optuna configuration.

In [26]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
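The pruner settings determine when a trial can be stopped early. The sketch below is a minimal illustration, assuming the bracket and rung formulas given in Optuna's documentation for `HyperbandPruner` and `SuccessiveHalvingPruner`: the number of brackets is `floor(log_eta(max_resource / min_resource)) + 1`, and bracket `s` checks trials at `min_resource * eta ** (s + rung)`. Resources are expressed in epochs here to keep the numbers readable, and the function name `hyperband_rungs` is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """List, per bracket, the resource values at which pruning is evaluated."""
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            point = min_resource * reduction_factor ** (bracket_id + rung)
            if point > max_resource:
                break  # a trial never trains past max_resource
            rungs.append(point)
            rung += 1
        brackets.append(rungs)
    return brackets


# 3 and 10 epochs correspond to min_r and max_r, reduction_factor=2 as configured.
brackets = hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2)
print(brackets)
```

Under these assumptions there are two brackets with checkpoints after epochs 3 and 6, which is consistent with the log, where trials are pruned only after epoch 3 or epoch 6.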



In [27]:
base.reset_seed()

Configuration of the individual training runs.

In [28]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random_hp-search", logging_dir=f"~/logs/{DATASET}/random_hp-search", epochs=num_epochs, batch_size=batch_size)

Trainer configuration for the individual training runs.

In [29]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(100),
)
  

Search setup.

In [None]:
best_base_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-random",
    n_trials=150
)

[I 2025-03-25 07:19:45,527] A new study created in memory with name: Base-random


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2192,3.57719,0.1366,0.12091,0.1366,0.099373
2,3.5074,3.112422,0.2284,0.225727,0.2284,0.204657
3,3.0268,2.673009,0.3094,0.299968,0.3094,0.282376


[I 2025-03-25 07:23:35,183] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1631,3.784604,0.1031,0.082409,0.1031,0.075664
2,3.5828,3.218259,0.1923,0.205883,0.1923,0.170475
3,3.1675,2.835596,0.2652,0.267351,0.2652,0.23408
4,2.8228,2.631736,0.3067,0.323464,0.3067,0.283624
5,2.5424,2.311903,0.3792,0.376757,0.3792,0.361927
6,2.2794,2.181519,0.4062,0.413904,0.4062,0.396003


[I 2025-03-25 07:31:13,095] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5316,3.999748,0.0881,0.056027,0.0881,0.049563
2,4.0442,3.69048,0.134,0.12059,0.134,0.102269
3,3.7668,3.464325,0.1735,0.169984,0.1735,0.142094


[I 2025-03-25 07:35:03,228] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2838,3.978669,0.0619,0.050403,0.0619,0.035123
2,3.8689,3.680361,0.1145,0.096394,0.1145,0.085115
3,3.5496,3.223945,0.1947,0.179345,0.1947,0.163495
4,3.2069,3.018932,0.2286,0.227935,0.2286,0.200975
5,2.9212,2.701042,0.291,0.29679,0.291,0.272037
6,2.6856,2.52209,0.3306,0.336573,0.3306,0.317471


[I 2025-03-25 07:42:40,442] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3418,4.096611,0.0557,0.027712,0.0557,0.025578
2,3.9986,3.883059,0.0847,0.079342,0.0847,0.056827
3,3.7731,3.563287,0.1325,0.129635,0.1325,0.103761


[I 2025-03-25 07:46:31,018] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4083,3.833461,0.105,0.088368,0.105,0.069086
2,3.8389,3.460798,0.1741,0.170917,0.1741,0.146028
3,3.4605,3.114818,0.2346,0.215941,0.2346,0.201054
4,3.166,2.967304,0.2594,0.262125,0.2594,0.232116
5,2.9177,2.669131,0.3226,0.315499,0.3226,0.299253
6,2.6904,2.563832,0.3372,0.329941,0.3372,0.314583


[I 2025-03-25 07:54:11,849] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2185,3.645595,0.1163,0.121932,0.1163,0.081641
2,3.5429,3.139251,0.2229,0.214803,0.2229,0.19762
3,3.0445,2.724133,0.2995,0.296493,0.2995,0.271772
4,2.6481,2.514017,0.3377,0.364462,0.3377,0.318225
5,2.335,2.218957,0.4046,0.408027,0.4046,0.388073
6,2.0475,2.05649,0.4419,0.440921,0.4419,0.431969


[I 2025-03-25 08:02:00,206] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4586,3.894554,0.0982,0.066854,0.0982,0.05825
2,3.9239,3.567949,0.1529,0.142144,0.1529,0.124491
3,3.5676,3.228167,0.2135,0.205379,0.2135,0.179384
4,3.2809,3.049497,0.2474,0.253877,0.2474,0.218195
5,3.049,2.803723,0.2914,0.28198,0.2914,0.268283
6,2.8388,2.695434,0.3087,0.295258,0.3087,0.284095


[I 2025-03-25 08:09:43,769] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1816,3.622636,0.1216,0.103932,0.1216,0.086354
2,3.4638,3.099783,0.2192,0.246984,0.2192,0.199445
3,2.9733,2.639677,0.31,0.301788,0.31,0.284625
4,2.5902,2.470576,0.3364,0.358158,0.3364,0.317785
5,2.2868,2.187423,0.4052,0.408513,0.4052,0.390923
6,2.0088,2.018322,0.4528,0.460703,0.4528,0.444621
7,1.7621,1.934499,0.473,0.476693,0.473,0.464541
8,1.5125,1.955435,0.4721,0.485117,0.4721,0.467014
9,1.298,1.976064,0.4729,0.482549,0.4729,0.464884
10,1.122,1.846891,0.4997,0.50204,0.4997,0.496453


[I 2025-03-25 08:22:44,499] Trial 8 finished with value: 0.4964529221015204 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.4964529221015204.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1533,3.672081,0.1143,0.099528,0.1143,0.078995
2,3.4801,3.064983,0.2298,0.246843,0.2298,0.209389
3,2.9659,2.629022,0.311,0.308717,0.311,0.282628
4,2.5904,2.466996,0.343,0.372966,0.343,0.326061
5,2.2998,2.172494,0.4134,0.422145,0.4134,0.40201
6,2.0266,2.032017,0.4447,0.452022,0.4447,0.436792


[I 2025-03-25 08:30:29,992] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5091,4.022349,0.0854,0.072281,0.0854,0.048677
2,4.0398,3.668852,0.1384,0.125196,0.1384,0.1055
3,3.7359,3.411476,0.1762,0.156706,0.1762,0.14113


[I 2025-03-25 08:34:21,736] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.00043251957183315523, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1723,3.589748,0.1307,0.112396,0.1307,0.092513
2,3.4371,3.029595,0.232,0.239607,0.232,0.209679
3,2.9451,2.618134,0.3134,0.302538,0.3134,0.28517
4,2.5824,2.476893,0.3448,0.374251,0.3448,0.329362
5,2.2909,2.172251,0.4155,0.418903,0.4155,0.40012
6,2.0161,2.025577,0.4445,0.44365,0.4445,0.436214
7,1.773,1.983884,0.4573,0.463374,0.4573,0.449643
8,1.5217,1.97438,0.4649,0.470235,0.4649,0.456811
9,1.3,2.037655,0.4576,0.462469,0.4576,0.448659
10,1.1165,1.917618,0.4791,0.485505,0.4791,0.476282


[I 2025-03-25 08:47:13,131] Trial 11 finished with value: 0.47628226185075745 and parameters: {'learning_rate': 0.00043251957183315523, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 8 with value: 0.4964529221015204.


Trial 12 with params: {'learning_rate': 0.00044436992445722666, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1593,3.608943,0.122,0.106401,0.122,0.088234
2,3.4674,3.079344,0.2252,0.23949,0.2252,0.204859
3,2.9585,2.601232,0.316,0.313259,0.316,0.291588
4,2.5801,2.483591,0.3365,0.35774,0.3365,0.318102
5,2.2806,2.158807,0.4181,0.416468,0.4181,0.40249
6,2.0026,2.027784,0.4478,0.453065,0.4478,0.442965
7,1.7555,1.968889,0.4605,0.46797,0.4605,0.454114
8,1.5161,1.986393,0.4689,0.479869,0.4689,0.46188
9,1.3004,2.0215,0.4651,0.469796,0.4651,0.455997
10,1.1209,1.91413,0.4819,0.492889,0.4819,0.479861


[I 2025-03-25 09:00:04,205] Trial 12 finished with value: 0.4798613059827639 and parameters: {'learning_rate': 0.00044436992445722666, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}. Best is trial 8 with value: 0.4964529221015204.


Trial 13 with params: {'learning_rate': 0.002846234086703027, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3089,4.034875,0.0581,0.025079,0.0581,0.026697
2,3.9944,3.826683,0.0884,0.05677,0.0884,0.05638
3,3.7735,3.514061,0.14,0.115189,0.14,0.105334
4,3.5004,3.340209,0.1633,0.159083,0.1633,0.133188
5,3.264,3.072065,0.2171,0.208756,0.2171,0.190424
6,3.0544,2.861221,0.2613,0.250688,0.2613,0.237407


[I 2025-03-25 09:07:46,386] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0002497872291982773, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2238,3.60787,0.1328,0.13104,0.1328,0.100087
2,3.5336,3.103951,0.2277,0.22838,0.2277,0.2035
3,3.0572,2.74019,0.2974,0.295265,0.2974,0.267903


[I 2025-03-25 09:11:36,094] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0005866060096853441, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1948,3.695865,0.1051,0.111474,0.1051,0.076399
2,3.562,3.191486,0.1992,0.221739,0.1992,0.174428
3,3.0845,2.729572,0.2865,0.280016,0.2865,0.2599
4,2.7032,2.646632,0.3119,0.345771,0.3119,0.292164
5,2.4026,2.223484,0.4001,0.40954,0.4001,0.386238
6,2.13,2.090442,0.4299,0.440577,0.4299,0.421594
7,1.8905,1.976931,0.4575,0.46143,0.4575,0.449023
8,1.648,1.976437,0.4646,0.473221,0.4646,0.456995
9,1.4451,1.992057,0.4651,0.471739,0.4651,0.456928
10,1.2694,1.867616,0.4876,0.491233,0.4876,0.484637


[I 2025-03-25 09:24:25,927] Trial 15 finished with value: 0.48463721766930623 and parameters: {'learning_rate': 0.0005866060096853441, 'weight_decay': 0.01, 'warmup_steps': 14}. Best is trial 8 with value: 0.4964529221015204.


Trial 16 with params: {'learning_rate': 8.85713447869134e-05, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4644,3.939008,0.0973,0.072978,0.0973,0.060436
2,3.9397,3.569372,0.1518,0.14193,0.1518,0.122642
3,3.581,3.238741,0.2091,0.192168,0.2091,0.17563


[I 2025-03-25 09:28:15,857] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0027741252472900674, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3497,4.108705,0.0579,0.033945,0.0579,0.029759
2,4.0195,3.904187,0.0868,0.063311,0.0868,0.055642
3,3.8053,3.591103,0.1348,0.107546,0.1348,0.101829
4,3.5811,3.406126,0.1636,0.15695,0.1636,0.132753
5,3.3542,3.146435,0.2153,0.204009,0.2153,0.184371
6,3.1231,2.945049,0.2483,0.240115,0.2483,0.22324


[I 2025-03-25 09:35:56,021] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4298,3.827905,0.1037,0.073678,0.1037,0.064574
2,3.8492,3.46716,0.1656,0.17318,0.1656,0.136434
3,3.4963,3.179035,0.2234,0.21165,0.2234,0.191949
4,3.2316,3.028168,0.2539,0.255468,0.2539,0.228003
5,3.0023,2.791693,0.2909,0.282517,0.2909,0.266206
6,2.803,2.659736,0.3227,0.305728,0.3227,0.297374


[I 2025-03-25 09:43:38,153] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0009213362135256124, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1685,3.720098,0.1001,0.084965,0.1001,0.069329
2,3.592,3.27588,0.1773,0.18902,0.1773,0.149088
3,3.1597,2.840306,0.262,0.243197,0.262,0.230457


[I 2025-03-25 09:47:28,448] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0007343282446025902, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2105,3.72645,0.1042,0.077967,0.1042,0.070589
2,3.6062,3.239888,0.1914,0.20602,0.1914,0.166116
3,3.1778,2.820613,0.2648,0.264314,0.2648,0.23853
4,2.833,2.673186,0.2966,0.326838,0.2966,0.276358
5,2.5466,2.357242,0.3648,0.367409,0.3648,0.34785
6,2.2929,2.225081,0.4002,0.401768,0.4002,0.389653
7,2.066,2.083678,0.4389,0.43747,0.4389,0.429363
8,1.8361,2.11935,0.4334,0.44364,0.4334,0.42313
9,1.6335,2.097057,0.4517,0.4492,0.4517,0.440115
10,1.4599,1.961135,0.4737,0.478598,0.4737,0.46948


[I 2025-03-25 10:00:20,418] Trial 20 finished with value: 0.4694798079131437 and parameters: {'learning_rate': 0.0007343282446025902, 'weight_decay': 0.006, 'warmup_steps': 19}. Best is trial 8 with value: 0.4964529221015204.


Trial 21 with params: {'learning_rate': 0.00029905509826124127, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2012,3.63294,0.1214,0.116665,0.1214,0.084952
2,3.4611,3.047059,0.2374,0.239006,0.2374,0.213836
3,2.9551,2.633106,0.3105,0.309201,0.3105,0.285341
4,2.5746,2.499758,0.3392,0.372863,0.3392,0.321857
5,2.2819,2.17911,0.4155,0.41953,0.4155,0.40161
6,1.9931,2.055784,0.4399,0.438581,0.4399,0.429218


[I 2025-03-25 10:08:03,693] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0006973873232263633, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1825,3.720134,0.1118,0.108162,0.1118,0.078278
2,3.5932,3.268588,0.1882,0.200427,0.1882,0.160765
3,3.1253,2.781011,0.2805,0.278025,0.2805,0.254974
4,2.7529,2.690798,0.2985,0.331155,0.2985,0.280218
5,2.4716,2.296706,0.3806,0.387727,0.3806,0.364604
6,2.2236,2.16715,0.4127,0.415417,0.4127,0.399887
7,1.9895,2.043205,0.4476,0.44389,0.4476,0.438039
8,1.7629,2.082671,0.4437,0.461214,0.4437,0.436263
9,1.568,2.069434,0.4537,0.456226,0.4537,0.444648
10,1.3915,1.929693,0.4802,0.480093,0.4802,0.476239


[I 2025-03-25 10:20:58,018] Trial 22 finished with value: 0.4762391707166234 and parameters: {'learning_rate': 0.0006973873232263633, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 8 with value: 0.4964529221015204.


Trial 23 with params: {'learning_rate': 0.0012461319045202197, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2284,3.861803,0.0848,0.065626,0.0848,0.05387
2,3.8102,3.653679,0.1202,0.114362,0.1202,0.094249
3,3.4765,3.190364,0.2043,0.185325,0.2043,0.173906


[I 2025-03-25 10:24:47,796] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00034753780627072464, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1907,3.605184,0.1312,0.138824,0.1312,0.102024
2,3.4604,3.005793,0.2486,0.254518,0.2486,0.227205
3,2.9541,2.596359,0.3241,0.320146,0.3241,0.299076
4,2.5614,2.460671,0.3514,0.37754,0.3514,0.334629
5,2.2624,2.147791,0.4174,0.419905,0.4174,0.40534
6,1.9845,2.053774,0.4459,0.448018,0.4459,0.435218
7,1.7396,1.949588,0.4741,0.478462,0.4741,0.468099
8,1.4914,1.980219,0.4699,0.480225,0.4699,0.463931
9,1.2799,2.025383,0.4604,0.468387,0.4604,0.453034
10,1.1119,1.912591,0.4873,0.492589,0.4873,0.484859


[I 2025-03-25 10:37:40,559] Trial 24 finished with value: 0.48485893083447607 and parameters: {'learning_rate': 0.00034753780627072464, 'weight_decay': 0.01, 'warmup_steps': 11}. Best is trial 8 with value: 0.4964529221015204.


Trial 25 with params: {'learning_rate': 0.0027693395374376512, 'weight_decay': 0.0, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3198,4.041061,0.061,0.045425,0.061,0.032727
2,3.9987,3.880883,0.0882,0.06335,0.0882,0.057293
3,3.8274,3.63784,0.1274,0.098164,0.1274,0.095573


[I 2025-03-25 10:41:30,504] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.00015082020327338594, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3639,3.751584,0.118,0.083514,0.118,0.080433
2,3.7313,3.317598,0.1929,0.199031,0.1929,0.166755
3,3.3051,2.968981,0.2593,0.24814,0.2593,0.229157
4,2.9654,2.752314,0.3016,0.309882,0.3016,0.278198
5,2.6748,2.500704,0.3523,0.352625,0.3523,0.334618
6,2.4308,2.369362,0.3808,0.375861,0.3808,0.364343


[I 2025-03-25 10:49:13,213] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3139,3.693389,0.1172,0.095333,0.1172,0.082469
2,3.6178,3.171372,0.2231,0.21708,0.2231,0.197721
3,3.1308,2.820118,0.2791,0.272602,0.2791,0.24854
4,2.7499,2.595208,0.3241,0.340641,0.3241,0.300296
5,2.4444,2.291058,0.3955,0.396622,0.3955,0.380209
6,2.1704,2.154239,0.4246,0.424419,0.4246,0.413626


[I 2025-03-25 10:56:54,566] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0005670047333836488, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1796,3.677643,0.1123,0.110171,0.1123,0.077902
2,3.5405,3.156866,0.21,0.210719,0.21,0.184019
3,3.0648,2.705675,0.2891,0.282219,0.2891,0.26283
4,2.6845,2.563634,0.3205,0.337982,0.3205,0.299978
5,2.3984,2.232936,0.397,0.395241,0.397,0.378746
6,2.1235,2.080709,0.4307,0.434622,0.4307,0.421531


[I 2025-03-25 11:04:35,764] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0022568980984053887, 'weight_decay': 0.008, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3137,4.049115,0.0548,0.03123,0.0548,0.026045
2,3.9563,3.868723,0.0901,0.077016,0.0901,0.062023
3,3.6971,3.449677,0.1548,0.147554,0.1548,0.124435
4,3.4109,3.23617,0.1858,0.193691,0.1858,0.158773
5,3.1427,2.911306,0.2437,0.244319,0.2437,0.218951
6,2.8899,2.725893,0.2849,0.276071,0.2849,0.266


[I 2025-03-25 11:12:15,800] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00019919417743893332, 'weight_decay': 0.006, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2696,3.671692,0.1209,0.122849,0.1209,0.087473
2,3.5862,3.190203,0.221,0.222155,0.221,0.193215
3,3.1346,2.797121,0.2879,0.275025,0.2879,0.257929
4,2.7728,2.578161,0.328,0.3425,0.328,0.307454
5,2.4665,2.337675,0.3832,0.386036,0.3832,0.367134
6,2.1954,2.180752,0.4137,0.411068,0.4137,0.399915
7,1.9696,2.10322,0.4396,0.437874,0.4396,0.429624
8,1.7535,2.117352,0.4343,0.443019,0.4343,0.426842
9,1.5797,2.149034,0.4332,0.433026,0.4332,0.421844
10,1.4408,2.057783,0.4505,0.453612,0.4505,0.446318


[I 2025-03-25 11:25:02,242] Trial 30 finished with value: 0.4463178443542486 and parameters: {'learning_rate': 0.00019919417743893332, 'weight_decay': 0.006, 'warmup_steps': 7}. Best is trial 8 with value: 0.4964529221015204.


Trial 31 with params: {'learning_rate': 0.0004698758736856694, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1592,3.62254,0.1236,0.107976,0.1236,0.090915
2,3.4705,3.004245,0.2365,0.251048,0.2365,0.215532
3,2.9538,2.584096,0.317,0.314684,0.317,0.29029
4,2.5728,2.48716,0.3406,0.366274,0.3406,0.323198
5,2.2843,2.174359,0.4161,0.417933,0.4161,0.401251
6,2.0152,2.031361,0.4464,0.452778,0.4464,0.438853
7,1.7894,1.948838,0.4707,0.472897,0.4707,0.461128
8,1.553,1.952991,0.4719,0.485103,0.4719,0.465579
9,1.3411,1.974977,0.4751,0.48357,0.4751,0.466536
10,1.1671,1.827275,0.5032,0.508099,0.5032,0.50045


[I 2025-03-25 11:37:56,614] Trial 31 finished with value: 0.500449702292892 and parameters: {'learning_rate': 0.0004698758736856694, 'weight_decay': 0.01, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 32 with params: {'learning_rate': 0.000701443652828122, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1825,3.73937,0.0944,0.076071,0.0944,0.063265
2,3.5896,3.210538,0.1907,0.210357,0.1907,0.160306
3,3.1291,2.810184,0.2679,0.257258,0.2679,0.237872
4,2.7958,2.707558,0.2867,0.304856,0.2867,0.263452
5,2.5198,2.305066,0.3768,0.377249,0.3768,0.359907
6,2.2706,2.142245,0.4181,0.42693,0.4181,0.409236
7,2.0335,2.052842,0.4489,0.449202,0.4489,0.43906
8,1.802,2.081754,0.4457,0.456905,0.4457,0.437238
9,1.6006,2.071903,0.4484,0.44946,0.4484,0.439458
10,1.4203,1.944709,0.4743,0.481058,0.4743,0.472917


[I 2025-03-25 11:50:49,796] Trial 32 finished with value: 0.4729165394198834 and parameters: {'learning_rate': 0.000701443652828122, 'weight_decay': 0.01, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 33 with params: {'learning_rate': 0.0003640216645586834, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2191,3.613508,0.1307,0.108261,0.1307,0.096022
2,3.5268,3.123767,0.225,0.232941,0.225,0.201534
3,3.0225,2.606758,0.3227,0.309911,0.3227,0.29571
4,2.612,2.483788,0.3436,0.360106,0.3436,0.325024
5,2.3,2.177111,0.4141,0.412399,0.4141,0.398057
6,2.0141,2.058282,0.4417,0.44977,0.4417,0.43408
7,1.7628,1.972242,0.4693,0.474749,0.4693,0.461201
8,1.5209,1.973655,0.4722,0.481093,0.4722,0.464918
9,1.307,2.001782,0.4677,0.470349,0.4677,0.458144
10,1.1365,1.88614,0.4862,0.48929,0.4862,0.482948


[I 2025-03-25 12:03:39,305] Trial 33 finished with value: 0.48294783964145865 and parameters: {'learning_rate': 0.0003640216645586834, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 34 with params: {'learning_rate': 0.0003501523855276686, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1851,3.605482,0.1271,0.114193,0.1271,0.09357
2,3.466,3.068637,0.2263,0.236503,0.2263,0.207999
3,2.9501,2.592174,0.3198,0.314373,0.3198,0.29422
4,2.5534,2.401207,0.3658,0.388374,0.3658,0.347761
5,2.2296,2.136181,0.426,0.429511,0.426,0.413218
6,1.947,1.987195,0.4604,0.466655,0.4604,0.450639
7,1.7005,1.937783,0.475,0.48025,0.475,0.467185
8,1.4528,1.955055,0.4783,0.484089,0.4783,0.470904
9,1.2389,1.985237,0.4729,0.481038,0.4729,0.465359
10,1.0695,1.884566,0.4907,0.496931,0.4907,0.488


[I 2025-03-25 12:16:28,335] Trial 34 finished with value: 0.4879996573673686 and parameters: {'learning_rate': 0.0003501523855276686, 'weight_decay': 0.008, 'warmup_steps': 7}. Best is trial 31 with value: 0.500449702292892.


Trial 35 with params: {'learning_rate': 0.0002154325698190696, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2495,3.637427,0.1293,0.130302,0.1293,0.096347
2,3.5684,3.143695,0.2228,0.208642,0.2228,0.191663
3,3.0987,2.711455,0.3021,0.284376,0.3021,0.27086
4,2.7161,2.570704,0.3317,0.350134,0.3317,0.311219
5,2.4133,2.293329,0.3904,0.395596,0.3904,0.374158
6,2.1365,2.130813,0.4242,0.423546,0.4242,0.41373


[I 2025-03-25 12:24:11,708] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0006664867774927576, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1927,3.735426,0.0994,0.078023,0.0994,0.067987
2,3.6352,3.285966,0.1814,0.192798,0.1814,0.158198
3,3.1989,2.824618,0.2682,0.2563,0.2682,0.240015
4,2.8184,2.686025,0.2959,0.312402,0.2959,0.27376
5,2.5292,2.361811,0.3687,0.369978,0.3687,0.351919
6,2.2707,2.191145,0.4017,0.409629,0.4017,0.39217


[I 2025-03-25 12:31:51,341] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.000883825540682926, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2376,3.825746,0.0888,0.065444,0.0888,0.060956
2,3.744,3.458022,0.1549,0.149585,0.1549,0.131162
3,3.3544,3.035267,0.2307,0.228449,0.2307,0.203096


[I 2025-03-25 12:35:41,479] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00023194417638517086, 'weight_decay': 0.008, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.251,3.654112,0.124,0.119032,0.124,0.089448
2,3.5708,3.190683,0.2132,0.219448,0.2132,0.186711
3,3.1198,2.780986,0.2913,0.274941,0.2913,0.260046


[I 2025-03-25 12:39:30,851] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0002473791577123347, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2351,3.614943,0.1288,0.115894,0.1288,0.091587
2,3.5396,3.153373,0.2214,0.221419,0.2214,0.199161
3,3.0471,2.70135,0.3062,0.295048,0.3062,0.277207
4,2.6503,2.517867,0.3397,0.355255,0.3397,0.3186
5,2.3443,2.229753,0.4087,0.413571,0.4087,0.392874
6,2.07,2.122334,0.4357,0.436437,0.4357,0.42514
7,1.8285,2.046907,0.4527,0.455509,0.4527,0.444918
8,1.5906,2.065291,0.4557,0.463245,0.4557,0.448308
9,1.3926,2.104132,0.4485,0.45647,0.4485,0.439995
10,1.2369,2.002312,0.466,0.471208,0.466,0.462985


[I 2025-03-25 12:52:20,099] Trial 39 finished with value: 0.46298493710956123 and parameters: {'learning_rate': 0.0002473791577123347, 'weight_decay': 0.01, 'warmup_steps': 8}. Best is trial 31 with value: 0.500449702292892.


Trial 40 with params: {'learning_rate': 0.0005363712572952317, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1753,3.660749,0.1188,0.093059,0.1188,0.079829
2,3.5355,3.145459,0.214,0.213643,0.214,0.190411
3,3.0668,2.762096,0.2779,0.267309,0.2779,0.250504


[I 2025-03-25 12:56:08,216] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5237,4.030699,0.0843,0.058665,0.0843,0.046063
2,4.0677,3.706181,0.1325,0.114054,0.1325,0.099173
3,3.7797,3.442248,0.1762,0.15361,0.1762,0.139391
4,3.5559,3.323176,0.1994,0.198635,0.1994,0.171179
5,3.3755,3.12024,0.2279,0.216533,0.2279,0.199763
6,3.2291,3.022316,0.2472,0.228999,0.2472,0.216055


[I 2025-03-25 13:03:49,289] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0004143616152346527, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1732,3.588184,0.1302,0.113565,0.1302,0.091663
2,3.4237,2.997414,0.2376,0.247379,0.2376,0.212667
3,2.9374,2.618996,0.3109,0.296053,0.3109,0.2802
4,2.5816,2.514181,0.3346,0.363596,0.3346,0.317659
5,2.2838,2.142247,0.4184,0.41471,0.4184,0.403563
6,1.9958,2.053652,0.442,0.449565,0.442,0.434856
7,1.7525,1.931081,0.4779,0.481658,0.4779,0.472205
8,1.5034,1.977246,0.4694,0.4798,0.4694,0.462661
9,1.2837,2.004502,0.4684,0.477259,0.4684,0.462037
10,1.1021,1.882624,0.4874,0.489504,0.4874,0.483638


[I 2025-03-25 13:16:38,353] Trial 42 finished with value: 0.4836378383457607 and parameters: {'learning_rate': 0.0004143616152346527, 'weight_decay': 0.01, 'warmup_steps': 15}. Best is trial 31 with value: 0.500449702292892.


Trial 43 with params: {'learning_rate': 0.0004203807212567375, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.175,3.687806,0.1123,0.091946,0.1123,0.07941
2,3.5091,3.102834,0.2297,0.235691,0.2297,0.20496
3,3.0322,2.676616,0.3056,0.298523,0.3056,0.280358
4,2.6498,2.497924,0.3368,0.350488,0.3368,0.316434
5,2.3557,2.229063,0.4022,0.400763,0.4022,0.386199
6,2.0756,2.068675,0.4339,0.437395,0.4339,0.42333
7,1.8292,1.996524,0.4548,0.456758,0.4548,0.446359
8,1.5838,2.006055,0.4602,0.470152,0.4602,0.453672
9,1.3594,2.030817,0.4594,0.46398,0.4594,0.45085
10,1.1804,1.925355,0.4788,0.488066,0.4788,0.477204


[I 2025-03-25 13:29:26,635] Trial 43 finished with value: 0.4772040407700427 and parameters: {'learning_rate': 0.0004203807212567375, 'weight_decay': 0.01, 'warmup_steps': 8}. Best is trial 31 with value: 0.500449702292892.


Trial 44 with params: {'learning_rate': 7.012112975444019e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5261,4.008264,0.0877,0.05812,0.0877,0.050174
2,4.0145,3.629148,0.1427,0.131038,0.1427,0.109427
3,3.6871,3.360138,0.1882,0.16685,0.1882,0.156044


[I 2025-03-25 13:33:16,283] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00022489874391886823, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2558,3.676367,0.1173,0.107582,0.1173,0.084723
2,3.5607,3.17863,0.2238,0.240068,0.2238,0.199726
3,3.0799,2.705732,0.2995,0.290808,0.2995,0.269932
4,2.691,2.514667,0.342,0.357137,0.342,0.32279
5,2.3878,2.248283,0.3978,0.403165,0.3978,0.382855
6,2.1029,2.121663,0.4272,0.424061,0.4272,0.41551
7,1.8736,2.073483,0.4426,0.44367,0.4426,0.432167
8,1.6413,2.084946,0.4395,0.44254,0.4395,0.429276
9,1.4497,2.121957,0.4368,0.440927,0.4368,0.426005
10,1.2937,2.017602,0.4572,0.462536,0.4572,0.452725


[I 2025-03-25 13:46:06,214] Trial 45 finished with value: 0.45272461763972827 and parameters: {'learning_rate': 0.00022489874391886823, 'weight_decay': 0.007, 'warmup_steps': 16}. Best is trial 31 with value: 0.500449702292892.


Trial 46 with params: {'learning_rate': 0.0013935498531871878, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2563,3.950342,0.0644,0.053834,0.0644,0.036947
2,3.9397,3.817202,0.0974,0.08419,0.0974,0.068048
3,3.7065,3.444087,0.1583,0.136492,0.1583,0.128929


[I 2025-03-25 13:49:56,443] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00039790657576095194, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1881,3.682285,0.1169,0.100808,0.1169,0.082997
2,3.4565,2.996126,0.2495,0.262184,0.2495,0.22852
3,2.9505,2.603838,0.3265,0.321947,0.3265,0.300297
4,2.5616,2.450408,0.3581,0.380021,0.3581,0.338193
5,2.2407,2.159928,0.4251,0.428565,0.4251,0.409624
6,1.9671,2.020641,0.4528,0.463009,0.4528,0.444144
7,1.7192,1.94984,0.4697,0.471164,0.4697,0.460536
8,1.4696,1.938088,0.4813,0.485993,0.4813,0.473286
9,1.2574,1.999443,0.4757,0.484626,0.4757,0.468093
10,1.0836,1.877272,0.4912,0.499957,0.4912,0.489454


[I 2025-03-25 14:02:49,565] Trial 47 finished with value: 0.4894539024147783 and parameters: {'learning_rate': 0.00039790657576095194, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 31 with value: 0.500449702292892.


Trial 48 with params: {'learning_rate': 0.0007593405620579207, 'weight_decay': 0.005, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2064,3.829105,0.0838,0.052072,0.0838,0.05199
2,3.7088,3.389661,0.1679,0.161849,0.1679,0.139439
3,3.3002,2.973167,0.2455,0.239582,0.2455,0.214928
4,2.9457,2.801223,0.2772,0.297816,0.2772,0.252627
5,2.655,2.463892,0.3473,0.347254,0.3473,0.328517
6,2.4055,2.296115,0.3847,0.394507,0.3847,0.374533


[I 2025-03-25 14:10:36,709] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.00020084749377619903, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2888,3.713786,0.1141,0.118144,0.1141,0.082418
2,3.589,3.163766,0.2188,0.215421,0.2188,0.194098
3,3.1188,2.773973,0.291,0.286692,0.291,0.260074
4,2.7466,2.618755,0.3232,0.334816,0.3232,0.300633
5,2.45,2.3293,0.3829,0.383066,0.3829,0.365086
6,2.179,2.170112,0.414,0.410078,0.414,0.401287


[I 2025-03-25 14:18:18,988] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0005512880304825463, 'weight_decay': 0.006, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.163,3.626044,0.1198,0.104469,0.1198,0.087701
2,3.5274,3.190134,0.2042,0.217019,0.2042,0.17949
3,3.0435,2.700871,0.29,0.289328,0.29,0.265312
4,2.6791,2.553355,0.3239,0.348776,0.3239,0.306204
5,2.3936,2.246173,0.3974,0.400588,0.3974,0.381931
6,2.1286,2.102015,0.4236,0.427317,0.4236,0.413597


[I 2025-03-25 14:26:01,873] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0003799472731351596, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1846,3.582568,0.1356,0.121494,0.1356,0.100918
2,3.4916,3.060654,0.2312,0.243223,0.2312,0.210058
3,2.979,2.615731,0.3169,0.308783,0.3169,0.290947
4,2.5837,2.442971,0.3465,0.376339,0.3465,0.329089
5,2.2726,2.148532,0.4202,0.415885,0.4202,0.403068
6,1.9937,2.015374,0.4526,0.457721,0.4526,0.446418
7,1.7487,1.929895,0.4683,0.473659,0.4683,0.462004
8,1.5,1.942819,0.4755,0.484518,0.4755,0.46955
9,1.2902,1.987239,0.4695,0.478753,0.4695,0.461519
10,1.1141,1.858536,0.4933,0.500117,0.4933,0.492035


[I 2025-03-25 14:38:55,559] Trial 51 finished with value: 0.49203520431566355 and parameters: {'learning_rate': 0.0003799472731351596, 'weight_decay': 0.007, 'warmup_steps': 7}. Best is trial 31 with value: 0.500449702292892.


Trial 52 with params: {'learning_rate': 0.00021487797366241817, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2534,3.670977,0.1259,0.111991,0.1259,0.091559
2,3.6082,3.203912,0.2154,0.214786,0.2154,0.188487
3,3.1377,2.790573,0.2839,0.281108,0.2839,0.254276


[I 2025-03-25 14:42:47,358] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00033443498967985585, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1986,3.651779,0.1141,0.102399,0.1141,0.084324
2,3.5262,3.107425,0.2193,0.219725,0.2193,0.192213
3,3.0221,2.662472,0.3102,0.312902,0.3102,0.285034
4,2.6344,2.487437,0.3499,0.371921,0.3499,0.32963
5,2.3166,2.192099,0.4085,0.404925,0.4085,0.39095
6,2.0323,2.048011,0.4412,0.439796,0.4412,0.430636
7,1.7765,2.006557,0.4552,0.454881,0.4552,0.444953
8,1.5344,2.036421,0.4553,0.467675,0.4553,0.449911
9,1.319,2.096072,0.4419,0.45273,0.4419,0.432385
10,1.1472,1.97289,0.4679,0.477337,0.4679,0.465791


[I 2025-03-25 14:55:48,798] Trial 53 finished with value: 0.4657910692919126 and parameters: {'learning_rate': 0.00033443498967985585, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 31 with value: 0.500449702292892.


Trial 54 with params: {'learning_rate': 0.00046907516874783835, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1705,3.644269,0.1175,0.111872,0.1175,0.082229
2,3.5282,3.138339,0.2136,0.226329,0.2136,0.18798
3,3.0362,2.708762,0.2906,0.281657,0.2906,0.260844
4,2.6648,2.527172,0.331,0.360497,0.331,0.31457
5,2.3657,2.222926,0.4023,0.406194,0.4023,0.388283
6,2.092,2.066258,0.434,0.437065,0.434,0.425406


[I 2025-03-25 15:03:27,731] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0008953750478722926, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2351,3.8523,0.0822,0.066122,0.0822,0.050779
2,3.7628,3.492711,0.1548,0.145896,0.1548,0.129766
3,3.3725,3.012459,0.2336,0.218641,0.2336,0.20187
4,3.0061,2.845229,0.2661,0.279448,0.2661,0.240442
5,2.7119,2.480552,0.341,0.338025,0.341,0.322
6,2.4517,2.325039,0.3766,0.379369,0.3766,0.363282


[I 2025-03-25 15:11:09,585] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.004913837305728667, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4278,4.128789,0.047,0.026177,0.047,0.02336
2,4.0658,3.892046,0.0846,0.067246,0.0846,0.060871
3,3.8545,3.666049,0.123,0.109015,0.123,0.096949


[I 2025-03-25 15:15:00,857] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0006368174642739085, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1896,3.737426,0.106,0.091745,0.106,0.073786
2,3.5826,3.222629,0.1956,0.216487,0.1956,0.17143
3,3.0976,2.795966,0.2758,0.262405,0.2758,0.246459
4,2.7379,2.560416,0.3197,0.331819,0.3197,0.29676
5,2.4487,2.304548,0.3841,0.389842,0.3841,0.36863
6,2.1814,2.141326,0.4208,0.430962,0.4208,0.412505
7,1.9422,2.05679,0.4424,0.452406,0.4424,0.433574
8,1.7,2.03681,0.4561,0.47058,0.4561,0.449039
9,1.4906,2.041029,0.459,0.461921,0.459,0.448603
10,1.3099,1.92052,0.4851,0.490972,0.4851,0.482547


[I 2025-03-25 15:27:49,324] Trial 57 finished with value: 0.4825471457318035 and parameters: {'learning_rate': 0.0006368174642739085, 'weight_decay': 0.007, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 58 with params: {'learning_rate': 0.0019403519985629898, 'weight_decay': 0.003, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.305,4.043186,0.0666,0.053749,0.0666,0.040332
2,3.9429,3.833367,0.0985,0.08496,0.0985,0.069326
3,3.7065,3.454878,0.1631,0.150714,0.1631,0.128634
4,3.4226,3.186343,0.204,0.20007,0.204,0.171123
5,3.1417,2.927417,0.2515,0.252152,0.2515,0.229981
6,2.8872,2.707884,0.294,0.290721,0.294,0.276969


[I 2025-03-25 15:35:28,177] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0006017708963321431, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1512,3.629631,0.1148,0.090925,0.1148,0.078644
2,3.4818,3.067174,0.2292,0.247029,0.2292,0.208996
3,3.0003,2.662975,0.3001,0.291727,0.3001,0.27308
4,2.626,2.535259,0.3273,0.370023,0.3273,0.30956
5,2.3361,2.182424,0.4065,0.407784,0.4065,0.3913
6,2.0624,2.017711,0.4484,0.454521,0.4484,0.440463
7,1.8264,1.942421,0.4697,0.475421,0.4697,0.461826
8,1.588,1.946629,0.4743,0.483668,0.4743,0.466641
9,1.3784,1.936881,0.4838,0.489492,0.4838,0.47511
10,1.2002,1.834504,0.4986,0.505023,0.4986,0.497101


[I 2025-03-25 15:48:14,910] Trial 59 finished with value: 0.4971005061735886 and parameters: {'learning_rate': 0.0006017708963321431, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 31 with value: 0.500449702292892.


Trial 60 with params: {'learning_rate': 0.0002597323899851586, 'weight_decay': 0.003, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2507,3.631925,0.1268,0.137516,0.1268,0.092014
2,3.5279,3.129776,0.2275,0.234541,0.2275,0.20278
3,3.0146,2.669643,0.3133,0.303926,0.3133,0.284055
4,2.6065,2.467153,0.3513,0.371893,0.3513,0.331144
5,2.2905,2.187457,0.4139,0.409602,0.4139,0.397494
6,2.0083,2.058756,0.4394,0.446432,0.4394,0.430975
7,1.7665,2.0143,0.4589,0.468101,0.4589,0.45297
8,1.5301,2.008106,0.464,0.467407,0.464,0.45491
9,1.33,2.091912,0.4433,0.448954,0.4433,0.431914
10,1.1701,1.957883,0.4756,0.480775,0.4756,0.470924


[I 2025-03-25 16:01:04,529] Trial 60 finished with value: 0.4709237262260549 and parameters: {'learning_rate': 0.0002597323899851586, 'weight_decay': 0.003, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 61 with params: {'learning_rate': 0.0004763715558786438, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.18,3.687736,0.1091,0.101068,0.1091,0.076277
2,3.545,3.119927,0.2207,0.22202,0.2207,0.199343
3,3.0433,2.651987,0.3077,0.300135,0.3077,0.284156
4,2.6538,2.543464,0.3179,0.347578,0.3179,0.298938
5,2.352,2.180177,0.4072,0.405701,0.4072,0.390099
6,2.0787,2.046084,0.4425,0.445407,0.4425,0.432982
7,1.8383,1.966862,0.4615,0.467724,0.4615,0.454445
8,1.5953,1.956813,0.4721,0.481125,0.4721,0.46497
9,1.3823,1.999678,0.4683,0.475591,0.4683,0.459846
10,1.2091,1.879998,0.4855,0.492617,0.4855,0.482169


[I 2025-03-25 16:13:53,033] Trial 61 finished with value: 0.4821688684168294 and parameters: {'learning_rate': 0.0004763715558786438, 'weight_decay': 0.007, 'warmup_steps': 6}. Best is trial 31 with value: 0.500449702292892.


Trial 62 with params: {'learning_rate': 0.000419644482395918, 'weight_decay': 0.006, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1647,3.598698,0.1315,0.11047,0.1315,0.094309
2,3.4445,3.029953,0.236,0.240354,0.236,0.212191
3,2.9552,2.602014,0.3114,0.301855,0.3114,0.284673
4,2.5862,2.463436,0.3469,0.371335,0.3469,0.330715
5,2.2956,2.195178,0.4079,0.406538,0.4079,0.392551
6,2.0133,2.065371,0.4358,0.444359,0.4358,0.428235
7,1.7724,1.950318,0.466,0.466419,0.466,0.457817
8,1.5326,1.964247,0.4666,0.478899,0.4666,0.459621
9,1.3138,1.986352,0.4699,0.473678,0.4699,0.460486
10,1.1382,1.873563,0.4896,0.494177,0.4896,0.48665


[I 2025-03-25 16:26:40,190] Trial 62 finished with value: 0.4866500550924115 and parameters: {'learning_rate': 0.000419644482395918, 'weight_decay': 0.006, 'warmup_steps': 12}. Best is trial 31 with value: 0.500449702292892.


Trial 63 with params: {'learning_rate': 0.00026659947808181753, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2372,3.670012,0.1172,0.095499,0.1172,0.081593
2,3.5387,3.081914,0.2366,0.239481,0.2366,0.2128
3,3.0321,2.6567,0.309,0.297262,0.309,0.282006
4,2.6383,2.479275,0.3489,0.361145,0.3489,0.327929
5,2.3224,2.229164,0.4018,0.405165,0.4018,0.387852
6,2.0379,2.074447,0.4402,0.440489,0.4402,0.431212


[I 2025-03-25 16:34:17,652] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0009191789408615183, 'weight_decay': 0.004, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.184,3.712279,0.103,0.088912,0.103,0.068389
2,3.6543,3.282793,0.1783,0.178543,0.1783,0.152439
3,3.2424,2.937356,0.2381,0.229788,0.2381,0.209006
4,2.9182,2.77432,0.2801,0.299313,0.2801,0.258478
5,2.6486,2.470725,0.3409,0.345475,0.3409,0.323491


[I 2025-03-25 16:54:45,471] Trial 65 finished with value: 0.4633381274573074 and parameters: {'learning_rate': 0.000241251747353242, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31}. Best is trial 31 with value: 0.500449702292892.


Trial 66 with params: {'learning_rate': 0.00043488699519156213, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1703,3.663277,0.1206,0.101058,0.1206,0.087163
2,3.4925,3.071197,0.2314,0.243295,0.2314,0.207635
3,3.0068,2.659481,0.3056,0.292223,0.3056,0.275903
4,2.6298,2.529411,0.3352,0.357538,0.3352,0.31213
5,2.3263,2.179463,0.4135,0.415741,0.4135,0.398833
6,2.0514,2.057539,0.4455,0.454138,0.4455,0.437395
7,1.8011,1.986057,0.4666,0.469645,0.4666,0.457574
8,1.564,1.990484,0.4635,0.472728,0.4635,0.456111
9,1.3439,2.010055,0.4667,0.471516,0.4667,0.458488
10,1.1638,1.893784,0.4856,0.490686,0.4856,0.482351


[I 2025-03-25 17:07:35,876] Trial 66 finished with value: 0.48235070628991816 and parameters: {'learning_rate': 0.00043488699519156213, 'weight_decay': 0.006, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 67 with params: {'learning_rate': 0.0003626047722599073, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1722,3.59981,0.1251,0.124391,0.1251,0.093965
2,3.442,3.049053,0.2317,0.231459,0.2317,0.205635
3,2.9416,2.589868,0.3196,0.304487,0.3196,0.292617
4,2.5513,2.414215,0.359,0.374415,0.359,0.337681
5,2.2491,2.170775,0.4168,0.42081,0.4168,0.402889
6,1.967,2.00744,0.4537,0.453177,0.4537,0.443729
7,1.716,1.946093,0.4723,0.479002,0.4723,0.466269
8,1.4633,1.957828,0.4713,0.480505,0.4713,0.466075
9,1.2488,1.991421,0.4748,0.478304,0.4748,0.466691
10,1.0724,1.901408,0.482,0.488795,0.482,0.480531


[I 2025-03-25 17:20:25,619] Trial 67 finished with value: 0.48053135658028606 and parameters: {'learning_rate': 0.0003626047722599073, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 31 with value: 0.500449702292892.


Trial 68 with params: {'learning_rate': 0.0005264569401687198, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1796,3.683964,0.1106,0.07592,0.1106,0.074009
2,3.4936,3.114956,0.2197,0.234278,0.2197,0.196857
3,3.013,2.665572,0.3027,0.307389,0.3027,0.27426
4,2.6657,2.541283,0.3294,0.348187,0.3294,0.306851
5,2.3791,2.241164,0.3966,0.405822,0.3966,0.382429
6,2.1103,2.100985,0.4254,0.42866,0.4254,0.415205


[I 2025-03-25 17:28:07,314] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0012237866793198095, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2491,3.939173,0.0708,0.063013,0.0708,0.044197
2,3.8326,3.647338,0.1175,0.097579,0.1175,0.091505
3,3.504,3.196866,0.2072,0.19746,0.2072,0.175621
4,3.1792,2.976868,0.2375,0.250366,0.2375,0.209685
5,2.9059,2.721098,0.2901,0.294798,0.2901,0.268864
6,2.6657,2.508605,0.3398,0.338426,0.3398,0.325459


[I 2025-03-25 17:35:51,378] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0032109758631513803, 'weight_decay': 0.004, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3339,4.040976,0.0594,0.039753,0.0594,0.028764
2,3.9894,3.854773,0.0879,0.076004,0.0879,0.058288
3,3.78,3.605191,0.1309,0.11739,0.1309,0.099159
4,3.5668,3.45197,0.1487,0.154639,0.1487,0.121573
5,3.3369,3.127023,0.2055,0.191422,0.2055,0.177514


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1762,3.701384,0.1128,0.089172,0.1128,0.078545
2,3.5499,3.155697,0.2035,0.215099,0.2035,0.178534
3,3.0828,2.776307,0.2791,0.272976,0.2791,0.255049


[I 2025-03-25 17:47:23,468] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00016972099545840288, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3339,3.738927,0.1152,0.094031,0.1152,0.078074
2,3.6821,3.258046,0.2037,0.20056,0.2037,0.174002
3,3.2455,2.89952,0.2713,0.27628,0.2713,0.241643
4,2.8867,2.688442,0.3084,0.319079,0.3084,0.284795
5,2.5805,2.388893,0.3686,0.367368,0.3686,0.349541
6,2.3199,2.255853,0.4027,0.398691,0.4027,0.388711


[I 2025-03-25 17:55:05,463] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5834,4.147998,0.0756,0.069145,0.0756,0.044667
2,4.1443,3.767668,0.1213,0.103173,0.1213,0.086365
3,3.8333,3.507434,0.1606,0.138436,0.1606,0.125379


[I 2025-03-25 17:58:56,381] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0003465646238744322, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1886,3.602177,0.1307,0.12761,0.1307,0.092481
2,3.4902,3.088968,0.2249,0.234276,0.2249,0.198961
3,2.9954,2.661585,0.3059,0.30155,0.3059,0.278537
4,2.6139,2.47275,0.345,0.367804,0.345,0.32848
5,2.319,2.188214,0.4098,0.412062,0.4098,0.395575
6,2.0411,2.069156,0.4364,0.441946,0.4364,0.426254
7,1.7966,1.97058,0.4663,0.470674,0.4663,0.458262
8,1.5528,1.980531,0.4674,0.474366,0.4674,0.459587
9,1.3408,2.009,0.4671,0.470688,0.4671,0.45933
10,1.1699,1.900232,0.484,0.491059,0.484,0.482166


[I 2025-03-25 18:11:52,158] Trial 74 finished with value: 0.4821664443037011 and parameters: {'learning_rate': 0.0003465646238744322, 'weight_decay': 0.007, 'warmup_steps': 17}. Best is trial 31 with value: 0.500449702292892.


Trial 75 with params: {'learning_rate': 0.0004553332983395437, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1666,3.63853,0.1233,0.112191,0.1233,0.089912
2,3.4668,3.054191,0.2289,0.239982,0.2289,0.201835
3,2.9614,2.607926,0.3164,0.315138,0.3164,0.29293
4,2.5808,2.495421,0.3461,0.365209,0.3461,0.327209
5,2.2844,2.162137,0.416,0.415421,0.416,0.400311
6,2.0172,2.033084,0.4481,0.452414,0.4481,0.439688
7,1.7731,1.965208,0.4682,0.474031,0.4682,0.462091
8,1.5228,1.987613,0.4657,0.475828,0.4657,0.459208
9,1.3032,1.996947,0.4706,0.47862,0.4706,0.462572
10,1.1302,1.893844,0.4867,0.49154,0.4867,0.483689


[I 2025-03-25 18:24:39,462] Trial 75 finished with value: 0.48368860010640474 and parameters: {'learning_rate': 0.0004553332983395437, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}. Best is trial 31 with value: 0.500449702292892.


Trial 76 with params: {'learning_rate': 5.7423270605816206e-05, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.5562,4.122925,0.0777,0.05152,0.0777,0.0428
2,4.1214,3.767762,0.1228,0.130438,0.1228,0.087803
3,3.8371,3.514076,0.1607,0.139669,0.1607,0.126062
4,3.6368,3.436255,0.1785,0.173061,0.1785,0.151203
5,3.4717,3.219327,0.2115,0.18741,0.2115,0.183142
6,3.3381,3.122063,0.2309,0.211397,0.2309,0.200348


[I 2025-03-25 18:32:20,334] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0003612481528173199, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1698,3.615118,0.1208,0.103985,0.1208,0.085648
2,3.4719,3.050301,0.2334,0.235465,0.2334,0.210666
3,2.9712,2.623327,0.317,0.3097,0.317,0.287758
4,2.576,2.461348,0.3537,0.376049,0.3537,0.334484
5,2.2644,2.170427,0.4161,0.416116,0.4161,0.400733
6,1.9777,2.037622,0.4445,0.454558,0.4445,0.43819
7,1.7391,1.956051,0.4659,0.472832,0.4659,0.45884
8,1.4893,1.975956,0.4654,0.475969,0.4654,0.459163
9,1.2779,2.015184,0.4653,0.470839,0.4653,0.455836
10,1.1074,1.891447,0.4891,0.491207,0.4891,0.484914


[I 2025-03-25 18:45:08,857] Trial 77 finished with value: 0.4849135125323347 and parameters: {'learning_rate': 0.0003612481528173199, 'weight_decay': 0.005, 'warmup_steps': 4}. Best is trial 31 with value: 0.500449702292892.


Trial 78 with params: {'learning_rate': 0.00026472199954123596, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2104,3.60005,0.1407,0.124771,0.1407,0.105448
2,3.5335,3.152205,0.2263,0.241509,0.2263,0.203855
3,3.0502,2.703813,0.3079,0.298506,0.3079,0.280133
4,2.6528,2.505028,0.3463,0.364323,0.3463,0.323803
5,2.3444,2.254166,0.3966,0.402877,0.3966,0.380564
6,2.0593,2.089656,0.4361,0.443758,0.4361,0.428865


[I 2025-03-25 18:52:52,457] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.000503550544142504, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1743,3.674268,0.117,0.09888,0.117,0.082785
2,3.527,3.145495,0.223,0.220309,0.223,0.19672
3,3.0428,2.657352,0.3079,0.305209,0.3079,0.283792
4,2.6439,2.516042,0.3372,0.365486,0.3372,0.317813
5,2.3321,2.192358,0.4129,0.413062,0.4129,0.397666
6,2.0584,2.03627,0.4486,0.454228,0.4486,0.440297
7,1.8082,1.958162,0.4669,0.470467,0.4669,0.458924
8,1.5588,1.996878,0.4644,0.479926,0.4644,0.457557
9,1.3434,2.004825,0.4696,0.474473,0.4696,0.461292
10,1.1625,1.890454,0.4873,0.493736,0.4873,0.48423


[I 2025-03-25 19:05:42,377] Trial 79 finished with value: 0.4842298967809291 and parameters: {'learning_rate': 0.000503550544142504, 'weight_decay': 0.006, 'warmup_steps': 3}. Best is trial 31 with value: 0.500449702292892.


Trial 80 with params: {'learning_rate': 0.0002715097179944954, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2142,3.614545,0.1275,0.11963,0.1275,0.092859
2,3.5164,3.06418,0.235,0.237502,0.235,0.209619
3,3.0114,2.623237,0.3223,0.322137,0.3223,0.295527
4,2.611,2.503336,0.3464,0.369927,0.3464,0.329345
5,2.3067,2.212816,0.4088,0.41689,0.4088,0.395864
6,2.025,2.050271,0.443,0.442116,0.443,0.433118
7,1.7789,2.005575,0.4576,0.462301,0.4576,0.449933
8,1.5419,2.008183,0.46,0.466698,0.46,0.453007
9,1.3435,2.063434,0.4543,0.462778,0.4543,0.4458
10,1.1823,1.945425,0.4741,0.478054,0.4741,0.470429


[I 2025-03-25 19:18:32,017] Trial 80 finished with value: 0.47042915934075547 and parameters: {'learning_rate': 0.0002715097179944954, 'weight_decay': 0.005, 'warmup_steps': 7}. Best is trial 31 with value: 0.500449702292892.


Trial 81 with params: {'learning_rate': 0.00047365854695456417, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.131,3.572348,0.1312,0.121842,0.1312,0.093319
2,3.3819,2.941759,0.261,0.271189,0.261,0.241518
3,2.9161,2.645683,0.3137,0.307288,0.3137,0.283645
4,2.5565,2.442449,0.3456,0.363822,0.3456,0.32853
5,2.2752,2.168774,0.4124,0.415517,0.4124,0.397206
6,2.0115,2.032204,0.442,0.447407,0.442,0.43452
7,1.7717,1.951828,0.4674,0.475091,0.4674,0.460508
8,1.5269,1.981522,0.469,0.481357,0.469,0.462985
9,1.3137,1.993575,0.4724,0.47818,0.4724,0.464309
10,1.1357,1.868854,0.49,0.492536,0.49,0.486367


[I 2025-03-25 19:31:25,825] Trial 81 finished with value: 0.48636672526837255 and parameters: {'learning_rate': 0.00047365854695456417, 'weight_decay': 0.006, 'warmup_steps': 11}. Best is trial 31 with value: 0.500449702292892.


Trial 82 with params: {'learning_rate': 0.00040171551241505155, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.172,3.564824,0.1321,0.105231,0.1321,0.096849
2,3.4611,2.983435,0.2502,0.255689,0.2502,0.226698
3,2.9547,2.651988,0.3087,0.300293,0.3087,0.280463
4,2.5727,2.455209,0.3463,0.368239,0.3463,0.326319
5,2.2709,2.164141,0.4159,0.416952,0.4159,0.402992
6,1.9915,2.025153,0.4445,0.444264,0.4445,0.432044


[I 2025-03-25 19:39:06,574] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0006293470965521542, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1502,3.590372,0.1269,0.120555,0.1269,0.095167
2,3.4496,3.099226,0.2154,0.22651,0.2154,0.192947
3,2.9861,2.672241,0.2973,0.287098,0.2973,0.266392
4,2.6467,2.551936,0.3274,0.352492,0.3274,0.305836
5,2.3677,2.262815,0.3864,0.392281,0.3864,0.372752
6,2.1172,2.104924,0.4257,0.432222,0.4257,0.417714
7,1.8841,2.000887,0.4569,0.456995,0.4569,0.447778
8,1.643,2.011139,0.4577,0.469546,0.4577,0.451052
9,1.4338,2.050303,0.4567,0.461265,0.4567,0.447539
10,1.2498,1.900053,0.4851,0.488185,0.4851,0.480945


[I 2025-03-25 19:51:53,683] Trial 83 finished with value: 0.48094500041696236 and parameters: {'learning_rate': 0.0006293470965521542, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 31 with value: 0.500449702292892.


Trial 84 with params: {'learning_rate': 9.234777457216261e-05, 'weight_decay': 0.004, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4682,3.912866,0.0986,0.094071,0.0986,0.060055
2,3.9224,3.539205,0.1571,0.15337,0.1571,0.123909
3,3.5813,3.259192,0.2073,0.183806,0.2073,0.173336


[I 2025-03-25 19:55:46,466] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00038953624476127874, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1501,3.594177,0.1268,0.106641,0.1268,0.091617
2,3.4614,3.085627,0.2208,0.218549,0.2208,0.19515
3,2.9924,2.636877,0.3121,0.302353,0.3121,0.286125
4,2.6127,2.495092,0.3418,0.353412,0.3418,0.320284
5,2.3045,2.207302,0.4067,0.40863,0.4067,0.390547
6,2.0239,2.037355,0.4432,0.447285,0.4432,0.435298
7,1.775,1.97879,0.4638,0.468791,0.4638,0.456474
8,1.5206,2.000026,0.464,0.474705,0.464,0.458237
9,1.3013,2.0349,0.4597,0.467586,0.4597,0.450685
10,1.1233,1.927002,0.4787,0.489952,0.4787,0.477714


[I 2025-03-25 20:08:38,096] Trial 85 finished with value: 0.4777135784406129 and parameters: {'learning_rate': 0.00038953624476127874, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 31 with value: 0.500449702292892.


Trial 86 with params: {'learning_rate': 0.0007273755560606092, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1921,3.776275,0.1014,0.077978,0.1014,0.070837
2,3.5917,3.262145,0.1903,0.208688,0.1903,0.168363
3,3.1422,2.813706,0.2737,0.27476,0.2737,0.246777


[I 2025-03-25 20:12:29,665] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0004191411023553298, 'weight_decay': 0.004, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2393,3.627229,0.1277,0.105659,0.1277,0.092229
2,3.5211,3.114839,0.22,0.212968,0.22,0.192682
3,3.042,2.704197,0.2968,0.284796,0.2968,0.265417
4,2.6573,2.545999,0.3284,0.352316,0.3284,0.310958
5,2.3578,2.186974,0.4088,0.406295,0.4088,0.392474
6,2.0745,2.077742,0.4365,0.438362,0.4365,0.426911
7,1.8296,2.001935,0.4627,0.469191,0.4627,0.454853
8,1.5877,1.979821,0.4669,0.473514,0.4669,0.45863
9,1.3695,2.018984,0.464,0.466225,0.464,0.453573
10,1.1905,1.915833,0.481,0.493833,0.481,0.479295


[I 2025-03-25 20:25:22,326] Trial 87 finished with value: 0.4792951916300593 and parameters: {'learning_rate': 0.0004191411023553298, 'weight_decay': 0.004, 'warmup_steps': 32}. Best is trial 31 with value: 0.500449702292892.


Trial 88 with params: {'learning_rate': 0.0002720158931689234, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2274,3.668734,0.1144,0.111991,0.1144,0.081991
2,3.5346,3.135599,0.2213,0.225938,0.2213,0.194064
3,3.0378,2.695158,0.3042,0.299265,0.3042,0.279228
4,2.6265,2.510245,0.338,0.359745,0.338,0.318595
5,2.3116,2.201932,0.4154,0.416157,0.4154,0.40083
6,2.0271,2.068814,0.4424,0.444499,0.4424,0.434633
7,1.7751,2.006337,0.4613,0.464212,0.4613,0.453763
8,1.5363,2.002036,0.462,0.468118,0.462,0.455216
9,1.3345,2.0709,0.4533,0.458689,0.4533,0.445298
10,1.1695,1.970607,0.472,0.477635,0.472,0.469113


[I 2025-03-25 20:38:13,244] Trial 88 finished with value: 0.4691127433072749 and parameters: {'learning_rate': 0.0002720158931689234, 'weight_decay': 0.007, 'warmup_steps': 4}. Best is trial 31 with value: 0.500449702292892.


Trial 89 with params: {'learning_rate': 0.0007369136893393095, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1896,3.707032,0.1054,0.093734,0.1054,0.071243
2,3.6221,3.267594,0.1875,0.181452,0.1875,0.161901
3,3.1606,2.824488,0.2631,0.262282,0.2631,0.237461


[I 2025-03-25 20:42:04,960] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0011372885658903864, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2352,3.879623,0.0762,0.059154,0.0762,0.049869
2,3.8125,3.554902,0.1344,0.122592,0.1344,0.107791
3,3.455,3.119169,0.2139,0.20883,0.2139,0.183545


[I 2025-03-25 20:46:15,281] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00027658201251049946, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2115,3.638175,0.1246,0.114905,0.1246,0.089547
2,3.4924,3.093611,0.2284,0.227999,0.2284,0.202603
3,3.0096,2.637584,0.3167,0.307555,0.3167,0.29102
4,2.6164,2.485863,0.3457,0.367451,0.3457,0.329566
5,2.3143,2.19663,0.4132,0.41425,0.4132,0.399587
6,2.0335,2.097523,0.4393,0.44281,0.4393,0.430081
7,1.791,2.033218,0.4544,0.459722,0.4544,0.446411
8,1.5494,2.03397,0.4542,0.465649,0.4542,0.446274
9,1.3533,2.068846,0.451,0.455025,0.451,0.439894


In [None]:
print(best_base_random)
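
The Optuna log lines in the output above follow a fixed pattern, so a saved copy of the cell output can be summarized without rerunning the search. The following is a minimal sketch, assuming the log text has been captured into a string; `parse_trials` and `sample_log` are hypothetical names introduced here and are not part of the notebook.

```python
import ast
import re

# Matches "Trial N finished with value: V and parameters: {...}." log lines.
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)
# Matches "Trial N pruned." log lines.
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def parse_trials(log_text):
    """Return (finished, pruned).

    finished is a list of (trial_number, value, params_dict) tuples,
    pruned is a list of trial numbers.
    """
    finished = [
        (int(number), float(value), ast.literal_eval(params))
        for number, value, params in FINISHED.findall(log_text)
    ]
    pruned = [int(number) for number in PRUNED.findall(log_text)]
    return finished, pruned


# Three lines copied verbatim from the search output above.
sample_log = """
[I 2025-03-25 13:33:16,283] Trial 44 pruned.
[I 2025-03-25 14:02:49,565] Trial 47 finished with value: 0.4894539024147783 and parameters: {'learning_rate': 0.00039790657576095194, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 31 with value: 0.500449702292892.
[I 2025-03-25 15:48:14,910] Trial 59 finished with value: 0.4971005061735886 and parameters: {'learning_rate': 0.0006017708963321431, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 31 with value: 0.500449702292892.
"""

finished, pruned = parse_trials(sample_log)
best = max(finished, key=lambda item: item[1])
print(f"best of the sampled lines: trial {best[0]}, F1 {best[1]:.4f}, params {best[2]}")
print(f"pruned trials: {pruned}")
```

Sorting `finished` by value gives a quick ranking of the completed trials; the pruned list shows how much of the budget Hyperband cut short.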

In [18]:
base.reset_seed()

## Hyperparameter search with distillation for the randomly initialized model

In [19]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/random-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/random-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [20]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        # Learning rate is sampled on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
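
`lambda_param` and `temperature` are consumed inside `base.DistilTrainer`, whose implementation is not shown in this notebook. A common formulation of knowledge distillation blends hard-label cross-entropy with a temperature-softened KL term. The sketch below illustrates that formulation for a single sample in plain Python. Which term `lambda_param` weights and the `temperature ** 2` scaling are assumptions, and `distillation_loss` is a hypothetical helper, not the notebook's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of the logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Blend of hard-label cross-entropy and softened KL(teacher || student).

    Assumption: lambda_param weights the distillation term and
    (1 - lambda_param) weights the cross-entropy term; the real
    base.DistilTrainer may combine them differently.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # KL divergence between the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor is commonly used to keep the soft term's scale comparable
    # across temperatures.
    return (1 - lambda_param) * ce + lambda_param * (temperature ** 2) * kl


# Illustrative logits for a three-class problem (made up for this example).
student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
loss_ce_only = distillation_loss(student, teacher, label=0, lambda_param=0.0, temperature=2.5)
loss_mixed = distillation_loss(student, teacher, label=0, lambda_param=0.6, temperature=2.5)
print(loss_ce_only, loss_mixed)
```

Under this formulation, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's soft targets only, which is why the search covers the full range from 0 to 1.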

Optuna configuration.

In [21]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
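
To make the pruner settings more concrete, the sketch below computes how many brackets Hyperband would create and at which steps each bracket's successive-halving rungs check a trial. It follows the bracket-count formula and rung schedule as described for `optuna.pruners.HyperbandPruner`, but it is an approximation written for illustration, and `hyperband_layout` is a hypothetical helper. `min_r` and `max_r` are defined elsewhere in the notebook; the values 3 and 10 used here are assumptions. Several trials in the logs are pruned after epoch 3 or epoch 6, which is the pattern such a layout would produce.

```python
import math


def hyperband_layout(min_resource, max_resource, reduction_factor):
    """Return, for each bracket, the steps at which a trial can be pruned."""
    # Number of brackets: floor(log_eta(max / min)) + 1.
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1

    layout = []
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            # Bracket i starts at rung offset i, so later brackets prune later.
            step = min_resource * reduction_factor ** (bracket + rung)
            if step >= max_resource:
                # A check at or past the final step has nothing left to prune.
                break
            rungs.append(step)
            rung += 1
        layout.append(rungs)
    return layout


# Assumed values: min_r=3, max_r=10, reduction_factor=2 as in the pruner above.
layout = hyperband_layout(min_resource=3, max_resource=10, reduction_factor=2)
for bracket, rungs in enumerate(layout):
    print(f"bracket {bracket}: pruning checks at steps {rungs}")
```

A smaller `reduction_factor` keeps a larger share of trials at each rung, trading compute for a lower risk of discarding slow starters.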



Configuration of the distillation trainer for the individual training runs.

In [22]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(100),
)
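
In the log tables of this notebook, the Recall column matches the Accuracy column in every epoch. That is what support-weighted averaging produces, because weighted recall reduces to accuracy. `base.compute_metrics` is not shown here, so the following sketch only illustrates weighted precision, recall, and F1 under that assumption; `weighted_scores` is a hypothetical helper and the labels are made up.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class weight = share of true samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return accuracy, precision, recall, f1


# Tiny three-class example.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
accuracy, precision, recall, f1 = weighted_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Because weighted F1 averages per-class F1 scores, it can fall below both the weighted precision and the weighted recall, as it does in most rows of the logs.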
  

Setting up and running the search.

In [None]:
best_distill_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-25 21:17:10,361] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9463,3.468593,0.13,0.099117,0.13,0.088588
2,3.3806,3.036226,0.2316,0.226119,0.2316,0.198101
3,2.9707,2.649721,0.3195,0.311019,0.3195,0.284502


[I 2025-03-25 21:20:59,280] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1221,3.654764,0.1051,0.073589,0.1051,0.065806
2,3.6755,3.382185,0.1646,0.168982,0.1646,0.130034
3,3.3893,3.138117,0.2178,0.218708,0.2178,0.180776
4,3.1686,2.975178,0.2523,0.262413,0.2523,0.22326
5,2.9851,2.784536,0.2935,0.293263,0.2935,0.266544
6,2.8257,2.693198,0.311,0.303405,0.311,0.28617


[I 2025-03-25 21:28:35,095] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2097,3.806188,0.0792,0.049113,0.0792,0.041798
2,3.8449,3.562362,0.1256,0.110789,0.1256,0.089509
3,3.6266,3.372823,0.1611,0.135241,0.1611,0.120874


[I 2025-03-25 21:32:21,955] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0714,3.605988,0.108,0.083131,0.108,0.066927
2,3.6103,3.313658,0.1785,0.175687,0.1785,0.144248
3,3.315,3.015286,0.2468,0.234051,0.2468,0.208777
4,3.0715,2.860942,0.2799,0.285692,0.2799,0.248629
5,2.8597,2.652696,0.3247,0.32223,0.3247,0.298046
6,2.6742,2.526834,0.3558,0.348638,0.3558,0.334441


[I 2025-03-25 21:39:53,520] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9028,3.594838,0.095,0.0733,0.095,0.055727
2,3.42,3.146732,0.1995,0.192126,0.1995,0.167387
3,3.0858,2.787504,0.2851,0.286155,0.2851,0.256255


[I 2025-03-25 21:43:39,924] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9617,3.766395,0.0618,0.040112,0.0618,0.030619
2,3.7031,3.516015,0.1144,0.090591,0.1144,0.07765
3,3.4499,3.220869,0.1802,0.170796,0.1802,0.143504
4,3.2149,3.101623,0.2044,0.218216,0.2044,0.17325
5,3.0104,2.844505,0.2719,0.263532,0.2719,0.24016
6,2.8222,2.648238,0.3158,0.310406,0.3158,0.292063


[I 2025-03-25 21:51:10,421] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8686,3.555465,0.1084,0.076668,0.1084,0.06918
2,3.3791,3.105663,0.2088,0.217041,0.2088,0.179004
3,3.0593,2.811692,0.2758,0.280694,0.2758,0.245381


[I 2025-03-25 21:54:56,366] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9404,3.794128,0.0514,0.033852,0.0514,0.022357
2,3.6611,3.547045,0.1109,0.095835,0.1109,0.075331
3,3.4432,3.230685,0.1746,0.17464,0.1746,0.139525
4,3.2115,3.042814,0.2147,0.225744,0.2147,0.182205
5,2.9998,2.795252,0.2828,0.271902,0.2828,0.249752
6,2.8108,2.634658,0.3186,0.318136,0.3186,0.299004
7,2.6446,2.505689,0.3555,0.35166,0.3555,0.338862
8,2.5015,2.443056,0.3704,0.375586,0.3704,0.354892
9,2.3761,2.450041,0.3759,0.371929,0.3759,0.356087
10,2.2746,2.320945,0.409,0.410765,0.409,0.396611


[I 2025-03-25 22:07:29,542] Trial 7 finished with value: 0.3966105872263933 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 7 with value: 0.3966105872263933.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1004,3.649293,0.1035,0.064642,0.1035,0.063683
2,3.6956,3.427874,0.1567,0.145372,0.1567,0.11975
3,3.4456,3.180007,0.2064,0.18217,0.2064,0.167178
4,3.2448,3.073159,0.2313,0.240865,0.2313,0.198969
5,3.0711,2.861894,0.276,0.267188,0.276,0.24651
6,2.9226,2.768399,0.2916,0.284598,0.2916,0.258953


[I 2025-03-25 22:15:02,024] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8951,3.573985,0.0992,0.083636,0.0992,0.063653
2,3.4728,3.235918,0.1892,0.179265,0.1892,0.156217
3,3.162,2.910298,0.2525,0.258265,0.2525,0.223867


[I 2025-03-25 22:18:49,074] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.004794768110099147, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0896,3.906732,0.038,0.012503,0.038,0.011911
2,3.8506,3.757166,0.0688,0.040627,0.0688,0.035111
3,3.7186,3.588461,0.1077,0.079036,0.1077,0.068969
4,3.5458,3.400377,0.1426,0.125837,0.1426,0.105353
5,3.3713,3.213976,0.1814,0.176692,0.1814,0.146047
6,3.2045,3.052033,0.224,0.21081,0.224,0.191073


[I 2025-03-25 22:26:21,482] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.00010269109613317165, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1164,3.66154,0.0996,0.08762,0.0996,0.06158
2,3.6922,3.418407,0.1558,0.132713,0.1558,0.11781
3,3.423,3.148245,0.2137,0.19719,0.2137,0.174636


[I 2025-03-25 22:30:10,254] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 5.7047450064459894e-05, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1896,3.774694,0.0846,0.046194,0.0846,0.047922
2,3.8415,3.57077,0.122,0.097645,0.122,0.082323
3,3.6517,3.407372,0.1576,0.138667,0.1576,0.119226
4,3.4976,3.325582,0.1809,0.176235,0.1809,0.148911
5,3.3678,3.15319,0.211,0.195461,0.211,0.175104
6,3.2527,3.076954,0.2261,0.213502,0.2261,0.19067


[I 2025-03-25 22:37:39,206] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 6.179524839391358e-05, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1891,3.775284,0.0826,0.071763,0.0826,0.047731
2,3.8172,3.536709,0.1296,0.104595,0.1296,0.093169
3,3.5961,3.344464,0.1712,0.150393,0.1712,0.133363


[I 2025-03-25 22:41:22,616] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0009700813739546189, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9053,3.650297,0.0818,0.045105,0.0818,0.044184
2,3.4853,3.270797,0.1689,0.164426,0.1689,0.132654
3,3.1768,2.932115,0.2423,0.222542,0.2423,0.208354
4,2.8891,2.750373,0.2884,0.305858,0.2884,0.262392
5,2.6502,2.464309,0.3591,0.363531,0.3591,0.334033
6,2.4379,2.304934,0.4088,0.407244,0.4088,0.395273
7,2.2547,2.182942,0.4368,0.435012,0.4368,0.421634
8,2.0812,2.136929,0.4477,0.455783,0.4477,0.436806
9,1.9273,2.140756,0.4503,0.453406,0.4503,0.439197
10,1.8023,2.01895,0.4789,0.480489,0.4789,0.471061


[I 2025-03-25 22:53:50,594] Trial 14 finished with value: 0.47106050067839594 and parameters: {'learning_rate': 0.0009700813739546189, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 15 with params: {'learning_rate': 0.000978873250785567, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8658,3.541898,0.0998,0.075311,0.0998,0.05752
2,3.4396,3.2075,0.1777,0.17724,0.1777,0.146381
3,3.1233,2.884252,0.2551,0.249746,0.2551,0.226845
4,2.8595,2.743951,0.2952,0.312158,0.2952,0.270149
5,2.6473,2.46295,0.3631,0.36238,0.3631,0.341113
6,2.4398,2.32389,0.3973,0.394202,0.3973,0.381956


[I 2025-03-25 23:01:18,482] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.001216237416589181, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9482,3.725064,0.0689,0.043762,0.0689,0.035796
2,3.6568,3.466044,0.1272,0.104083,0.1272,0.094764
3,3.3916,3.186769,0.1915,0.187736,0.1915,0.158333
4,3.1427,3.044496,0.2276,0.239596,0.2276,0.195559
5,2.9295,2.753752,0.2931,0.295493,0.2931,0.264033
6,2.7441,2.582666,0.3312,0.328964,0.3312,0.310486
7,2.5821,2.46958,0.3639,0.357106,0.3639,0.344031
8,2.4421,2.394377,0.382,0.387816,0.382,0.366541
9,2.3153,2.409801,0.3812,0.373963,0.3812,0.360001
10,2.2052,2.279311,0.4083,0.404013,0.4083,0.395327


[I 2025-03-25 23:13:44,445] Trial 16 finished with value: 0.39532661815803766 and parameters: {'learning_rate': 0.001216237416589181, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 17 with params: {'learning_rate': 0.0011293616727979141, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.94,3.704929,0.0718,0.041547,0.0718,0.03686
2,3.612,3.412202,0.1387,0.109949,0.1387,0.09971
3,3.3251,3.10127,0.2122,0.213545,0.2122,0.17661


[I 2025-03-25 23:17:27,485] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0005290018054075841, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8535,3.433629,0.1334,0.12366,0.1334,0.094581
2,3.2996,2.97256,0.2592,0.272951,0.2592,0.235249
3,2.8975,2.605948,0.33,0.321154,0.33,0.297484
4,2.5846,2.455738,0.3687,0.40157,0.3687,0.349879
5,2.3313,2.163102,0.4391,0.442496,0.4391,0.421296
6,2.0979,2.029295,0.4754,0.482381,0.4754,0.467734


[I 2025-03-25 23:24:55,107] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0012408666051807063, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9071,3.672522,0.0768,0.042015,0.0768,0.03978
2,3.509,3.2626,0.1733,0.157832,0.1733,0.138563
3,3.1991,2.967511,0.2384,0.230496,0.2384,0.205088
4,2.9419,2.810243,0.281,0.301202,0.281,0.256087
5,2.7163,2.528655,0.3551,0.360985,0.3551,0.329452
6,2.5106,2.349174,0.3937,0.390183,0.3937,0.377383
7,2.337,2.24296,0.4226,0.413623,0.4226,0.40369
8,2.1724,2.175645,0.4412,0.447285,0.4412,0.429479
9,2.0331,2.202799,0.4363,0.440499,0.4363,0.421088
10,1.909,2.063588,0.4712,0.472834,0.4712,0.463


[I 2025-03-25 23:37:19,334] Trial 19 finished with value: 0.4629996812648152 and parameters: {'learning_rate': 0.0012408666051807063, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 20 with params: {'learning_rate': 0.0011528136162582353, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8949,3.647086,0.0877,0.051677,0.0877,0.050371
2,3.4954,3.243725,0.1699,0.160199,0.1699,0.132943
3,3.1958,2.969909,0.2333,0.220768,0.2333,0.197715


[I 2025-03-25 23:41:02,016] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0012045813355965416, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9134,3.726498,0.0727,0.038054,0.0727,0.03871
2,3.5953,3.405131,0.1439,0.127198,0.1439,0.111631
3,3.2814,3.057026,0.2209,0.217286,0.2209,0.186363
4,3.0049,2.834148,0.277,0.307886,0.277,0.251641
5,2.7753,2.567907,0.3447,0.345104,0.3447,0.317118
6,2.5742,2.387226,0.3846,0.3789,0.3846,0.367306
7,2.3992,2.298678,0.4044,0.401298,0.4044,0.384902
8,2.2407,2.229727,0.4168,0.423494,0.4168,0.402565
9,2.0985,2.231352,0.4258,0.432367,0.4258,0.412029
10,1.9842,2.119854,0.4473,0.450815,0.4473,0.437018


[I 2025-03-25 23:53:30,840] Trial 21 finished with value: 0.43701821946907843 and parameters: {'learning_rate': 0.0012045813355965416, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 22 with params: {'learning_rate': 0.0003391020650263581, 'weight_decay': 0.001, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8903,3.474786,0.1273,0.102777,0.1273,0.089261
2,3.3573,3.084857,0.2136,0.216063,0.2136,0.184845
3,2.9717,2.701167,0.3012,0.29814,0.3012,0.268867
4,2.641,2.50294,0.3577,0.380822,0.3577,0.333198
5,2.3745,2.215422,0.424,0.425914,0.424,0.405167
6,2.1337,2.108465,0.449,0.453893,0.449,0.439515


[I 2025-03-26 00:01:04,807] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0013931101860646207, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9383,3.684548,0.0791,0.052353,0.0791,0.044133
2,3.6324,3.454428,0.1355,0.119797,0.1355,0.100528
3,3.3545,3.126694,0.2053,0.196964,0.2053,0.170263
4,3.0943,2.975081,0.2523,0.270308,0.2523,0.223232
5,2.8597,2.673006,0.3226,0.329139,0.3226,0.298651
6,2.6599,2.501391,0.3629,0.363149,0.3629,0.345721
7,2.4854,2.35632,0.4003,0.396188,0.4003,0.382885
8,2.3277,2.293787,0.4073,0.415809,0.4073,0.391822
9,2.1903,2.302384,0.4087,0.408454,0.4087,0.39092
10,2.08,2.182335,0.4481,0.451261,0.4481,0.438406


[I 2025-03-26 00:13:41,857] Trial 23 finished with value: 0.4384057731322242 and parameters: {'learning_rate': 0.0013931101860646207, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 24 with params: {'learning_rate': 0.002308263324693165, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9426,3.787577,0.0571,0.032599,0.0571,0.023683
2,3.7123,3.585547,0.1032,0.090494,0.1032,0.068892
3,3.5218,3.320796,0.1561,0.155282,0.1561,0.117661


[I 2025-03-26 00:17:28,371] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.003030739320085235, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9873,3.798231,0.0611,0.02637,0.0611,0.027855
2,3.7542,3.67749,0.0836,0.077839,0.0836,0.050632
3,3.6053,3.468255,0.1315,0.111629,0.1315,0.092359
4,3.4179,3.293236,0.1655,0.152623,0.1655,0.125407
5,3.209,3.033789,0.2242,0.220084,0.2242,0.196698
6,3.0083,2.804726,0.2784,0.269476,0.2784,0.255136


[I 2025-03-26 00:25:00,308] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.001176594017473095, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8882,3.575483,0.099,0.076198,0.099,0.063425
2,3.4506,3.244859,0.1804,0.163943,0.1804,0.146832
3,3.1338,2.896655,0.25,0.238489,0.25,0.215789


[I 2025-03-26 00:28:46,758] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.004440799462629861, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1076,3.905665,0.0398,0.013,0.0398,0.011842
2,3.8435,3.722701,0.0863,0.063313,0.0863,0.049265
3,3.6675,3.468427,0.1396,0.109351,0.1396,0.095934
4,3.4487,3.285386,0.1655,0.147042,0.1655,0.126052
5,3.242,3.086401,0.2168,0.200787,0.2168,0.179449
6,3.0578,2.882989,0.2558,0.249976,0.2558,0.224306


[I 2025-03-26 00:36:19,163] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0018194439261911175, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.936,3.740962,0.0671,0.036788,0.0671,0.035655
2,3.6536,3.49322,0.1213,0.102542,0.1213,0.089084
3,3.3973,3.188812,0.185,0.163246,0.185,0.148257
4,3.153,3.011261,0.2322,0.250333,0.2322,0.202569
5,2.9533,2.761549,0.2873,0.289904,0.2873,0.262338
6,2.7693,2.59045,0.3306,0.324312,0.3306,0.3111
7,2.6059,2.456577,0.3651,0.355749,0.3651,0.343793
8,2.463,2.416237,0.3776,0.383875,0.3776,0.360853
9,2.3381,2.407953,0.3835,0.380173,0.3835,0.363448
10,2.2357,2.287908,0.4119,0.413575,0.4119,0.401546


[I 2025-03-26 00:48:45,135] Trial 28 finished with value: 0.4015458112039446 and parameters: {'learning_rate': 0.0018194439261911175, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 14 with value: 0.47106050067839594.


Trial 29 with params: {'learning_rate': 0.0037867653604961434, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0098,3.87885,0.0486,0.021105,0.0486,0.019378
2,3.818,3.684547,0.0847,0.057932,0.0847,0.048327
3,3.6395,3.48614,0.1261,0.103318,0.1261,0.08711


[I 2025-03-26 00:52:31,362] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0007243732057988554, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.904,3.584441,0.1013,0.091809,0.1013,0.065161
2,3.4448,3.20527,0.1935,0.192385,0.1935,0.160195
3,3.1109,2.905169,0.256,0.254243,0.256,0.224819
4,2.8176,2.665037,0.3087,0.331521,0.3087,0.280398
5,2.5767,2.376825,0.3874,0.382429,0.3874,0.362922
6,2.3581,2.230463,0.4234,0.425288,0.4234,0.411593


[I 2025-03-26 01:00:03,158] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0007477553036348116, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8741,3.529783,0.1083,0.08871,0.1083,0.070574
2,3.3986,3.119143,0.2059,0.212723,0.2059,0.174228
3,3.0484,2.77227,0.2862,0.286663,0.2862,0.254776
4,2.7585,2.654373,0.3205,0.355708,0.3205,0.29596
5,2.5135,2.318791,0.3995,0.405199,0.3995,0.377144
6,2.2952,2.18576,0.4347,0.436441,0.4347,0.419578
7,2.1112,2.069385,0.4631,0.471507,0.4631,0.450967
8,1.9266,2.042067,0.4724,0.478785,0.4724,0.462696
9,1.7665,2.037165,0.4788,0.484392,0.4788,0.468328
10,1.6362,1.913326,0.5048,0.512621,0.5048,0.500894


[I 2025-03-26 01:12:36,069] Trial 31 finished with value: 0.5008940733078789 and parameters: {'learning_rate': 0.0007477553036348116, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 31 with value: 0.5008940733078789.


Trial 32 with params: {'learning_rate': 0.0005535150607937206, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8589,3.453709,0.1225,0.098629,0.1225,0.08648
2,3.3362,3.057821,0.2225,0.226321,0.2225,0.194214
3,2.9772,2.687073,0.3058,0.300556,0.3058,0.273101
4,2.6728,2.570184,0.3321,0.361415,0.3321,0.307122
5,2.4328,2.260674,0.4104,0.416953,0.4104,0.39181
6,2.2042,2.124263,0.4502,0.457036,0.4502,0.441065
7,2.0029,2.037821,0.472,0.478506,0.472,0.460739
8,1.8098,1.994361,0.4801,0.485513,0.4801,0.470534
9,1.6393,1.994047,0.4858,0.496002,0.4858,0.478158
10,1.5089,1.897283,0.503,0.512441,0.503,0.500056


[I 2025-03-26 01:25:03,882] Trial 32 finished with value: 0.5000561803262452 and parameters: {'learning_rate': 0.0005535150607937206, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 31 with value: 0.5008940733078789.


Trial 33 with params: {'learning_rate': 0.0008325840741825239, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9203,3.582638,0.0998,0.085153,0.0998,0.058522
2,3.4671,3.24172,0.1785,0.173303,0.1785,0.145475
3,3.1291,2.861333,0.2667,0.264343,0.2667,0.237582
4,2.8391,2.692728,0.3052,0.33979,0.3052,0.272979
5,2.5968,2.422755,0.369,0.382549,0.369,0.345907
6,2.3785,2.253419,0.4169,0.418753,0.4169,0.403517
7,2.1889,2.128033,0.4477,0.448276,0.4477,0.433814
8,2.0172,2.088265,0.4606,0.46614,0.4606,0.449282
9,1.86,2.088841,0.4631,0.466493,0.4631,0.452157
10,1.7341,1.971701,0.4952,0.4955,0.4952,0.48838


[I 2025-03-26 01:37:35,528] Trial 33 finished with value: 0.488379989351326 and parameters: {'learning_rate': 0.0008325840741825239, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 31 with value: 0.5008940733078789.


Trial 34 with params: {'learning_rate': 0.0004386459688910022, 'weight_decay': 0.003, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8812,3.446861,0.1333,0.119186,0.1333,0.094913
2,3.3075,2.983363,0.2516,0.263073,0.2516,0.223905
3,2.9081,2.616723,0.3269,0.320633,0.3269,0.295594
4,2.5983,2.508083,0.3478,0.365038,0.3478,0.32134
5,2.3515,2.219776,0.4306,0.435924,0.4306,0.412678
6,2.1289,2.078369,0.4633,0.472949,0.4633,0.455051
7,1.9254,1.99359,0.4855,0.495772,0.4855,0.476771
8,1.734,1.987282,0.4916,0.504287,0.4916,0.484783
9,1.5649,2.01127,0.4839,0.494902,0.4839,0.476795
10,1.4355,1.894892,0.5105,0.517392,0.5105,0.506033


[I 2025-03-26 01:50:07,181] Trial 34 finished with value: 0.5060332748384806 and parameters: {'learning_rate': 0.0004386459688910022, 'weight_decay': 0.003, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 34 with value: 0.5060332748384806.


Trial 35 with params: {'learning_rate': 0.0009100458875067605, 'weight_decay': 0.006, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9277,3.653066,0.0854,0.056953,0.0854,0.050527
2,3.545,3.289097,0.1725,0.148999,0.1725,0.134713
3,3.2264,2.994475,0.2329,0.220979,0.2329,0.197157
4,2.9465,2.793225,0.2804,0.296274,0.2804,0.252061
5,2.7117,2.516226,0.353,0.357953,0.353,0.328514
6,2.4935,2.32919,0.3955,0.39289,0.3955,0.380678
7,2.3091,2.228263,0.4238,0.422431,0.4238,0.408924
8,2.1352,2.17498,0.4346,0.438645,0.4346,0.42086
9,1.9854,2.20017,0.4372,0.437402,0.4372,0.421649
10,1.8615,2.052595,0.4657,0.468207,0.4657,0.457144


[I 2025-03-26 02:02:38,469] Trial 35 finished with value: 0.4571443063125598 and parameters: {'learning_rate': 0.0009100458875067605, 'weight_decay': 0.006, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 34 with value: 0.5060332748384806.


Trial 36 with params: {'learning_rate': 0.0005086220980811003, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8714,3.481662,0.123,0.102951,0.123,0.086354
2,3.3424,3.06888,0.2238,0.221589,0.2238,0.190714
3,2.9722,2.662844,0.3151,0.316842,0.3151,0.285454
4,2.6478,2.539665,0.3424,0.360361,0.3424,0.316999
5,2.3884,2.244202,0.4208,0.428302,0.4208,0.400048
6,2.1597,2.09026,0.4602,0.463184,0.4602,0.450991
7,1.9611,1.997663,0.4827,0.492532,0.4827,0.474334
8,1.77,1.989282,0.4818,0.493391,0.4818,0.473253
9,1.6026,2.004546,0.4863,0.493786,0.4863,0.476808
10,1.4634,1.885048,0.5096,0.516301,0.5096,0.504841


[I 2025-03-26 02:15:12,384] Trial 36 finished with value: 0.5048414992051901 and parameters: {'learning_rate': 0.0005086220980811003, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 34 with value: 0.5060332748384806.


Trial 37 with params: {'learning_rate': 0.00024701750706237303, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9466,3.488951,0.1294,0.099811,0.1294,0.088907
2,3.3882,3.05052,0.2335,0.247212,0.2335,0.201736
3,2.9873,2.686223,0.3105,0.304204,0.3105,0.278911
4,2.6581,2.499798,0.3602,0.379526,0.3602,0.33859
5,2.397,2.240494,0.4203,0.427783,0.4203,0.404104
6,2.1641,2.128657,0.4471,0.451465,0.4471,0.436053
7,1.9645,2.050752,0.4701,0.470163,0.4701,0.459116
8,1.7794,2.034249,0.4743,0.478674,0.4743,0.465005
9,1.6258,2.074105,0.4614,0.467442,0.4614,0.452
10,1.5128,1.980265,0.491,0.498188,0.491,0.48744


[I 2025-03-26 02:27:43,179] Trial 37 finished with value: 0.4874402326920508 and parameters: {'learning_rate': 0.00024701750706237303, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 34 with value: 0.5060332748384806.


Trial 38 with params: {'learning_rate': 0.00037983750578941106, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8863,3.445147,0.1268,0.126162,0.1268,0.087634
2,3.3111,2.951656,0.2518,0.267651,0.2518,0.221439
3,2.8863,2.612451,0.3297,0.32188,0.3297,0.298952
4,2.5574,2.433775,0.3727,0.400449,0.3727,0.34804
5,2.2911,2.17546,0.4408,0.44601,0.4408,0.422842
6,2.0541,2.04408,0.4721,0.476368,0.4721,0.462725
7,1.8413,1.945966,0.4959,0.507335,0.4959,0.48977
8,1.6484,1.952088,0.4912,0.501124,0.4912,0.483865
9,1.4809,1.968273,0.4922,0.503426,0.4922,0.484964
10,1.3516,1.886315,0.5106,0.522729,0.5106,0.507949


[I 2025-03-26 02:40:15,226] Trial 38 finished with value: 0.5079494715297588 and parameters: {'learning_rate': 0.00037983750578941106, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 39 with params: {'learning_rate': 0.0011017515049970927, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9334,3.605195,0.0969,0.068364,0.0969,0.059936
2,3.494,3.265781,0.1691,0.178225,0.1691,0.13751
3,3.1607,2.934689,0.2438,0.258571,0.2438,0.211018


[I 2025-03-26 02:43:59,535] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0005088190543959069, 'weight_decay': 0.005, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8748,3.478165,0.1214,0.108105,0.1214,0.08387
2,3.3447,3.0138,0.238,0.237681,0.238,0.207845
3,2.9459,2.643863,0.316,0.297266,0.316,0.280532
4,2.6283,2.490865,0.3509,0.366754,0.3509,0.323864
5,2.3748,2.215549,0.431,0.428845,0.431,0.41254
6,2.1506,2.090752,0.4569,0.466215,0.4569,0.446992
7,1.9489,1.997921,0.4815,0.487186,0.4815,0.472446
8,1.7537,1.972031,0.4927,0.502945,0.4927,0.484608
9,1.5885,2.001736,0.4862,0.493256,0.4862,0.476568
10,1.451,1.878605,0.5096,0.515366,0.5096,0.504891


[I 2025-03-26 02:56:31,433] Trial 40 finished with value: 0.5048907655937307 and parameters: {'learning_rate': 0.0005088190543959069, 'weight_decay': 0.005, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 41 with params: {'learning_rate': 0.0001997135096607007, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9919,3.550867,0.1189,0.107653,0.1189,0.082414
2,3.4554,3.135042,0.2168,0.215766,0.2168,0.180907
3,3.0901,2.791845,0.2944,0.282218,0.2944,0.261674
4,2.7779,2.628358,0.3311,0.349542,0.3311,0.304272
5,2.5108,2.349681,0.3928,0.397736,0.3928,0.37319
6,2.286,2.224579,0.4213,0.42145,0.4213,0.410044


[I 2025-03-26 03:04:02,748] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.00036731278743419685, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8834,3.426183,0.1378,0.117532,0.1378,0.101505
2,3.3169,3.026701,0.2358,0.250049,0.2358,0.208161
3,2.9205,2.634386,0.3259,0.323504,0.3259,0.293388
4,2.5989,2.478773,0.36,0.389159,0.36,0.338646
5,2.331,2.18359,0.4384,0.445666,0.4384,0.423246
6,2.0986,2.054345,0.4639,0.472141,0.4639,0.455443
7,1.8905,1.980926,0.4846,0.488332,0.4846,0.475403
8,1.6903,1.967688,0.4915,0.499797,0.4915,0.484275
9,1.5283,1.999578,0.4838,0.491505,0.4838,0.475656
10,1.3987,1.886059,0.5094,0.514172,0.5094,0.5044


[I 2025-03-26 03:16:34,348] Trial 42 finished with value: 0.5044000435933891 and parameters: {'learning_rate': 0.00036731278743419685, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 43 with params: {'learning_rate': 0.0007688753507565374, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8784,3.568598,0.1069,0.085199,0.1069,0.069512
2,3.4062,3.1957,0.1822,0.179991,0.1822,0.148725
3,3.0718,2.817782,0.2765,0.266904,0.2765,0.244291
4,2.7828,2.666312,0.3039,0.334952,0.3039,0.274411
5,2.5516,2.384048,0.3832,0.393174,0.3832,0.365582
6,2.342,2.232771,0.427,0.428178,0.427,0.41352


[I 2025-03-26 03:24:01,771] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0002091541140284254, 'weight_decay': 0.006, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9926,3.526378,0.1262,0.108243,0.1262,0.086438
2,3.4613,3.155983,0.2136,0.218366,0.2136,0.17745
3,3.0927,2.763705,0.3016,0.288901,0.3016,0.266651
4,2.771,2.563449,0.3524,0.363573,0.3524,0.326796
5,2.5139,2.33536,0.403,0.406665,0.403,0.383058
6,2.2876,2.195156,0.4361,0.432337,0.4361,0.421722
7,2.0946,2.145884,0.4488,0.445836,0.4488,0.436419
8,1.9168,2.141147,0.4445,0.449009,0.4445,0.434433
9,1.7748,2.181317,0.4433,0.444143,0.4433,0.429017
10,1.6696,2.062424,0.4648,0.467432,0.4648,0.457416


[I 2025-03-26 03:36:39,272] Trial 44 finished with value: 0.4574156198417134 and parameters: {'learning_rate': 0.0002091541140284254, 'weight_decay': 0.006, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 45 with params: {'learning_rate': 0.00020597724859568273, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9579,3.500018,0.1252,0.11701,0.1252,0.087789
2,3.4253,3.127189,0.2094,0.207491,0.2094,0.179298
3,3.0652,2.755845,0.299,0.291282,0.299,0.264986
4,2.7489,2.575051,0.3423,0.357397,0.3423,0.313366
5,2.4905,2.337736,0.3961,0.396878,0.3961,0.376173
6,2.2584,2.193022,0.431,0.428763,0.431,0.417369


[I 2025-03-26 03:44:11,419] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0004072596381116083, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8906,3.457258,0.1249,0.113346,0.1249,0.090035
2,3.3245,2.993188,0.246,0.260124,0.246,0.218889
3,2.9259,2.62149,0.3205,0.308922,0.3205,0.288315
4,2.6019,2.45178,0.3619,0.387724,0.3619,0.340645
5,2.3423,2.201775,0.4258,0.426094,0.4258,0.407702
6,2.1062,2.05688,0.4686,0.47032,0.4686,0.457819
7,1.906,1.99307,0.4814,0.490201,0.4814,0.473177
8,1.7073,1.968657,0.491,0.498998,0.491,0.482551
9,1.5449,2.001305,0.486,0.499544,0.486,0.480091
10,1.4129,1.889308,0.5098,0.519615,0.5098,0.507441


[I 2025-03-26 03:56:39,901] Trial 46 finished with value: 0.5074408431538955 and parameters: {'learning_rate': 0.0004072596381116083, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 47 with params: {'learning_rate': 0.00046859374433415015, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8691,3.457491,0.1227,0.110377,0.1227,0.086916
2,3.3267,3.019954,0.2388,0.24929,0.2388,0.209003
3,2.9592,2.650692,0.3162,0.305467,0.3162,0.283307
4,2.6474,2.541683,0.3382,0.357369,0.3382,0.314705
5,2.3957,2.246755,0.4154,0.417836,0.4154,0.397082
6,2.1642,2.104124,0.4512,0.455209,0.4512,0.442599
7,1.966,2.024332,0.483,0.487653,0.483,0.473897
8,1.7703,2.010545,0.4813,0.489577,0.4813,0.471422
9,1.6031,2.046402,0.4704,0.474993,0.4704,0.45921
10,1.4671,1.910318,0.4999,0.509115,0.4999,0.497365


[I 2025-03-26 04:09:04,529] Trial 47 finished with value: 0.4973645679563052 and parameters: {'learning_rate': 0.00046859374433415015, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 48 with params: {'learning_rate': 0.00038336856937958155, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.883,3.466388,0.1312,0.109025,0.1312,0.090892
2,3.3569,3.036505,0.2293,0.232435,0.2293,0.199053
3,2.9606,2.677099,0.3157,0.319569,0.3157,0.284804
4,2.6202,2.505528,0.3498,0.386195,0.3498,0.329288
5,2.3488,2.212723,0.423,0.42067,0.423,0.404031
6,2.1157,2.060544,0.4699,0.472228,0.4699,0.46036
7,1.9107,1.985103,0.4873,0.493371,0.4873,0.479267
8,1.7086,1.986725,0.4895,0.500509,0.4895,0.481658
9,1.5411,2.009473,0.4817,0.491846,0.4817,0.473304
10,1.4073,1.899656,0.5037,0.512577,0.5037,0.500208


[I 2025-03-26 04:21:30,817] Trial 48 finished with value: 0.5002081741395356 and parameters: {'learning_rate': 0.00038336856937958155, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 49 with params: {'learning_rate': 0.0005030408058863895, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8819,3.454602,0.1255,0.11898,0.1255,0.083061
2,3.3486,3.036964,0.228,0.243305,0.228,0.197678
3,2.9932,2.710319,0.3067,0.307981,0.3067,0.278171
4,2.6809,2.582161,0.3336,0.377221,0.3336,0.311542
5,2.4285,2.249893,0.4128,0.416111,0.4128,0.393133
6,2.1852,2.109703,0.4535,0.458304,0.4535,0.444223
7,1.9844,2.032543,0.479,0.488141,0.479,0.468134
8,1.7845,1.996275,0.4868,0.494929,0.4868,0.478968
9,1.6174,2.0093,0.4863,0.494513,0.4863,0.477189
10,1.4808,1.91105,0.5036,0.514521,0.5036,0.500532


[I 2025-03-26 04:33:57,353] Trial 49 finished with value: 0.5005315759267229 and parameters: {'learning_rate': 0.0005030408058863895, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 50 with params: {'learning_rate': 0.0021133792752108674, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9615,3.730314,0.0641,0.033177,0.0641,0.030757
2,3.6854,3.510252,0.1131,0.092006,0.1131,0.075356
3,3.4496,3.2519,0.1687,0.158894,0.1687,0.129955
4,3.2303,3.100938,0.2075,0.205649,0.2075,0.178078
5,3.0317,2.841699,0.2665,0.262371,0.2665,0.238113
6,2.8556,2.672505,0.3027,0.292773,0.3027,0.281099


[I 2025-03-26 04:41:22,221] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0008062295432364033, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8813,3.54709,0.1056,0.082337,0.1056,0.064505
2,3.376,3.153719,0.1968,0.220291,0.1968,0.166473
3,3.0317,2.771131,0.2831,0.300283,0.2831,0.250195
4,2.7486,2.677138,0.3125,0.336983,0.3125,0.286633
5,2.5209,2.361784,0.3931,0.397937,0.3931,0.371895
6,2.3119,2.182305,0.435,0.433093,0.435,0.42221


[I 2025-03-26 04:48:48,147] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0004028095062000762, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8916,3.491512,0.1162,0.102995,0.1162,0.080884
2,3.3568,3.035126,0.2345,0.238044,0.2345,0.19755
3,2.9688,2.691888,0.3119,0.316323,0.3119,0.282125
4,2.6374,2.488472,0.3593,0.380145,0.3593,0.335701
5,2.3812,2.250093,0.4202,0.425896,0.4202,0.403481
6,2.1456,2.106723,0.4575,0.463897,0.4575,0.448346
7,1.9447,1.999968,0.4812,0.486641,0.4812,0.47172
8,1.7495,2.005414,0.4823,0.493449,0.4823,0.473796
9,1.5857,2.00365,0.489,0.497085,0.489,0.479524
10,1.4527,1.899402,0.5049,0.513457,0.5049,0.502419


[I 2025-03-26 05:01:14,234] Trial 52 finished with value: 0.5024187300156698 and parameters: {'learning_rate': 0.0004028095062000762, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 53 with params: {'learning_rate': 0.0005716487350887632, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8597,3.497134,0.1249,0.093768,0.1249,0.08149
2,3.319,3.024021,0.2281,0.244829,0.2281,0.203133
3,2.9346,2.620505,0.3212,0.313061,0.3212,0.289411
4,2.6301,2.481717,0.3547,0.383319,0.3547,0.331237
5,2.381,2.24371,0.4111,0.415788,0.4111,0.389793
6,2.1584,2.104263,0.4527,0.461438,0.4527,0.444134
7,1.9601,1.990135,0.483,0.492535,0.483,0.472106
8,1.773,1.976598,0.4914,0.499368,0.4914,0.482313
9,1.6032,1.990102,0.4896,0.501318,0.4896,0.481118
10,1.4682,1.871733,0.5113,0.51734,0.5113,0.507925


[I 2025-03-26 05:13:46,058] Trial 53 finished with value: 0.5079251010803726 and parameters: {'learning_rate': 0.0005716487350887632, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 54 with params: {'learning_rate': 0.00039249094076069857, 'weight_decay': 0.004, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.91,3.491349,0.124,0.116737,0.124,0.086517
2,3.3569,3.072896,0.2257,0.250921,0.2257,0.197434
3,2.9676,2.65814,0.3207,0.310165,0.3207,0.288347
4,2.6238,2.465761,0.3638,0.390175,0.3638,0.34186
5,2.355,2.209409,0.4343,0.437911,0.4343,0.417899
6,2.1186,2.085593,0.4539,0.46115,0.4539,0.445105
7,1.9179,1.973501,0.4815,0.48568,0.4815,0.472299
8,1.7223,1.971591,0.4897,0.501744,0.4897,0.48495
9,1.5614,2.005204,0.485,0.497727,0.485,0.478933
10,1.43,1.903989,0.5056,0.516902,0.5056,0.504129


[I 2025-03-26 05:26:09,517] Trial 54 finished with value: 0.5041287004681106 and parameters: {'learning_rate': 0.00039249094076069857, 'weight_decay': 0.004, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 55 with params: {'learning_rate': 7.242888062473813e-05, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.158,3.721058,0.0964,0.06842,0.0964,0.056627
2,3.7625,3.501426,0.1365,0.122594,0.1365,0.099763
3,3.5081,3.237498,0.1879,0.184649,0.1879,0.148638


[I 2025-03-26 05:29:55,301] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00024601606295897527, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9352,3.489295,0.1277,0.124401,0.1277,0.094632
2,3.4035,3.081199,0.23,0.242076,0.23,0.193907
3,3.0363,2.735029,0.3038,0.304069,0.3038,0.271766
4,2.7175,2.556918,0.3441,0.367633,0.3441,0.320544
5,2.4453,2.27339,0.4115,0.416228,0.4115,0.39356
6,2.2117,2.182064,0.4343,0.43976,0.4343,0.42453
7,2.0101,2.079096,0.4628,0.469086,0.4628,0.454249
8,1.8204,2.072889,0.4672,0.475527,0.4672,0.459158
9,1.6674,2.110237,0.4564,0.466349,0.4564,0.446684
10,1.5526,2.001812,0.4801,0.484515,0.4801,0.473431


[I 2025-03-26 05:42:21,807] Trial 56 finished with value: 0.4734305389543029 and parameters: {'learning_rate': 0.00024601606295897527, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 57 with params: {'learning_rate': 0.0006391916974737571, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9019,3.50753,0.1103,0.096361,0.1103,0.077759
2,3.3916,3.078849,0.2234,0.225481,0.2234,0.192421
3,3.0182,2.731937,0.3018,0.30102,0.3018,0.26811
4,2.7184,2.579277,0.3288,0.353526,0.3288,0.303689
5,2.4797,2.307833,0.3988,0.400689,0.3988,0.378246
6,2.2653,2.165905,0.4363,0.4361,0.4363,0.422667


[I 2025-03-26 05:49:48,791] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00036158885503338636, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8951,3.421169,0.1349,0.115027,0.1349,0.095911
2,3.3204,3.008813,0.2434,0.253359,0.2434,0.212496
3,2.9244,2.624728,0.3226,0.328,0.3226,0.291495
4,2.5961,2.472697,0.3584,0.372426,0.3584,0.337149
5,2.3354,2.185855,0.4349,0.441904,0.4349,0.421241
6,2.0984,2.069962,0.4619,0.469954,0.4619,0.452889
7,1.8964,1.982978,0.4831,0.491828,0.4831,0.474705
8,1.7013,1.999661,0.4781,0.490562,0.4781,0.469631
9,1.5393,1.997042,0.4845,0.493411,0.4845,0.476074
10,1.4104,1.888427,0.5081,0.516245,0.5081,0.50416


[I 2025-03-26 06:02:21,733] Trial 58 finished with value: 0.504159968306162 and parameters: {'learning_rate': 0.00036158885503338636, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 59 with params: {'learning_rate': 0.001106601853212097, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9296,3.691244,0.0758,0.053266,0.0758,0.042963
2,3.5363,3.277565,0.1604,0.134981,0.1604,0.121817
3,3.2331,3.029147,0.2175,0.20083,0.2175,0.181168
4,2.9874,2.856272,0.269,0.305022,0.269,0.238554
5,2.7564,2.580316,0.3349,0.341988,0.3349,0.311356
6,2.5524,2.40387,0.3841,0.389588,0.3841,0.370103


[I 2025-03-26 06:09:54,824] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00012525369862126539, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0592,3.605086,0.115,0.09852,0.115,0.074233
2,3.5948,3.300875,0.177,0.173172,0.177,0.138394
3,3.2804,2.986549,0.2509,0.237667,0.2509,0.217658


[I 2025-03-26 06:13:40,014] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00031992947274899023, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8997,3.454318,0.13,0.11421,0.13,0.090011
2,3.3536,3.038593,0.2439,0.24921,0.2439,0.213759
3,2.9433,2.637777,0.3248,0.318781,0.3248,0.293675
4,2.6134,2.452107,0.3702,0.386342,0.3702,0.345698
5,2.3557,2.18878,0.4298,0.429671,0.4298,0.411324
6,2.1155,2.082386,0.4559,0.462583,0.4559,0.44778
7,1.9089,1.985383,0.4856,0.488119,0.4856,0.475534
8,1.7151,1.990038,0.4818,0.491148,0.4818,0.474161
9,1.556,2.023244,0.4752,0.483974,0.4752,0.466147
10,1.4303,1.910467,0.5028,0.507312,0.5028,0.498546


[I 2025-03-26 06:26:09,540] Trial 61 finished with value: 0.49854575893881 and parameters: {'learning_rate': 0.00031992947274899023, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 62 with params: {'learning_rate': 0.0003673052624329801, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8903,3.459525,0.1273,0.098819,0.1273,0.087851
2,3.3127,2.962109,0.2564,0.261215,0.2564,0.228471
3,2.9,2.608895,0.3285,0.322324,0.3285,0.298019
4,2.5767,2.455968,0.3713,0.397826,0.3713,0.352517
5,2.3084,2.181965,0.4361,0.432272,0.4361,0.4188
6,2.0738,2.037342,0.4732,0.48216,0.4732,0.466924
7,1.8666,1.955971,0.491,0.493601,0.491,0.481417
8,1.6711,1.963155,0.4964,0.50818,0.4964,0.491112
9,1.5062,1.996458,0.4831,0.493606,0.4831,0.475797
10,1.3765,1.893226,0.5084,0.519775,0.5084,0.50629


[I 2025-03-26 06:38:38,034] Trial 62 finished with value: 0.5062900513786359 and parameters: {'learning_rate': 0.0003673052624329801, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 63 with params: {'learning_rate': 0.0006080132103857521, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9014,3.554818,0.1102,0.099188,0.1102,0.072475
2,3.3912,3.08553,0.2165,0.222331,0.2165,0.182547
3,3.0181,2.739956,0.2948,0.294322,0.2948,0.262623
4,2.7095,2.588881,0.3303,0.363887,0.3303,0.305882
5,2.4567,2.265255,0.4197,0.423024,0.4197,0.399414
6,2.2272,2.147052,0.4447,0.451042,0.4447,0.434573
7,2.0342,2.025694,0.4791,0.483656,0.4791,0.467218
8,1.8499,1.998595,0.4814,0.490562,0.4814,0.470837
9,1.6863,2.026353,0.4774,0.487661,0.4774,0.467947
10,1.55,1.889855,0.5097,0.517247,0.5097,0.506004


[I 2025-03-26 06:51:06,676] Trial 63 finished with value: 0.5060037064599107 and parameters: {'learning_rate': 0.0006080132103857521, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 64 with params: {'learning_rate': 0.0008739086026908621, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9039,3.574824,0.1017,0.081428,0.1017,0.066195
2,3.4256,3.139591,0.2032,0.217289,0.2032,0.170149
3,3.0796,2.822572,0.2757,0.262941,0.2757,0.242063


[I 2025-03-26 06:54:53,664] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0009583604114510528, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8982,3.627383,0.0831,0.066789,0.0831,0.05009
2,3.4719,3.22819,0.1774,0.165034,0.1774,0.146913
3,3.1713,2.949314,0.2489,0.2468,0.2489,0.217636
4,2.9212,2.796913,0.2867,0.30308,0.2867,0.256279
5,2.7027,2.509016,0.3526,0.361444,0.3526,0.330125
6,2.5049,2.345246,0.3958,0.395406,0.3958,0.380044


[I 2025-03-26 07:02:26,751] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004387816666803014, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0137,3.803861,0.0545,0.041743,0.0545,0.02167
2,3.7778,3.689604,0.0828,0.058323,0.0828,0.049022
3,3.6352,3.506873,0.1189,0.113839,0.1189,0.081463


[I 2025-03-26 07:06:11,092] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0004940025574091448, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9038,3.518784,0.1179,0.116499,0.1179,0.07768
2,3.3837,3.081769,0.2219,0.220243,0.2219,0.189752
3,3.0225,2.730301,0.3014,0.302366,0.3014,0.270678
4,2.7108,2.566034,0.3403,0.358764,0.3403,0.314966
5,2.4477,2.267736,0.4149,0.418141,0.4149,0.397409
6,2.2114,2.128521,0.4547,0.46314,0.4547,0.445261
7,2.0074,2.018967,0.4794,0.48509,0.4794,0.469694
8,1.8127,1.999668,0.4862,0.495136,0.4862,0.478587
9,1.6482,2.00529,0.4841,0.492228,0.4841,0.475437
10,1.5089,1.904351,0.5071,0.512416,0.5071,0.502449


[I 2025-03-26 07:18:39,725] Trial 67 finished with value: 0.5024487119345264 and parameters: {'learning_rate': 0.0004940025574091448, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 68 with params: {'learning_rate': 0.0006179995897153078, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8963,3.523026,0.1088,0.090415,0.1088,0.07013
2,3.3971,3.110924,0.2104,0.214575,0.2104,0.174852
3,3.0427,2.774377,0.2945,0.299367,0.2945,0.260234
4,2.732,2.632025,0.316,0.345927,0.316,0.287791
5,2.4868,2.335812,0.3937,0.400178,0.3937,0.373685
6,2.2653,2.148673,0.4405,0.440456,0.4405,0.426492
7,2.0697,2.073094,0.4616,0.468695,0.4616,0.447368
8,1.8889,2.031134,0.4779,0.488658,0.4779,0.468693
9,1.7223,2.020112,0.4826,0.488254,0.4826,0.472255
10,1.5867,1.931736,0.5025,0.50691,0.5025,0.4961


[I 2025-03-26 07:31:08,348] Trial 68 finished with value: 0.4961004333647407 and parameters: {'learning_rate': 0.0006179995897153078, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1654,3.742302,0.0868,0.061352,0.0868,0.051001
2,3.7761,3.4994,0.1431,0.13798,0.1431,0.108912
3,3.5207,3.274633,0.1885,0.167987,0.1885,0.148746


[I 2025-03-26 07:34:53,133] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.000270742228453638, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9261,3.483182,0.1191,0.122079,0.1191,0.087128
2,3.3791,3.086015,0.2332,0.232707,0.2332,0.203793
3,2.9878,2.659137,0.3216,0.306993,0.3216,0.288586
4,2.6448,2.49102,0.3597,0.382963,0.3597,0.336626
5,2.3802,2.221896,0.4315,0.437896,0.4315,0.415614
6,2.1377,2.0923,0.4556,0.459159,0.4556,0.44643
7,1.9369,2.032359,0.4779,0.483752,0.4779,0.468021
8,1.7514,2.030137,0.4787,0.487514,0.4787,0.469581
9,1.5971,2.049825,0.4704,0.482272,0.4704,0.463107
10,1.4811,1.945886,0.4988,0.506059,0.4988,0.494108


[I 2025-03-26 07:47:22,400] Trial 70 finished with value: 0.4941077796843041 and parameters: {'learning_rate': 0.000270742228453638, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 38 with value: 0.5079494715297588.


Trial 71 with params: {'learning_rate': 0.0004384700251936054, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8954,3.448546,0.1304,0.108286,0.1304,0.091428
2,3.34,3.054404,0.229,0.228684,0.229,0.199057
3,2.9678,2.67624,0.3114,0.306081,0.3114,0.27952
4,2.6552,2.533102,0.337,0.364081,0.337,0.31225
5,2.405,2.251996,0.4171,0.412858,0.4171,0.397777
6,2.1728,2.113106,0.4501,0.449554,0.4501,0.437951
7,1.9678,2.040217,0.4674,0.471599,0.4674,0.45588
8,1.7648,2.034939,0.4711,0.479668,0.4711,0.461666
9,1.5993,2.041578,0.4739,0.482202,0.4739,0.465193
10,1.462,1.92216,0.5012,0.508624,0.5012,0.497323


[I 2025-03-26 07:59:49,001] Trial 71 finished with value: 0.4973227969709146 and parameters: {'learning_rate': 0.0004384700251936054, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 72 with params: {'learning_rate': 0.001789555076357524, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9518,3.748249,0.0628,0.035457,0.0628,0.03107
2,3.6826,3.559827,0.1094,0.096638,0.1094,0.078259
3,3.4558,3.284117,0.1666,0.150402,0.1666,0.130174


[I 2025-03-26 08:03:33,989] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.00032415594322927015, 'weight_decay': 0.001, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9215,3.504178,0.1242,0.100315,0.1242,0.083936
2,3.3756,3.053754,0.2303,0.247786,0.2303,0.196968
3,2.9895,2.677506,0.3133,0.312583,0.3133,0.282279
4,2.662,2.555982,0.3387,0.361364,0.3387,0.312867
5,2.3945,2.24412,0.4261,0.429555,0.4261,0.406714
6,2.1567,2.114009,0.4531,0.457598,0.4531,0.444331
7,1.953,2.031849,0.4714,0.481252,0.4714,0.463015
8,1.7533,2.013069,0.4772,0.487893,0.4772,0.471072
9,1.5899,2.037661,0.477,0.485397,0.477,0.467483
10,1.462,1.941568,0.4925,0.498286,0.4925,0.487741


[I 2025-03-26 08:15:57,462] Trial 73 finished with value: 0.48774140525251425 and parameters: {'learning_rate': 0.00032415594322927015, 'weight_decay': 0.001, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 38 with value: 0.5079494715297588.


Trial 74 with params: {'learning_rate': 0.0002952710041203322, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9412,3.514573,0.1182,0.106142,0.1182,0.0814
2,3.3806,3.03708,0.238,0.249157,0.238,0.205127
3,2.9774,2.656194,0.3167,0.302846,0.3167,0.28652
4,2.6499,2.483079,0.3612,0.386552,0.3612,0.337362
5,2.3896,2.250975,0.4126,0.416183,0.4126,0.39568
6,2.1517,2.134617,0.4457,0.45221,0.4457,0.435355


[I 2025-03-26 08:23:24,306] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0006283655842637848, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8852,3.485216,0.1215,0.095113,0.1215,0.077598
2,3.3719,3.076193,0.226,0.243667,0.226,0.199386
3,2.998,2.744354,0.2945,0.300035,0.2945,0.259366
4,2.6893,2.56125,0.3468,0.373919,0.3468,0.323293
5,2.4332,2.269604,0.4068,0.415942,0.4068,0.387642
6,2.2057,2.117091,0.4532,0.458037,0.4532,0.442571
7,2.0136,2.019751,0.4783,0.479575,0.4783,0.467769
8,1.8287,1.982853,0.4889,0.500118,0.4889,0.482434
9,1.6658,2.002251,0.4884,0.497669,0.4884,0.479956
10,1.534,1.887059,0.5125,0.524594,0.5125,0.511131


[I 2025-03-26 08:35:50,341] Trial 75 finished with value: 0.5111310431167122 and parameters: {'learning_rate': 0.0006283655842637848, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 76 with params: {'learning_rate': 0.0005052098680649999, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8888,3.498149,0.1226,0.09142,0.1226,0.082015
2,3.374,3.073087,0.2189,0.23169,0.2189,0.185505
3,2.9989,2.700478,0.3064,0.301674,0.3064,0.273646
4,2.6783,2.545368,0.34,0.354496,0.34,0.311652
5,2.4239,2.251049,0.4188,0.426252,0.4188,0.401033
6,2.1911,2.120206,0.4565,0.467905,0.4565,0.449939
7,1.9876,2.013148,0.4807,0.482725,0.4807,0.469083
8,1.7966,2.008441,0.4832,0.494863,0.4832,0.47455
9,1.6249,2.00272,0.4829,0.490003,0.4829,0.472401
10,1.491,1.912224,0.5025,0.509638,0.5025,0.498739


[I 2025-03-26 08:48:04,208] Trial 76 finished with value: 0.4987386373445207 and parameters: {'learning_rate': 0.0005052098680649999, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 77 with params: {'learning_rate': 0.0005732408661814271, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8876,3.554197,0.1075,0.104261,0.1075,0.075898
2,3.386,3.128801,0.2157,0.230138,0.2157,0.185227
3,3.0418,2.753177,0.2919,0.285962,0.2919,0.261493
4,2.7206,2.570458,0.3366,0.357756,0.3366,0.313561
5,2.4584,2.299954,0.4035,0.404232,0.4035,0.382597
6,2.2268,2.136702,0.4477,0.45504,0.4477,0.439316
7,2.0322,2.029305,0.4749,0.477971,0.4749,0.463433
8,1.8419,1.992768,0.4876,0.495299,0.4876,0.480417
9,1.6799,2.004261,0.4853,0.490594,0.4853,0.476657
10,1.5425,1.909792,0.51,0.518946,0.51,0.507477


[I 2025-03-26 09:00:21,826] Trial 77 finished with value: 0.5074772930795212 and parameters: {'learning_rate': 0.0005732408661814271, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 78 with params: {'learning_rate': 0.0006774519277837021, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8859,3.505506,0.1169,0.095024,0.1169,0.080458
2,3.3836,3.116407,0.2045,0.216826,0.2045,0.170996
3,3.0371,2.789811,0.2792,0.270847,0.2792,0.24474
4,2.7314,2.598435,0.3274,0.342689,0.3274,0.30134
5,2.4826,2.304351,0.4,0.407407,0.4,0.38089
6,2.2654,2.158951,0.4355,0.440278,0.4355,0.423901
7,2.0755,2.057761,0.469,0.469032,0.469,0.455391
8,1.8865,2.023532,0.4728,0.485791,0.4728,0.464949
9,1.7258,2.030414,0.4739,0.481259,0.4739,0.464456
10,1.5906,1.8981,0.51,0.516301,0.51,0.505327


[I 2025-03-26 09:12:34,277] Trial 78 finished with value: 0.5053265361811295 and parameters: {'learning_rate': 0.0006774519277837021, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 79 with params: {'learning_rate': 0.00035454481274557943, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8822,3.478141,0.1192,0.112143,0.1192,0.079428
2,3.3237,3.01703,0.2466,0.26742,0.2466,0.220152
3,2.9319,2.63962,0.3212,0.314158,0.3212,0.291283
4,2.5973,2.485708,0.3607,0.385762,0.3607,0.338641
5,2.3326,2.194921,0.4284,0.430824,0.4284,0.412077
6,2.09,2.038523,0.4692,0.469994,0.4692,0.459858
7,1.8761,1.98677,0.4813,0.491362,0.4813,0.472461
8,1.6775,1.957058,0.4893,0.495611,0.4893,0.48247
9,1.512,1.979079,0.4794,0.490822,0.4794,0.473504
10,1.3816,1.89195,0.5084,0.518412,0.5084,0.506065


[I 2025-03-26 09:24:48,805] Trial 79 finished with value: 0.5060648531925044 and parameters: {'learning_rate': 0.00035454481274557943, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 80 with params: {'learning_rate': 0.000336297879215768, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8814,3.452591,0.1319,0.123741,0.1319,0.095731
2,3.3291,2.972857,0.2531,0.255999,0.2531,0.223488
3,2.9252,2.630523,0.3256,0.327625,0.3256,0.295838
4,2.5967,2.44626,0.3672,0.389742,0.3672,0.34252
5,2.3288,2.190597,0.4328,0.427311,0.4328,0.41428
6,2.0866,2.045474,0.4691,0.468925,0.4691,0.456882
7,1.8815,2.000916,0.481,0.494462,0.481,0.47266
8,1.6834,1.989421,0.4831,0.49512,0.4831,0.475333
9,1.518,2.020959,0.4783,0.484909,0.4783,0.468576
10,1.3895,1.896354,0.5103,0.516299,0.5103,0.506886


[I 2025-03-26 09:37:04,949] Trial 80 finished with value: 0.5068858464177592 and parameters: {'learning_rate': 0.000336297879215768, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 81 with params: {'learning_rate': 0.0003431234472279335, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8662,3.445426,0.1296,0.122317,0.1296,0.096689
2,3.3255,3.007072,0.2483,0.261383,0.2483,0.221195
3,2.9189,2.614884,0.3304,0.332697,0.3304,0.302342
4,2.583,2.450682,0.3722,0.40345,0.3722,0.353796
5,2.3196,2.18881,0.4309,0.438645,0.4309,0.416959
6,2.0839,2.046185,0.4682,0.470304,0.4682,0.458125
7,1.8744,1.972288,0.4828,0.490875,0.4828,0.473083
8,1.6752,1.972235,0.4909,0.501084,0.4909,0.483855
9,1.5127,1.987163,0.4899,0.498944,0.4899,0.482499
10,1.3831,1.887682,0.5038,0.509171,0.5038,0.499155


[I 2025-03-26 09:49:21,161] Trial 81 finished with value: 0.4991553030849781 and parameters: {'learning_rate': 0.0003431234472279335, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 82 with params: {'learning_rate': 0.00012593454765223402, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0383,3.586593,0.1068,0.092098,0.1068,0.069438
2,3.5823,3.307955,0.1805,0.170962,0.1805,0.14757
3,3.2835,2.991407,0.2496,0.225721,0.2496,0.21
4,3.0251,2.826148,0.2825,0.283107,0.2825,0.254083
5,2.8018,2.59203,0.3421,0.337133,0.3421,0.316643
6,2.6164,2.475832,0.3654,0.359176,0.3654,0.347265


[I 2025-03-26 09:56:39,756] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0001574284379617559, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0283,3.58938,0.1091,0.094704,0.1091,0.074585
2,3.5284,3.220569,0.1924,0.193155,0.1924,0.159644
3,3.2107,2.909077,0.2685,0.261503,0.2685,0.233886


[I 2025-03-26 10:00:20,491] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0004858689231509453, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8545,3.461206,0.1268,0.116136,0.1268,0.086832
2,3.3218,3.002176,0.238,0.237462,0.238,0.207341
3,2.9495,2.641449,0.3163,0.304379,0.3163,0.282031
4,2.6314,2.528694,0.3433,0.370997,0.3433,0.316472
5,2.3725,2.204862,0.4335,0.438574,0.4335,0.416168
6,2.1417,2.070123,0.4582,0.459893,0.4582,0.4477
7,1.934,1.98684,0.4902,0.495066,0.4902,0.478608
8,1.7368,1.952789,0.4929,0.49939,0.4929,0.483541
9,1.5635,1.966362,0.4928,0.502009,0.4928,0.485893
10,1.4239,1.858877,0.5155,0.520141,0.5155,0.509826


[I 2025-03-26 10:12:32,262] Trial 84 finished with value: 0.5098259048039491 and parameters: {'learning_rate': 0.0004858689231509453, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 85 with params: {'learning_rate': 0.0005011933362834468, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8572,3.452448,0.1267,0.099472,0.1267,0.089775
2,3.3408,3.078885,0.2195,0.223549,0.2195,0.188731
3,2.9595,2.637085,0.3222,0.312313,0.3222,0.288329
4,2.6342,2.507635,0.3517,0.374662,0.3517,0.330452
5,2.3802,2.229783,0.4136,0.421432,0.4136,0.396614
6,2.1454,2.090935,0.4525,0.456548,0.4525,0.442964


[I 2025-03-26 10:19:50,424] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0005738477907468393, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8355,3.464953,0.1231,0.100224,0.1231,0.083102
2,3.3172,3.041786,0.2266,0.230319,0.2266,0.197049
3,2.9476,2.672426,0.3079,0.301261,0.3079,0.275781
4,2.6442,2.548224,0.3415,0.367443,0.3415,0.318175
5,2.3945,2.231628,0.4222,0.423384,0.4222,0.402826
6,2.1693,2.095102,0.4608,0.465432,0.4608,0.45244
7,1.9696,1.997645,0.4818,0.488708,0.4818,0.472877
8,1.7755,1.978556,0.4859,0.494418,0.4859,0.478511
9,1.6079,1.99619,0.4834,0.495558,0.4834,0.476117
10,1.4683,1.861427,0.5156,0.519462,0.5156,0.510836


[I 2025-03-26 10:32:09,225] Trial 86 finished with value: 0.5108360553231056 and parameters: {'learning_rate': 0.0005738477907468393, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 87 with params: {'learning_rate': 0.0012242047441216673, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9244,3.709045,0.0728,0.038553,0.0728,0.036918
2,3.5639,3.374284,0.1453,0.129777,0.1453,0.111631
3,3.2681,3.057053,0.2163,0.212713,0.2163,0.181599


[I 2025-03-26 10:35:48,868] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0002746315314534532, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.896,3.479789,0.1302,0.123147,0.1302,0.09195
2,3.3652,3.028161,0.2423,0.253659,0.2423,0.211017
3,2.9691,2.669427,0.3188,0.321436,0.3188,0.288788
4,2.6423,2.505771,0.3608,0.375868,0.3608,0.337693
5,2.3718,2.23008,0.4276,0.434168,0.4276,0.410968
6,2.1332,2.121572,0.4557,0.463825,0.4557,0.446838
7,1.9254,2.039688,0.4726,0.476219,0.4726,0.461909
8,1.7279,2.070328,0.466,0.478222,0.466,0.457573
9,1.569,2.077834,0.4725,0.477006,0.4725,0.461853
10,1.4483,1.970876,0.4896,0.501899,0.4896,0.486297


[I 2025-03-26 10:48:02,220] Trial 88 finished with value: 0.48629703046249034 and parameters: {'learning_rate': 0.0002746315314534532, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 89 with params: {'learning_rate': 0.0004722917832222688, 'weight_decay': 0.006, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8718,3.487584,0.1204,0.106657,0.1204,0.084479
2,3.3595,3.051868,0.2282,0.242075,0.2282,0.198611
3,2.9886,2.683225,0.3112,0.300535,0.3112,0.276059
4,2.6735,2.530707,0.3404,0.365864,0.3404,0.316527
5,2.4196,2.244622,0.4179,0.422732,0.4179,0.398428
6,2.1892,2.112489,0.4558,0.458589,0.4558,0.445595
7,1.9934,2.006464,0.476,0.473032,0.476,0.463757
8,1.7967,1.98838,0.4837,0.491795,0.4837,0.474954
9,1.6298,2.003918,0.4886,0.496686,0.4886,0.479453
10,1.4925,1.891885,0.5081,0.514571,0.5081,0.503501


[I 2025-03-26 11:00:15,549] Trial 89 finished with value: 0.5035011083637074 and parameters: {'learning_rate': 0.0004722917832222688, 'weight_decay': 0.006, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 90 with params: {'learning_rate': 0.0007765177516708139, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9123,3.54372,0.105,0.097434,0.105,0.06567
2,3.4156,3.168318,0.1986,0.213763,0.1986,0.165106
3,3.0863,2.815769,0.2722,0.265888,0.2722,0.240984
4,2.8006,2.688709,0.3045,0.319318,0.3045,0.276323
5,2.5724,2.427049,0.3836,0.391339,0.3836,0.36402
6,2.3641,2.26667,0.4124,0.414279,0.4124,0.396854


[I 2025-03-26 11:07:33,591] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0006501622487839108, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8655,3.506619,0.1129,0.092926,0.1129,0.073448
2,3.3717,3.089107,0.2138,0.22544,0.2138,0.186032
3,3.02,2.733615,0.2926,0.296453,0.2926,0.260138
4,2.7232,2.602332,0.3313,0.354872,0.3313,0.306043
5,2.4856,2.32856,0.3989,0.409804,0.3989,0.379647
6,2.2713,2.175499,0.4334,0.43506,0.4334,0.421953


[I 2025-03-26 11:14:54,190] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.00035658125289380894, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8876,3.422679,0.1328,0.135316,0.1328,0.094313
2,3.2992,2.992266,0.2492,0.255542,0.2492,0.214876
3,2.9107,2.602767,0.3328,0.330505,0.3328,0.303625
4,2.5762,2.439439,0.3719,0.390382,0.3719,0.347634
5,2.312,2.176675,0.4345,0.434681,0.4345,0.41727
6,2.0826,2.073427,0.4618,0.468985,0.4618,0.453718
7,1.8743,1.977427,0.4855,0.493812,0.4855,0.477454
8,1.6799,1.978268,0.4857,0.49628,0.4857,0.480002
9,1.5143,2.000201,0.4846,0.497419,0.4846,0.478925
10,1.3846,1.895077,0.5077,0.517941,0.5077,0.505144


[I 2025-03-26 11:27:08,354] Trial 92 finished with value: 0.5051443725592578 and parameters: {'learning_rate': 0.00035658125289380894, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 93 with params: {'learning_rate': 0.00265768294671018, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9878,3.817034,0.0567,0.025378,0.0567,0.023355
2,3.756,3.633794,0.0935,0.065193,0.0935,0.057768
3,3.5755,3.417035,0.1372,0.109899,0.1372,0.096341


[I 2025-03-26 11:30:48,636] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0007521753609460533, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8951,3.597645,0.0959,0.073446,0.0959,0.058944
2,3.4438,3.185322,0.187,0.177578,0.187,0.15131
3,3.0808,2.838982,0.2776,0.280125,0.2776,0.249503
4,2.7875,2.652934,0.3118,0.336047,0.3118,0.285785
5,2.5483,2.375934,0.3855,0.390498,0.3855,0.362422
6,2.3343,2.219204,0.4255,0.42517,0.4255,0.412083


[I 2025-03-26 11:38:08,483] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.00023206247154351742, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9493,3.516695,0.121,0.095532,0.121,0.083243
2,3.416,3.103404,0.2246,0.22459,0.2246,0.191395
3,3.0511,2.740211,0.2997,0.301047,0.2997,0.269345
4,2.7232,2.554437,0.3442,0.367236,0.3442,0.320195
5,2.452,2.310079,0.4079,0.411258,0.4079,0.389692
6,2.211,2.180047,0.4378,0.438919,0.4378,0.42529


[I 2025-03-26 11:45:28,679] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2194,3.848634,0.0714,0.046034,0.0714,0.036925
2,3.8765,3.598083,0.1189,0.098421,0.1189,0.08206
3,3.6552,3.398249,0.1599,0.137545,0.1599,0.12158
4,3.4994,3.310779,0.1803,0.166729,0.1803,0.146197
5,3.3714,3.159079,0.2115,0.196309,0.2115,0.179919
6,3.2678,3.098933,0.2192,0.220726,0.2192,0.184679


[I 2025-03-26 11:52:48,048] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 9.986365127990428e-05, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1012,3.649841,0.0975,0.079608,0.0975,0.062719
2,3.675,3.404149,0.1591,0.141014,0.1591,0.12423
3,3.4118,3.131647,0.2222,0.201963,0.2222,0.182597


[I 2025-03-26 11:56:27,912] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00018907309877158132, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9739,3.500661,0.1289,0.118878,0.1289,0.090414
2,3.4381,3.094357,0.2303,0.233619,0.2303,0.197475
3,3.0595,2.769561,0.3001,0.296222,0.3001,0.267159
4,2.7456,2.562057,0.3447,0.360245,0.3447,0.320665
5,2.4913,2.337751,0.4013,0.408437,0.4013,0.382508
6,2.2639,2.215937,0.4261,0.430585,0.4261,0.414456


[I 2025-03-26 12:03:46,907] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0017422533204379319, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9331,3.746485,0.0632,0.03448,0.0632,0.032151
2,3.6379,3.449962,0.1286,0.100676,0.1286,0.087616
3,3.356,3.107095,0.205,0.190476,0.205,0.166457
4,3.1046,2.971034,0.2377,0.251636,0.2377,0.208418
5,2.8845,2.709354,0.3115,0.320507,0.3115,0.288953
6,2.6957,2.548924,0.3505,0.350158,0.3505,0.330927


[I 2025-03-26 12:11:05,973] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9695,3.521972,0.1169,0.102756,0.1169,0.077126
2,3.4265,3.111565,0.2189,0.220483,0.2189,0.183938
3,3.0461,2.751113,0.2978,0.293015,0.2978,0.267236
4,2.6992,2.53354,0.3493,0.367148,0.3493,0.325482
5,2.4233,2.264065,0.4134,0.414341,0.4134,0.39566
6,2.1855,2.131968,0.4485,0.456127,0.4485,0.439829
7,1.9801,2.050439,0.4707,0.478524,0.4707,0.461887
8,1.7924,2.035825,0.4756,0.482819,0.4756,0.467058
9,1.6384,2.085908,0.4631,0.470798,0.4631,0.45399
10,1.5173,1.979453,0.4889,0.499667,0.4889,0.485334


[I 2025-03-26 12:23:19,830] Trial 100 finished with value: 0.4853339047101348 and parameters: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 101 with params: {'learning_rate': 0.00031502971397332646, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8934,3.488222,0.1219,0.111838,0.1219,0.084736
2,3.366,3.051736,0.2356,0.238448,0.2356,0.204181
3,2.9613,2.679002,0.3104,0.301543,0.3104,0.27729
4,2.6224,2.494399,0.3579,0.386707,0.3579,0.335457
5,2.3516,2.227572,0.4272,0.429154,0.4272,0.40923
6,2.1122,2.078665,0.4618,0.46808,0.4618,0.452768
7,1.9074,2.006152,0.4884,0.496103,0.4884,0.478456
8,1.7087,2.019813,0.4816,0.491469,0.4816,0.473401
9,1.5476,2.030381,0.4786,0.481389,0.4786,0.467485
10,1.4187,1.932703,0.5001,0.506824,0.5001,0.49728


[I 2025-03-26 12:35:32,687] Trial 101 finished with value: 0.4972802393162273 and parameters: {'learning_rate': 0.00031502971397332646, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 102 with params: {'learning_rate': 0.0001288686614173822, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0767,3.61689,0.1151,0.083229,0.1151,0.075418
2,3.614,3.331321,0.1745,0.167022,0.1745,0.138653
3,3.2992,3.002292,0.2446,0.236556,0.2446,0.205257
4,3.0373,2.849884,0.28,0.291705,0.28,0.247879
5,2.8269,2.606648,0.3383,0.337676,0.3383,0.315085
6,2.6217,2.485457,0.3657,0.364426,0.3657,0.346703


[I 2025-03-26 12:42:52,846] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.001755014470696089, 'weight_decay': 0.003, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9352,3.720698,0.0667,0.033658,0.0667,0.032442
2,3.6463,3.459541,0.1244,0.094685,0.1244,0.082893
3,3.407,3.169483,0.185,0.163671,0.185,0.147893


[I 2025-03-26 12:46:33,547] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00026392051924687435, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9248,3.48089,0.1224,0.131503,0.1224,0.086824
2,3.3556,3.024782,0.2438,0.237927,0.2438,0.211342
3,2.9527,2.620292,0.3321,0.332295,0.3321,0.30223
4,2.6231,2.477483,0.3678,0.382336,0.3678,0.343875
5,2.3528,2.237134,0.4154,0.422919,0.4154,0.399069
6,2.1174,2.105234,0.454,0.459664,0.454,0.444928
7,1.9128,2.036724,0.4718,0.475094,0.4718,0.461055
8,1.7201,2.0445,0.4681,0.479929,0.4681,0.460389
9,1.565,2.078145,0.4642,0.474553,0.4642,0.453991
10,1.4446,1.949589,0.4922,0.500298,0.4922,0.488415


[I 2025-03-26 12:58:54,837] Trial 104 finished with value: 0.4884147367318628 and parameters: {'learning_rate': 0.00026392051924687435, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9248,3.658,0.0779,0.058574,0.0779,0.045609
2,3.5595,3.368678,0.1499,0.134655,0.1499,0.115678
3,3.2763,3.058039,0.2173,0.218239,0.2173,0.179116


[I 2025-03-26 13:02:41,202] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00016644555832767357, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9927,3.53064,0.1178,0.101813,0.1178,0.076488
2,3.4947,3.221318,0.1969,0.196598,0.1969,0.163408
3,3.1857,2.890917,0.2712,0.260306,0.2712,0.239984
4,2.9029,2.714147,0.3086,0.328951,0.3086,0.280963
5,2.6589,2.475103,0.3725,0.371724,0.3725,0.350597
6,2.4416,2.322346,0.4037,0.396774,0.4037,0.386389


[I 2025-03-26 13:10:03,828] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.00037702628416615467, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8819,3.448723,0.1282,0.118545,0.1282,0.090356
2,3.3365,2.997875,0.2461,0.25639,0.2461,0.218435
3,2.945,2.673865,0.3105,0.305897,0.3105,0.277626
4,2.6237,2.506937,0.3519,0.378629,0.3519,0.32888
5,2.3629,2.203114,0.4286,0.432224,0.4286,0.409387
6,2.1271,2.07432,0.4559,0.465704,0.4559,0.448331
7,1.9254,1.985713,0.4862,0.486407,0.4862,0.472936
8,1.7224,1.983826,0.4819,0.4883,0.4819,0.471405
9,1.5557,2.021982,0.4788,0.491302,0.4788,0.470734
10,1.4248,1.891982,0.5057,0.511646,0.5057,0.500618


[I 2025-03-26 13:22:18,738] Trial 107 finished with value: 0.5006181509834038 and parameters: {'learning_rate': 0.00037702628416615467, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 108 with params: {'learning_rate': 0.0006661897866245444, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8758,3.558175,0.1093,0.103709,0.1093,0.077481
2,3.3741,3.121153,0.2116,0.213546,0.2116,0.1788
3,3.015,2.738438,0.2946,0.286576,0.2946,0.259065
4,2.7011,2.598984,0.3318,0.361263,0.3318,0.306153
5,2.4497,2.277533,0.4112,0.418766,0.4112,0.392914
6,2.2291,2.104963,0.4554,0.456427,0.4554,0.444703
7,2.038,2.010412,0.4824,0.48299,0.4824,0.470767
8,1.8447,1.996621,0.4833,0.493592,0.4833,0.473924
9,1.6828,2.013214,0.4823,0.492399,0.4823,0.474599
10,1.547,1.886603,0.5151,0.519244,0.5151,0.511028


[I 2025-03-26 13:34:46,306] Trial 108 finished with value: 0.511027956524135 and parameters: {'learning_rate': 0.0006661897866245444, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 109 with params: {'learning_rate': 0.0017183437098516675, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9563,3.759344,0.0655,0.044375,0.0655,0.036641
2,3.648,3.433374,0.136,0.105244,0.136,0.098166
3,3.3605,3.138742,0.2048,0.201003,0.2048,0.164309
4,3.0987,2.936485,0.2529,0.259356,0.2529,0.218461
5,2.8773,2.685707,0.3144,0.32046,0.3144,0.285197
6,2.6964,2.543468,0.3412,0.339719,0.3412,0.323715


[I 2025-03-26 13:42:07,212] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0003389725793240962, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8839,3.460933,0.1255,0.111718,0.1255,0.087881
2,3.3468,3.01809,0.2406,0.240814,0.2406,0.207166
3,2.952,2.653278,0.3202,0.308623,0.3202,0.288234
4,2.6093,2.485928,0.3667,0.392855,0.3667,0.345755
5,2.3373,2.197831,0.4357,0.438032,0.4357,0.420152
6,2.099,2.076827,0.4598,0.463374,0.4598,0.449183
7,1.8893,1.973112,0.4897,0.497218,0.4897,0.48239
8,1.6928,1.985191,0.4857,0.497002,0.4857,0.478391
9,1.5286,2.010174,0.4799,0.486838,0.4799,0.468522
10,1.3982,1.897138,0.5078,0.513462,0.5078,0.504131


[I 2025-03-26 13:54:28,363] Trial 110 finished with value: 0.5041313088002517 and parameters: {'learning_rate': 0.0003389725793240962, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 75 with value: 0.5111310431167122.


Trial 111 with params: {'learning_rate': 0.000864884629562241, 'weight_decay': 0.006, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8963,3.624531,0.088,0.074861,0.088,0.054173
2,3.451,3.199815,0.1728,0.168067,0.1728,0.136958
3,3.1219,2.891608,0.2533,0.265162,0.2533,0.218528


[I 2025-03-26 13:58:13,011] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0003275583004060039, 'weight_decay': 0.002, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9099,3.476864,0.1228,0.099475,0.1228,0.085733
2,3.3362,3.002071,0.2398,0.246994,0.2398,0.20841
3,2.9318,2.6276,0.3221,0.318452,0.3221,0.290403
4,2.5943,2.479349,0.3659,0.387786,0.3659,0.342761
5,2.3259,2.198008,0.4321,0.433182,0.4321,0.41539
6,2.0911,2.062521,0.4621,0.465669,0.4621,0.451572
7,1.8852,2.006403,0.48,0.486184,0.48,0.470429
8,1.6884,1.985644,0.4854,0.496044,0.4854,0.479258
9,1.525,2.02961,0.4771,0.48219,0.4771,0.466563
10,1.4001,1.906936,0.5037,0.507064,0.5037,0.498371


[I 2025-03-26 14:10:33,664] Trial 112 finished with value: 0.4983714808372909 and parameters: {'learning_rate': 0.0003275583004060039, 'weight_decay': 0.002, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 113 with params: {'learning_rate': 0.0005699717032182727, 'weight_decay': 0.007, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8805,3.500922,0.1131,0.087273,0.1131,0.078624
2,3.3463,3.04834,0.223,0.212676,0.223,0.189653
3,2.977,2.693842,0.3074,0.298496,0.3074,0.274901
4,2.6726,2.554663,0.3413,0.360553,0.3413,0.314104
5,2.4248,2.277473,0.4114,0.41859,0.4114,0.392586
6,2.2,2.129603,0.4519,0.456497,0.4519,0.441583


[I 2025-03-26 14:17:56,912] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0008906485966908172, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8897,3.581582,0.0978,0.064432,0.0978,0.061836
2,3.456,3.191478,0.1909,0.183967,0.1909,0.157894
3,3.1293,2.910627,0.2514,0.249788,0.2514,0.220471
4,2.8597,2.704339,0.3012,0.317965,0.3012,0.273895
5,2.6278,2.443621,0.3627,0.36764,0.3627,0.339313
6,2.4201,2.285849,0.4085,0.403968,0.4085,0.393343


[I 2025-03-26 14:25:21,526] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0007818417292192163, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8578,3.528928,0.1058,0.081107,0.1058,0.07092
2,3.3684,3.091688,0.2138,0.201469,0.2138,0.184005
3,3.0231,2.784482,0.2814,0.285891,0.2814,0.251178


[I 2025-03-26 14:29:04,016] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0006996135603342016, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.905,3.598309,0.0948,0.085856,0.0948,0.058577
2,3.451,3.148196,0.1995,0.200361,0.1995,0.166935
3,3.0869,2.86534,0.257,0.257763,0.257,0.227799
4,2.7908,2.663397,0.3094,0.337548,0.3094,0.284496
5,2.5439,2.353518,0.3944,0.395162,0.3944,0.37335
6,2.3315,2.209659,0.4265,0.426522,0.4265,0.413469


[I 2025-03-26 14:36:28,687] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00012050092247739796, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0789,3.605866,0.11,0.080667,0.11,0.072116
2,3.5993,3.305885,0.182,0.170159,0.182,0.145937
3,3.2931,3.030372,0.2361,0.230997,0.2361,0.198202


[I 2025-03-26 14:40:11,247] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0011492762075918583, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9173,3.666496,0.0828,0.055912,0.0828,0.049864
2,3.5483,3.32991,0.1618,0.146245,0.1618,0.127574
3,3.2387,3.022794,0.228,0.219447,0.228,0.195967


[I 2025-03-26 14:43:53,435] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00043371506384188896, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8676,3.418805,0.1313,0.119913,0.1313,0.093013
2,3.311,2.986322,0.2449,0.247467,0.2449,0.215703
3,2.9135,2.62665,0.3236,0.32138,0.3236,0.291632
4,2.595,2.458199,0.3611,0.37782,0.3611,0.337659
5,2.331,2.189565,0.4348,0.440802,0.4348,0.420264
6,2.0994,2.050537,0.4647,0.467476,0.4647,0.454212
7,1.8939,1.964114,0.4833,0.491731,0.4833,0.474532
8,1.6918,1.978724,0.4856,0.499965,0.4856,0.479513
9,1.5226,1.989925,0.4892,0.501429,0.4892,0.481289
10,1.3869,1.867199,0.5139,0.522424,0.5139,0.510532


[I 2025-03-26 14:56:17,282] Trial 119 finished with value: 0.5105321627971413 and parameters: {'learning_rate': 0.00043371506384188896, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 75 with value: 0.5111310431167122.


Trial 120 with params: {'learning_rate': 0.0001750563431473434, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9881,3.546252,0.1166,0.123644,0.1166,0.076091
2,3.4896,3.19302,0.2108,0.196352,0.2108,0.176879
3,3.1547,2.864674,0.2747,0.265498,0.2747,0.24076


[I 2025-03-26 15:00:02,207] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1546,3.734529,0.0922,0.062567,0.0922,0.054943
2,3.7431,3.454577,0.1506,0.137926,0.1506,0.114641
3,3.4756,3.206381,0.2025,0.179632,0.2025,0.162673
4,3.273,3.099478,0.2288,0.236861,0.2288,0.195855
5,3.1034,2.879553,0.2733,0.261338,0.2733,0.242965
6,2.9589,2.801752,0.2892,0.280013,0.2892,0.260274


[I 2025-03-26 15:07:30,418] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00049497207425666, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8569,3.467353,0.1202,0.096764,0.1202,0.082272
2,3.2989,2.938947,0.2578,0.272209,0.2578,0.229842
3,2.8812,2.601117,0.3316,0.33997,0.3316,0.303693
4,2.5658,2.441954,0.3685,0.401637,0.3685,0.345745
5,2.31,2.17913,0.4352,0.441357,0.4352,0.417343
6,2.0795,2.05479,0.4677,0.473711,0.4677,0.459222
7,1.8817,1.962373,0.4947,0.499286,0.4947,0.486284
8,1.6814,1.951714,0.4933,0.505531,0.4933,0.487839
9,1.5146,1.990895,0.4892,0.501234,0.4892,0.483195
10,1.3763,1.858297,0.5207,0.530059,0.5207,0.518285


[I 2025-03-26 15:19:52,514] Trial 122 finished with value: 0.5182847790196461 and parameters: {'learning_rate': 0.00049497207425666, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 123 with params: {'learning_rate': 0.0006656426432625081, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8475,3.448075,0.1269,0.107356,0.1269,0.087706
2,3.3357,3.067806,0.23,0.240578,0.23,0.203157
3,2.9667,2.65566,0.3125,0.313427,0.3125,0.283368
4,2.6529,2.540204,0.338,0.358899,0.338,0.310851
5,2.4096,2.276993,0.4045,0.410082,0.4045,0.38469
6,2.1932,2.107307,0.4515,0.454651,0.4515,0.438995
7,2.0032,2.036615,0.4724,0.478087,0.4724,0.461392
8,1.8176,1.987907,0.4816,0.490261,0.4816,0.474041
9,1.6556,2.022758,0.4808,0.488438,0.4808,0.47282
10,1.517,1.900032,0.5066,0.512903,0.5066,0.501211


[I 2025-03-26 15:32:17,400] Trial 123 finished with value: 0.5012113345848449 and parameters: {'learning_rate': 0.0006656426432625081, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 124 with params: {'learning_rate': 0.0003464977502124078, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8719,3.422404,0.1336,0.142478,0.1336,0.093167
2,3.2714,2.91785,0.2633,0.26934,0.2633,0.233806
3,2.852,2.566946,0.3469,0.335443,0.3469,0.315869
4,2.5314,2.390952,0.3851,0.404123,0.3851,0.362587
5,2.2706,2.17841,0.4321,0.440249,0.4321,0.418043
6,2.0317,2.040455,0.4691,0.476751,0.4691,0.461396
7,1.822,1.964967,0.4873,0.495869,0.4873,0.47858
8,1.6266,1.949057,0.4906,0.500303,0.4906,0.483222
9,1.4598,2.014649,0.4856,0.4997,0.4856,0.477353
10,1.3339,1.886767,0.5097,0.520994,0.5097,0.507044


[I 2025-03-26 15:44:50,560] Trial 124 finished with value: 0.507043582876592 and parameters: {'learning_rate': 0.0003464977502124078, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 125 with params: {'learning_rate': 0.00011095693396299871, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0636,3.611323,0.113,0.090204,0.113,0.075666
2,3.6075,3.297404,0.1858,0.178919,0.1858,0.153347
3,3.3143,3.037764,0.2419,0.233946,0.2419,0.2067
4,3.0761,2.872633,0.2775,0.280931,0.2775,0.245081
5,2.8753,2.672093,0.318,0.314153,0.318,0.289538
6,2.6942,2.550997,0.3461,0.339113,0.3461,0.323917


[I 2025-03-26 15:52:14,029] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0004487275062819783, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8904,3.50144,0.1143,0.084094,0.1143,0.079017
2,3.3713,3.088794,0.2224,0.238951,0.2224,0.193115
3,2.9764,2.658898,0.3128,0.309966,0.3128,0.278027
4,2.6482,2.526797,0.3482,0.366731,0.3482,0.324054
5,2.3816,2.208402,0.43,0.431345,0.43,0.410305
6,2.1444,2.074998,0.4601,0.464888,0.4601,0.449462
7,1.941,1.983552,0.4834,0.489501,0.4834,0.472266
8,1.7394,1.96452,0.4867,0.498437,0.4867,0.479064
9,1.5692,1.972221,0.4901,0.499559,0.4901,0.484355
10,1.4298,1.869364,0.5103,0.521736,0.5103,0.508797


[I 2025-03-26 16:04:34,707] Trial 126 finished with value: 0.5087968537360639 and parameters: {'learning_rate': 0.0004487275062819783, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 127 with params: {'learning_rate': 0.0008515811631642015, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8865,3.614036,0.0895,0.070973,0.0895,0.054848
2,3.4598,3.187005,0.1826,0.163388,0.1826,0.143146
3,3.1248,2.901408,0.2561,0.25273,0.2561,0.219238
4,2.8431,2.70426,0.2995,0.321651,0.2995,0.276201
5,2.6042,2.394072,0.3781,0.381929,0.3781,0.356455
6,2.3865,2.253381,0.4263,0.427902,0.4263,0.412828


[I 2025-03-26 16:12:00,046] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0004367248584488247, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8688,3.439123,0.1255,0.103666,0.1255,0.086118
2,3.3155,3.025836,0.2395,0.243598,0.2395,0.212289
3,2.944,2.682636,0.3039,0.285865,0.3039,0.2702
4,2.6311,2.530338,0.3458,0.37081,0.3458,0.321269
5,2.3686,2.214986,0.4274,0.438505,0.4274,0.412068
6,2.138,2.069297,0.4578,0.454901,0.4578,0.447294
7,1.9314,1.999396,0.4773,0.482564,0.4773,0.468152
8,1.7341,1.972208,0.4863,0.492926,0.4863,0.477019
9,1.5677,2.019432,0.4791,0.492591,0.4791,0.47114
10,1.4314,1.884813,0.5069,0.510799,0.5069,0.501018


[I 2025-03-26 16:24:24,558] Trial 128 finished with value: 0.5010183378220051 and parameters: {'learning_rate': 0.0004367248584488247, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 129 with params: {'learning_rate': 0.00026739434487091146, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.897,3.476945,0.1242,0.115913,0.1242,0.087938
2,3.3721,3.111902,0.2144,0.212296,0.2144,0.179894
3,2.9895,2.682797,0.3103,0.307313,0.3103,0.277753
4,2.6567,2.506971,0.3543,0.374055,0.3543,0.329306
5,2.3832,2.236754,0.4208,0.424754,0.4208,0.402586
6,2.1488,2.106951,0.4555,0.452551,0.4555,0.443965


[I 2025-03-26 16:31:52,640] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0004213956436865399, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8636,3.449178,0.1275,0.117246,0.1275,0.089383
2,3.3306,3.038723,0.2317,0.242932,0.2317,0.197747
3,2.9374,2.641119,0.3249,0.312494,0.3249,0.293396
4,2.615,2.493613,0.3512,0.372583,0.3512,0.325691
5,2.3492,2.230178,0.4246,0.431609,0.4246,0.408128
6,2.1178,2.077882,0.4644,0.470426,0.4644,0.45452
7,1.9111,1.9693,0.4867,0.492094,0.4867,0.477711
8,1.7083,1.955988,0.4871,0.493809,0.4871,0.480178
9,1.5399,1.985251,0.4898,0.499288,0.4898,0.481564
10,1.4041,1.883112,0.5101,0.518753,0.5101,0.507069


[I 2025-03-26 16:44:15,017] Trial 130 finished with value: 0.5070688350718213 and parameters: {'learning_rate': 0.0004213956436865399, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 131 with params: {'learning_rate': 0.0005930802566689036, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8854,3.561097,0.102,0.08785,0.102,0.065923
2,3.4094,3.086409,0.2204,0.213867,0.2204,0.188113
3,3.0395,2.765219,0.2878,0.287621,0.2878,0.255865
4,2.7211,2.568499,0.3364,0.352139,0.3364,0.311833
5,2.4601,2.308109,0.4012,0.404423,0.4012,0.378761
6,2.2397,2.11859,0.4496,0.447358,0.4496,0.437742
7,2.042,2.034123,0.4699,0.469296,0.4699,0.456327
8,1.8492,2.037422,0.473,0.485672,0.473,0.462977
9,1.6782,2.041887,0.4795,0.488368,0.4795,0.469877
10,1.5459,1.91712,0.5094,0.519844,0.5094,0.505315


[I 2025-03-26 16:56:38,255] Trial 131 finished with value: 0.5053154846545928 and parameters: {'learning_rate': 0.0005930802566689036, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 132 with params: {'learning_rate': 0.00038767115629543625, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8571,3.436732,0.1228,0.09872,0.1228,0.084117
2,3.3032,2.983721,0.2519,0.258052,0.2519,0.223751
3,2.9109,2.607136,0.3293,0.32174,0.3293,0.296304
4,2.5915,2.466099,0.3606,0.389972,0.3606,0.339378
5,2.3339,2.172433,0.4351,0.433147,0.4351,0.414479
6,2.1045,2.078282,0.464,0.468578,0.464,0.455151
7,1.9006,1.971915,0.4841,0.488763,0.4841,0.474833
8,1.6991,1.958079,0.4888,0.496738,0.4888,0.480862
9,1.53,2.010432,0.4824,0.492737,0.4824,0.473206
10,1.3966,1.885799,0.5051,0.513751,0.5051,0.501455


[I 2025-03-26 17:08:59,739] Trial 132 finished with value: 0.5014545383571708 and parameters: {'learning_rate': 0.00038767115629543625, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 133 with params: {'learning_rate': 0.00022813694684778728, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9533,3.468002,0.1321,0.121751,0.1321,0.091846
2,3.3997,3.107559,0.2185,0.223025,0.2185,0.186835
3,3.0233,2.684448,0.3223,0.324428,0.3223,0.290704
4,2.6853,2.530499,0.3529,0.369457,0.3529,0.327859
5,2.4163,2.282706,0.4108,0.415232,0.4108,0.393055
6,2.1816,2.137927,0.4418,0.443799,0.4418,0.430865
7,1.9817,2.05906,0.4625,0.465019,0.4625,0.4533
8,1.8004,2.072025,0.4644,0.473444,0.4644,0.45413
9,1.6591,2.108898,0.4652,0.470553,0.4652,0.453733
10,1.5451,1.994023,0.4838,0.490921,0.4838,0.478644


[I 2025-03-26 17:21:25,631] Trial 133 finished with value: 0.4786438921935279 and parameters: {'learning_rate': 0.00022813694684778728, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 134 with params: {'learning_rate': 0.00042033463586058897, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8572,3.460599,0.1201,0.103712,0.1201,0.08169
2,3.318,2.998336,0.2409,0.246133,0.2409,0.209064
3,2.9332,2.654049,0.3136,0.306487,0.3136,0.281571
4,2.6137,2.469498,0.3576,0.375831,0.3576,0.331385
5,2.352,2.185046,0.4338,0.437832,0.4338,0.41621
6,2.1133,2.06271,0.4667,0.474866,0.4667,0.456737
7,1.9109,1.991882,0.4829,0.486718,0.4829,0.471625
8,1.7152,1.973988,0.4942,0.501815,0.4942,0.484544
9,1.5474,1.969222,0.4898,0.498256,0.4898,0.481214
10,1.4166,1.869393,0.5184,0.524393,0.5184,0.513783


[I 2025-03-26 17:33:46,882] Trial 134 finished with value: 0.5137834451201487 and parameters: {'learning_rate': 0.00042033463586058897, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 135 with params: {'learning_rate': 0.0006334182085695436, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8568,3.471729,0.1222,0.102982,0.1222,0.085046
2,3.3338,3.056853,0.2318,0.246959,0.2318,0.200825
3,2.9757,2.68907,0.3069,0.297646,0.3069,0.276135
4,2.6675,2.564832,0.3299,0.364818,0.3299,0.305243
5,2.4151,2.241601,0.4171,0.421458,0.4171,0.39626
6,2.1856,2.12155,0.4409,0.44363,0.4409,0.429411
7,1.9961,2.00299,0.476,0.481623,0.476,0.464365
8,1.8022,1.969995,0.4856,0.494544,0.4856,0.476658
9,1.6273,1.986024,0.4894,0.497742,0.4894,0.480349
10,1.4847,1.872898,0.5101,0.517255,0.5101,0.505434


[I 2025-03-26 17:46:09,548] Trial 135 finished with value: 0.5054338089610253 and parameters: {'learning_rate': 0.0006334182085695436, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 136 with params: {'learning_rate': 0.00039505793117628623, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8523,3.439949,0.1231,0.110143,0.1231,0.083508
2,3.2926,2.981116,0.2505,0.250874,0.2505,0.219245
3,2.8909,2.578043,0.331,0.316913,0.331,0.299632
4,2.5679,2.441617,0.369,0.398074,0.369,0.348031
5,2.303,2.154397,0.4396,0.443611,0.4396,0.423082
6,2.0646,2.040091,0.47,0.478022,0.47,0.460684
7,1.8576,1.94901,0.4911,0.494041,0.4911,0.48134
8,1.6625,1.935844,0.4975,0.504427,0.4975,0.489836
9,1.5018,1.976206,0.4874,0.494455,0.4874,0.4786
10,1.3679,1.847605,0.5177,0.525451,0.5177,0.515106


[I 2025-03-26 17:58:45,773] Trial 136 finished with value: 0.5151055213204917 and parameters: {'learning_rate': 0.00039505793117628623, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 137 with params: {'learning_rate': 0.00019319559239646588, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9765,3.549951,0.1123,0.103085,0.1123,0.077871
2,3.4568,3.130216,0.2123,0.225074,0.2123,0.178248
3,3.1014,2.796623,0.2946,0.286285,0.2946,0.261487
4,2.7991,2.627841,0.3305,0.341832,0.3305,0.303245
5,2.5432,2.40226,0.3749,0.375396,0.3749,0.356087
6,2.321,2.254081,0.4132,0.411234,0.4132,0.398342


[I 2025-03-26 18:06:09,660] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.00030633796421336264, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9254,3.489452,0.1184,0.107212,0.1184,0.080032
2,3.3503,3.014554,0.2321,0.238621,0.2321,0.19938
3,2.9362,2.648046,0.3188,0.326454,0.3188,0.291064
4,2.6021,2.477543,0.3571,0.377554,0.3571,0.332985
5,2.3405,2.223447,0.4185,0.418804,0.4185,0.402356
6,2.1079,2.100476,0.4498,0.453502,0.4498,0.441278
7,1.8989,2.016663,0.4726,0.478586,0.4726,0.462887
8,1.7028,2.005736,0.479,0.486305,0.479,0.470375
9,1.5412,2.060118,0.4669,0.475503,0.4669,0.457554
10,1.4132,1.943182,0.4941,0.502107,0.4941,0.490763


[I 2025-03-26 18:18:34,131] Trial 138 finished with value: 0.4907629513228177 and parameters: {'learning_rate': 0.00030633796421336264, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 122 with value: 0.5182847790196461.


Trial 139 with params: {'learning_rate': 0.00028040263644054866, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9066,3.469419,0.1228,0.104079,0.1228,0.085881
2,3.3183,2.974414,0.2588,0.271203,0.2588,0.230889
3,2.8983,2.584791,0.3354,0.337652,0.3354,0.304668
4,2.5672,2.484617,0.3568,0.379984,0.3568,0.332644
5,2.3032,2.179757,0.4327,0.440751,0.4327,0.416639
6,2.0678,2.066448,0.4645,0.467104,0.4645,0.455153
7,1.8639,1.988153,0.4795,0.486625,0.4795,0.470808
8,1.6696,2.020843,0.4771,0.489322,0.4771,0.469953
9,1.5161,2.035001,0.4703,0.476649,0.4703,0.460992
10,1.3938,1.924874,0.4965,0.506563,0.4965,0.493491


[I 2025-03-26 18:30:55,702] Trial 139 finished with value: 0.4934907953689105 and parameters: {'learning_rate': 0.00028040263644054866, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 140 with params: {'learning_rate': 0.0006193479031612188, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8626,3.508016,0.1145,0.099214,0.1145,0.077223
2,3.3735,3.093536,0.2185,0.216783,0.2185,0.188688
3,3.038,2.772281,0.2894,0.289457,0.2894,0.25987
4,2.7349,2.608583,0.3255,0.364303,0.3255,0.304723
5,2.4868,2.337789,0.3965,0.404601,0.3965,0.377877
6,2.2567,2.163945,0.4455,0.451502,0.4455,0.433712
7,2.0624,2.045645,0.4701,0.473362,0.4701,0.4594
8,1.8714,2.018436,0.477,0.48811,0.477,0.466383
9,1.7056,2.029649,0.4784,0.484776,0.4784,0.468252
10,1.5718,1.9081,0.5029,0.509673,0.5029,0.498061


[I 2025-03-26 18:43:14,468] Trial 140 finished with value: 0.49806124923571543 and parameters: {'learning_rate': 0.0006193479031612188, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 141 with params: {'learning_rate': 0.0002981418565140718, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8899,3.461136,0.1309,0.134162,0.1309,0.092771
2,3.3393,3.014108,0.2482,0.241745,0.2482,0.219548
3,2.9364,2.628281,0.3283,0.314772,0.3283,0.293327
4,2.5966,2.446577,0.3718,0.403812,0.3718,0.354761
5,2.3282,2.207782,0.4281,0.436046,0.4281,0.411532
6,2.0917,2.066829,0.4646,0.47067,0.4646,0.457782
7,1.8849,1.997888,0.4812,0.483435,0.4812,0.471109
8,1.6906,1.999353,0.4822,0.492977,0.4822,0.47649
9,1.5294,2.040967,0.4762,0.47999,0.4762,0.466387
10,1.4044,1.920214,0.5037,0.506773,0.5037,0.499012


[I 2025-03-26 18:55:36,298] Trial 141 finished with value: 0.4990124918132649 and parameters: {'learning_rate': 0.0002981418565140718, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 142 with params: {'learning_rate': 0.00017688370147691295, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.991,3.521532,0.1211,0.116302,0.1211,0.085193
2,3.4734,3.145343,0.2125,0.226679,0.2125,0.176385
3,3.1257,2.820149,0.2859,0.273709,0.2859,0.251645
4,2.8283,2.655825,0.3267,0.344391,0.3267,0.301345
5,2.5737,2.400062,0.3812,0.386888,0.3812,0.361689
6,2.3513,2.270546,0.4118,0.409237,0.4118,0.396935


[I 2025-03-26 19:02:58,824] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0002659276568555364, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8956,3.469648,0.1326,0.107688,0.1326,0.094505
2,3.3636,3.047016,0.2404,0.250657,0.2404,0.206129
3,2.9668,2.629698,0.3279,0.326278,0.3279,0.298581
4,2.6241,2.474944,0.3663,0.396605,0.3663,0.344005
5,2.3605,2.216833,0.4234,0.427949,0.4234,0.406231
6,2.1244,2.087337,0.4558,0.458173,0.4558,0.445067


[I 2025-03-26 19:10:21,043] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.00023110258601521564, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.946,3.489925,0.1229,0.098787,0.1229,0.083955
2,3.4046,3.077488,0.2232,0.23872,0.2232,0.189607
3,3.0271,2.693259,0.3156,0.304686,0.3156,0.281036
4,2.698,2.540407,0.3468,0.364814,0.3468,0.324447
5,2.4369,2.29232,0.4044,0.414601,0.4044,0.386045
6,2.2021,2.183268,0.4373,0.440304,0.4373,0.424667
7,2.01,2.084809,0.4639,0.463858,0.4639,0.452499
8,1.8271,2.076436,0.4689,0.477336,0.4689,0.459776
9,1.683,2.116102,0.4589,0.465308,0.4589,0.447947
10,1.5709,2.006282,0.4874,0.491488,0.4874,0.48154


[I 2025-03-26 19:22:44,028] Trial 144 finished with value: 0.4815397370658176 and parameters: {'learning_rate': 0.00023110258601521564, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 145 with params: {'learning_rate': 0.0003348475583025199, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8688,3.436032,0.1363,0.117541,0.1363,0.099238
2,3.3195,3.020827,0.2372,0.248129,0.2372,0.205463
3,2.9167,2.630778,0.3252,0.323939,0.3252,0.293078
4,2.5781,2.460581,0.3621,0.386817,0.3621,0.341114
5,2.3069,2.1549,0.4413,0.444736,0.4413,0.425041
6,2.0715,2.028172,0.4704,0.474168,0.4704,0.45981
7,1.8648,1.982184,0.4843,0.492707,0.4843,0.475028
8,1.6672,1.981159,0.4895,0.500415,0.4895,0.481655
9,1.5014,1.991419,0.4836,0.492698,0.4836,0.474884
10,1.3703,1.8968,0.508,0.515701,0.508,0.504284


[I 2025-03-26 19:35:07,690] Trial 145 finished with value: 0.5042837816042376 and parameters: {'learning_rate': 0.0003348475583025199, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 146 with params: {'learning_rate': 0.000358316709037866, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8833,3.476015,0.1286,0.108714,0.1286,0.088649
2,3.3322,2.997249,0.2505,0.249141,0.2505,0.218892
3,2.936,2.63761,0.3224,0.317758,0.3224,0.288655
4,2.6023,2.471397,0.3595,0.394553,0.3595,0.338845
5,2.3311,2.199253,0.4295,0.433384,0.4295,0.411723
6,2.0965,2.06624,0.4583,0.458134,0.4583,0.446825
7,1.8858,1.982004,0.4841,0.490776,0.4841,0.475124
8,1.6875,1.996033,0.4832,0.49529,0.4832,0.477635
9,1.5234,1.985508,0.4852,0.495785,0.4852,0.477034
10,1.3916,1.900122,0.5053,0.513314,0.5053,0.50195


[I 2025-03-26 19:47:29,024] Trial 146 finished with value: 0.5019495890229785 and parameters: {'learning_rate': 0.000358316709037866, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 122 with value: 0.5182847790196461.


Trial 147 with params: {'learning_rate': 0.0025651152134400176, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9634,3.778428,0.0553,0.033033,0.0553,0.029565
2,3.7122,3.609957,0.1039,0.081395,0.1039,0.069411
3,3.5064,3.31458,0.1561,0.147815,0.1561,0.119258


[I 2025-03-26 19:51:11,772] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0016076950598799939, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9404,3.731105,0.0634,0.040018,0.0634,0.030336
2,3.6259,3.464541,0.1312,0.108167,0.1312,0.095815
3,3.362,3.166537,0.1936,0.180103,0.1936,0.155711


[I 2025-03-26 19:54:54,303] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00036175490456492393, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9107,3.435325,0.1282,0.097487,0.1282,0.084195
2,3.3025,2.953835,0.2567,0.262607,0.2567,0.229138
3,2.8944,2.580575,0.3309,0.323724,0.3309,0.299945
4,2.5689,2.482928,0.3555,0.391606,0.3555,0.333282
5,2.3067,2.183867,0.4367,0.441992,0.4367,0.420007
6,2.0682,2.06321,0.467,0.474565,0.467,0.459876
7,1.8614,1.984611,0.4821,0.490378,0.4821,0.474019
8,1.6659,1.966948,0.4883,0.493588,0.4883,0.478705
9,1.4999,2.007833,0.4804,0.487065,0.4804,0.471535
10,1.373,1.89727,0.5055,0.515627,0.5055,0.503846


[I 2025-03-26 20:07:22,408] Trial 149 finished with value: 0.5038455837632714 and parameters: {'learning_rate': 0.00036175490456492393, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 122 with value: 0.5182847790196461.


In [None]:
print(best_distill_random)

BestRun(run_id='122', objective=0.5182847790196461, hyperparameters={'learning_rate': 0.00049497207425666, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}, run_summary=None)
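
The best run printed above can be cross-checked against the Optuna log lines. The following is a minimal sketch that assumes the log has been captured as plain text; the helper name `summarize` and the regular expressions are illustrative and not part of the notebook or of Optuna. The sample lines are taken from the output above, with the parameter dictionaries shortened to `{...}`.

```python
import re

# Patterns for the two Optuna log-line shapes that appear in this notebook's output.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize(log_text):
    """Return (best_trial, best_value, n_finished, n_pruned) for a maximized objective."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = {int(n) for n in PRUNED.findall(log_text)}
    if not finished:
        return None, None, 0, len(pruned)
    best_trial = max(finished, key=finished.get)
    return best_trial, finished[best_trial], len(finished), len(pruned)


# Three lines copied from the output above (parameter dicts shortened).
sample = """
[I 2025-03-26 17:33:46,882] Trial 134 finished with value: 0.5137834451201487 and parameters: {...}.
[I 2025-03-26 17:58:45,773] Trial 136 finished with value: 0.5151055213204917 and parameters: {...}.
[I 2025-03-26 18:06:09,660] Trial 137 pruned.
"""
print(summarize(sample))
```

Run on the full log of a study, the same function would also give the ratio of pruned to finished trials, which indicates how aggressively the pruner is cutting runs.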


In [None]:
base.reset_seed()

## Search with standard training, fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [None]:
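def hp_space(trial):
    # Search space: log-uniform learning rate, weight decay on a 1e-3 grid,
    # and an integer number of warmup steps up to `warm_up` (defined earlier in the notebook).
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params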
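
To make the shape of this search space concrete, here is a standard-library sketch of what the two `suggest_float` ranges cover. It is not how Optuna or the TPE sampler draws values; `sample_log_uniform` and `stepped_grid` are hypothetical helpers that only illustrate plain log-uniform sampling and a stepped grid over the same bounds.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def stepped_grid(low, high, step):
    """List the values low, low + step, ... up to high (inclusive)."""
    count = round((high - low) / step) + 1
    return [low + i * step for i in range(count)]


rng = random.Random(42)

# Learning rate: two orders of magnitude, each decade equally likely.
learning_rates = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(5)]

# Weight decay: eleven discrete candidates from 0 to 1e-2.
weight_decays = stepped_grid(0.0, 1e-2, 1e-3)

print(learning_rates)
print(weight_decays)
```

Logged values such as `0.009000000000000001` are ordinary floating-point representations of grid points like these, not separate settings.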

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
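
As a rough guide to what the pruner settings imply, the sketch below computes how many Hyperband brackets a resource range produces and at which steps a trial in each bracket would be compared with its peers. It assumes the usual Hyperband arithmetic (bracket count from the logarithm of the resource ratio, rungs growing geometrically by the reduction factor) and lists only the rungs that fall within the maximum resource. The values of `MIN_R` and `MAX_R` are illustrative only; the notebook's `min_r` and `max_r` are defined outside this section, and `bootstrap_count` is not modeled.

```python
import math


def n_brackets(min_resource, max_resource, reduction_factor):
    """Number of Hyperband brackets for the given resource range."""
    return math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1


def rung_steps(bracket, min_resource, max_resource, reduction_factor):
    """Steps at which a trial in the given bracket is compared against its peers."""
    steps = []
    rung = 0
    while True:
        step = min_resource * reduction_factor ** (bracket + rung)
        if step > max_resource:
            return steps
        steps.append(step)
        rung += 1


# Illustrative values only; the notebook's min_r and max_r are set elsewhere.
MIN_R, MAX_R, ETA = 3, 10, 2
for b in range(n_brackets(MIN_R, MAX_R, ETA)):
    print(b, rung_steps(b, MIN_R, MAX_R, ETA))
```

A larger `reduction_factor` spaces the rungs further apart and discards a larger share of trials at each one, so fewer runs reach the final epoch.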



Trainer configuration for the individual training runs.

In [None]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.freeze_model(base.get_mobilenet(100)),
)
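
`base.compute_metrics` is defined outside this section, so its averaging mode is not visible here. In the logged tables the Accuracy and Recall columns are identical in every row, which is what support-weighted averaging produces: weighted recall reduces algebraically to accuracy. The sketch below is a plain-Python illustration of that identity under the weighted-averaging assumption; `weighted_scores` is a hypothetical helper, not the notebook's implementation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with support-weighted averaging."""
    support = Counter(y_true)       # number of true samples per class
    predicted = Counter(y_pred)     # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total          # class weight = share of true samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / total
    return accuracy, precision, recall, f1


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(acc, prec, rec, f1)
```

Because recall carries no information beyond accuracy under this averaging, F1 and precision are the columns that distinguish runs with similar accuracy.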
  

Search setup.

In [None]:
best_base_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-03-26 20:07:23,117] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5202,2.770046,0.4225,0.468604,0.4225,0.406531
2,2.3527,2.19929,0.4778,0.497693,0.4778,0.473857
3,2.0066,2.018175,0.5049,0.50788,0.5049,0.495592


[I 2025-03-26 20:09:48,407] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7196,2.086023,0.4824,0.521987,0.4824,0.471569
2,1.8035,1.858829,0.5099,0.535738,0.5099,0.509495
3,1.617,1.7699,0.5321,0.54182,0.5321,0.525333
4,1.5164,1.812177,0.523,0.547373,0.523,0.519518
5,1.4524,1.729588,0.5372,0.55033,0.5372,0.531638
6,1.3997,1.683485,0.5523,0.558256,0.5523,0.548053


[I 2025-03-26 20:14:30,131] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2963,3.974202,0.2446,0.277944,0.2446,0.234597
2,3.6819,3.448128,0.3665,0.379516,0.3665,0.355963
3,3.2624,3.130024,0.414,0.4169,0.414,0.396761


[I 2025-03-26 20:16:52,812] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5663,1.97103,0.4913,0.542859,0.4913,0.481489
2,1.6994,1.814313,0.5185,0.548373,0.5185,0.518457
3,1.5304,1.74986,0.5337,0.551255,0.5337,0.527317
4,1.4346,1.812558,0.521,0.55065,0.521,0.51859
5,1.3705,1.730676,0.5376,0.557007,0.5376,0.533488
6,1.3143,1.673102,0.5477,0.553594,0.5477,0.543535


[I 2025-03-26 20:21:34,155] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3053,1.928462,0.4926,0.556274,0.4926,0.486354
2,1.6561,1.852242,0.511,0.551477,0.511,0.510541
3,1.4985,1.795897,0.5296,0.555184,0.5296,0.52144
4,1.4032,1.883305,0.5115,0.54743,0.5115,0.508827
5,1.3358,1.796153,0.524,0.552362,0.524,0.520899
6,1.2709,1.712751,0.5421,0.551909,0.5421,0.53828


[I 2025-03-26 20:26:17,264] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.059,3.56754,0.339,0.381459,0.339,0.320825
2,3.1648,2.894534,0.4335,0.447343,0.4335,0.42273
3,2.6857,2.579561,0.4635,0.465928,0.4635,0.448779
4,2.4167,2.430411,0.4595,0.47636,0.4595,0.448813
5,2.2557,2.295245,0.4829,0.485774,0.4829,0.469647
6,2.1499,2.209571,0.5002,0.496518,0.5002,0.490183


[I 2025-03-26 20:31:02,403] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3101,2.539508,0.4432,0.482323,0.4432,0.429581
2,2.1618,2.0694,0.4892,0.510634,0.4892,0.487155
3,1.8734,1.922327,0.5152,0.519791,0.5152,0.506977
4,1.7385,1.918225,0.5113,0.531605,0.5113,0.506576
5,1.6589,1.824849,0.5292,0.53573,0.5292,0.521889
6,1.601,1.774767,0.5426,0.54591,0.5426,0.537888


[I 2025-03-26 20:35:44,152] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1456,3.720205,0.3109,0.352351,0.3109,0.295644
2,3.3535,3.089255,0.4176,0.436292,0.4176,0.406808
3,2.8838,2.762203,0.4488,0.451054,0.4488,0.432724
4,2.5993,2.587095,0.4461,0.462757,0.4461,0.433621
5,2.4216,2.442276,0.4734,0.474915,0.4734,0.458686
6,2.3029,2.34742,0.4899,0.485946,0.4899,0.477799


[I 2025-03-26 20:40:27,781] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1786,2.439229,0.4515,0.489401,0.4515,0.43906
2,2.0854,2.022209,0.493,0.514794,0.493,0.49187
3,1.823,1.88856,0.5184,0.524138,0.5184,0.510487
4,1.6973,1.893632,0.5138,0.535037,0.5138,0.509555
5,1.622,1.802774,0.532,0.539278,0.532,0.525138
6,1.5659,1.754398,0.5458,0.549447,0.5458,0.541126


[I 2025-03-26 20:45:10,948] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9644,2.262776,0.466,0.501232,0.466,0.45441
2,1.9467,1.937822,0.5019,0.524971,0.5019,0.501497
3,1.7248,1.826339,0.5239,0.531097,0.5239,0.516733
4,1.613,1.848642,0.5181,0.540863,0.5181,0.514699
5,1.5442,1.762333,0.5354,0.544908,0.5354,0.529112
6,1.4907,1.716477,0.5493,0.554481,0.5493,0.544901
7,1.4536,1.698564,0.5461,0.554827,0.5461,0.544897
8,1.4271,1.735813,0.5406,0.542915,0.5406,0.535563
9,1.405,1.742952,0.5379,0.548376,0.5379,0.534493
10,1.3954,1.729679,0.542,0.550593,0.542,0.540301


[I 2025-03-26 20:53:03,383] Trial 9 finished with value: 0.5403012316073997 and parameters: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 9 with value: 0.5403012316073997.


Trial 10 with params: {'learning_rate': 0.0026025741521183794, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3146,1.930951,0.492,0.558539,0.492,0.485754
2,1.6648,1.876052,0.5071,0.551435,0.5071,0.50691
3,1.5063,1.818669,0.5283,0.555346,0.5283,0.519379
4,1.4095,1.909051,0.509,0.546367,0.509,0.506419
5,1.34,1.819602,0.521,0.552119,0.521,0.518343
6,1.2722,1.730285,0.5411,0.552005,0.5411,0.53742


[I 2025-03-26 20:57:48,875] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0003262588029927626, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3315,2.607308,0.4355,0.476091,0.4355,0.421031
2,2.2236,2.114183,0.4842,0.5044,0.4842,0.481513
3,1.9217,1.957389,0.5112,0.515806,0.5112,0.502716


[I 2025-03-26 21:00:10,980] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0009531187414107555, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6085,2.025879,0.4878,0.533056,0.4878,0.47783
2,1.7515,1.834412,0.5133,0.54014,0.5133,0.513098
3,1.5762,1.755115,0.5329,0.544033,0.5329,0.52631
4,1.4789,1.805818,0.5221,0.548588,0.5221,0.519077
5,1.416,1.723971,0.5374,0.551918,0.5374,0.532268
6,1.3626,1.675448,0.5504,0.556693,0.5504,0.546276
7,1.3232,1.668387,0.5495,0.559397,0.5495,0.548583
8,1.2943,1.709017,0.5432,0.547643,0.5432,0.538597
9,1.2677,1.721911,0.5429,0.553393,0.5429,0.539476
10,1.2544,1.703633,0.5432,0.552205,0.5432,0.541981


[I 2025-03-26 21:08:03,084] Trial 12 finished with value: 0.5419811933449915 and parameters: {'learning_rate': 0.0009531187414107555, 'weight_decay': 0.005, 'warmup_steps': 4}. Best is trial 12 with value: 0.5419811933449915.


Trial 13 with params: {'learning_rate': 0.0009263363105887989, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6148,2.032885,0.4864,0.531588,0.4864,0.476564
2,1.7579,1.837387,0.5128,0.539401,0.5128,0.512562
3,1.5815,1.756744,0.5327,0.543462,0.5327,0.526042
4,1.484,1.806244,0.5225,0.54815,0.5225,0.51926
5,1.4211,1.724311,0.5373,0.551957,0.5373,0.532057
6,1.3679,1.676359,0.5514,0.557939,0.5514,0.547358
7,1.3286,1.668738,0.5497,0.55946,0.5497,0.54878
8,1.3001,1.709089,0.5434,0.547901,0.5434,0.538852
9,1.2737,1.721775,0.5424,0.552843,0.5424,0.539015
10,1.2607,1.70383,0.5434,0.552723,0.5434,0.542299


[I 2025-03-26 21:15:55,943] Trial 13 finished with value: 0.542299275301681 and parameters: {'learning_rate': 0.0009263363105887989, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 13 with value: 0.542299275301681.


Trial 14 with params: {'learning_rate': 0.0017763026521482, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.365,1.932764,0.4931,0.552645,0.4931,0.485608
2,1.6594,1.818954,0.5167,0.550828,0.5167,0.516354
3,1.5002,1.764485,0.5318,0.551885,0.5318,0.525007
4,1.4062,1.840402,0.5152,0.549181,0.5152,0.512716
5,1.3415,1.756808,0.5319,0.554064,0.5319,0.527879
6,1.2816,1.686069,0.5463,0.553566,0.5463,0.542373
7,1.235,1.689229,0.544,0.555313,0.544,0.543063
8,1.1982,1.738784,0.5416,0.550873,0.5416,0.537405
9,1.1623,1.753978,0.5422,0.555206,0.5422,0.53933
10,1.1413,1.726564,0.5403,0.548499,0.5403,0.539143


[I 2025-03-26 21:23:50,422] Trial 14 finished with value: 0.5391432946939926 and parameters: {'learning_rate': 0.0017763026521482, 'weight_decay': 0.005, 'warmup_steps': 7}. Best is trial 13 with value: 0.542299275301681.


Trial 15 with params: {'learning_rate': 0.002125688919623599, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4024,1.934263,0.4902,0.555251,0.4902,0.483659
2,1.6612,1.839329,0.5125,0.550899,0.5125,0.512051
3,1.4999,1.78419,0.5309,0.554249,0.5309,0.522889
4,1.4043,1.868717,0.5134,0.549435,0.5134,0.51102
5,1.3368,1.782303,0.527,0.554139,0.527,0.523561
6,1.2733,1.702601,0.5439,0.552609,0.5439,0.539931


[I 2025-03-26 21:28:36,756] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0001293425222493065, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9744,3.45771,0.3542,0.39463,0.3542,0.335383
2,3.0457,2.783499,0.4414,0.454298,0.4414,0.430745
3,2.5779,2.485425,0.4704,0.472071,0.4704,0.456221
4,2.3244,2.354187,0.4656,0.482158,0.4656,0.455857
5,2.175,2.225643,0.488,0.49241,0.488,0.475925
6,2.0769,2.145302,0.505,0.502165,0.505,0.495881


[I 2025-03-26 21:33:24,924] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0017352383115840264, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.363,1.933406,0.4934,0.551599,0.4934,0.485733
2,1.6603,1.817236,0.5175,0.551041,0.5175,0.517162
3,1.5011,1.76252,0.5326,0.552221,0.5326,0.525797
4,1.4073,1.837268,0.516,0.54923,0.516,0.513625
5,1.3428,1.754059,0.5335,0.555717,0.5335,0.529685
6,1.2834,1.684446,0.5464,0.553765,0.5464,0.54259
7,1.2371,1.687278,0.5444,0.555561,0.5444,0.543424
8,1.2008,1.736467,0.5414,0.550819,0.5414,0.53732
9,1.1653,1.751638,0.5421,0.554973,0.5421,0.539126
10,1.1448,1.724761,0.5404,0.548497,0.5404,0.539202


[I 2025-03-26 21:41:18,636] Trial 17 finished with value: 0.5392017806380118 and parameters: {'learning_rate': 0.0017352383115840264, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 13 with value: 0.542299275301681.


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1316,3.669623,0.321,0.36354,0.321,0.304359
2,3.2811,3.007882,0.4235,0.442342,0.4235,0.412996
3,2.7978,2.680062,0.456,0.458115,0.456,0.440381


[I 2025-03-26 21:43:40,195] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00046730377985285565, 'weight_decay': 0.004, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1017,2.35376,0.4597,0.495815,0.4597,0.447867
2,2.0161,1.977988,0.498,0.519856,0.498,0.497076
3,1.7725,1.855397,0.521,0.527558,0.521,0.513546
4,1.6537,1.869265,0.5162,0.538295,0.5162,0.512553
5,1.5816,1.780643,0.5341,0.542204,0.5341,0.527572
6,1.5269,1.73368,0.547,0.551346,0.547,0.542317
7,1.4896,1.713087,0.5448,0.552502,0.5448,0.543289
8,1.4629,1.74991,0.5398,0.54128,0.5398,0.534336
9,1.4413,1.754989,0.5351,0.545613,0.5351,0.531609
10,1.4322,1.742834,0.5393,0.547862,0.5393,0.53748


[I 2025-03-26 21:51:32,171] Trial 19 finished with value: 0.5374798773793585 and parameters: {'learning_rate': 0.00046730377985285565, 'weight_decay': 0.004, 'warmup_steps': 13}. Best is trial 13 with value: 0.542299275301681.


Trial 20 with params: {'learning_rate': 0.0036707750721263967, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3236,1.975108,0.4845,0.556339,0.4845,0.480343
2,1.7237,1.990628,0.4923,0.556777,0.4923,0.493114
3,1.5624,1.922486,0.5128,0.551683,0.5128,0.503803


[I 2025-03-26 21:53:53,682] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00047225306258254224, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0598,2.33787,0.4607,0.497172,0.4607,0.449144
2,2.0057,1.972945,0.4986,0.520389,0.4986,0.497852
3,1.767,1.852207,0.5206,0.527174,0.5206,0.513153
4,1.6496,1.867105,0.5169,0.538873,0.5169,0.513228
5,1.5782,1.778855,0.5343,0.54282,0.5343,0.527949
6,1.5238,1.732095,0.5466,0.551009,0.5466,0.541994


[I 2025-03-26 21:58:34,000] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0016548731075088127, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3667,1.935266,0.4937,0.549352,0.4937,0.485622
2,1.6632,1.814565,0.5178,0.550872,0.5178,0.517524
3,1.5038,1.759246,0.5332,0.553441,0.5332,0.526983
4,1.4101,1.831232,0.5175,0.549239,0.5175,0.515121
5,1.3462,1.748689,0.534,0.555994,0.534,0.53021
6,1.2874,1.681458,0.5472,0.554201,0.5472,0.543336
7,1.2419,1.683748,0.5458,0.556727,0.5458,0.54482
8,1.2065,1.732077,0.5424,0.551339,0.5424,0.538308
9,1.1718,1.747083,0.5435,0.555656,0.5435,0.540259
10,1.1521,1.721267,0.5407,0.548875,0.5407,0.539479


[I 2025-03-26 22:06:21,000] Trial 22 finished with value: 0.5394788327494493 and parameters: {'learning_rate': 0.0016548731075088127, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 13 with value: 0.542299275301681.


Trial 23 with params: {'learning_rate': 0.00028190905615004794, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4439,2.733074,0.4276,0.47008,0.4276,0.411734
2,2.3299,2.18816,0.4792,0.498801,0.4792,0.475414
3,1.9974,2.012663,0.5056,0.507992,0.5056,0.496101
4,1.8426,1.986199,0.5031,0.52235,0.5031,0.497583
5,1.7532,1.887752,0.5217,0.527004,0.5217,0.51356
6,1.6904,1.833051,0.5348,0.536718,0.5348,0.529859


[I 2025-03-26 22:11:03,333] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0005230425586679374, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.982,2.275087,0.4651,0.500619,0.4651,0.453492
2,1.9563,1.94337,0.5014,0.523998,0.5014,0.500884
3,1.7316,1.83041,0.5238,0.53125,0.5238,0.516711


[I 2025-03-26 22:13:23,300] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0006127471015546029, 'weight_decay': 0.004, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8806,2.192297,0.4734,0.508301,0.4734,0.461781
2,1.8903,1.904993,0.5054,0.529528,0.5054,0.504964
3,1.6827,1.802239,0.5276,0.535315,0.5276,0.520682
4,1.5757,1.832038,0.5196,0.54311,0.5196,0.516139
5,1.509,1.747415,0.5361,0.546281,0.5361,0.530079
6,1.456,1.702026,0.5532,0.558838,0.5532,0.548768
7,1.4188,1.686705,0.5472,0.556176,0.5472,0.546141
8,1.3921,1.724517,0.5413,0.544601,0.5413,0.536655
9,1.3692,1.733407,0.5394,0.549934,0.5394,0.53599
10,1.3589,1.71898,0.5426,0.551361,0.5426,0.541079


[I 2025-03-26 22:21:13,846] Trial 25 finished with value: 0.5410789591381692 and parameters: {'learning_rate': 0.0006127471015546029, 'weight_decay': 0.004, 'warmup_steps': 5}. Best is trial 13 with value: 0.542299275301681.


Trial 26 with params: {'learning_rate': 0.0007708863978783736, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7466,2.095968,0.4818,0.521158,0.4818,0.470879
2,1.8114,1.862466,0.5099,0.535313,0.5099,0.509279
3,1.6225,1.772258,0.531,0.540749,0.531,0.524302
4,1.5213,1.813441,0.5229,0.547292,0.5229,0.519449
5,1.4569,1.730707,0.5369,0.549617,0.5369,0.531212
6,1.4042,1.684717,0.5519,0.557738,0.5519,0.547604
7,1.366,1.67367,0.5488,0.558274,0.5488,0.547769
8,1.3385,1.712659,0.544,0.548352,0.544,0.539567
9,1.3139,1.723908,0.5405,0.551354,0.5405,0.537252
10,1.3022,1.707542,0.5435,0.552793,0.5435,0.542356


[I 2025-03-26 22:29:04,424] Trial 26 finished with value: 0.5423560118383949 and parameters: {'learning_rate': 0.0007708863978783736, 'weight_decay': 0.004, 'warmup_steps': 9}. Best is trial 26 with value: 0.5423560118383949.


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7438,3.050114,0.3979,0.445261,0.3979,0.378883
2,2.6038,2.386755,0.4664,0.483933,0.4664,0.459621
3,2.1894,2.158025,0.4955,0.497252,0.4955,0.484629
4,1.9953,2.094536,0.4894,0.506731,0.4894,0.482328
5,1.8855,1.986177,0.5083,0.512303,0.5083,0.498888
6,1.812,1.923304,0.5244,0.523975,0.5244,0.518111


[I 2025-03-26 22:33:41,576] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0008680205092932948, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6842,2.056369,0.4859,0.528505,0.4859,0.475281
2,1.7776,1.845802,0.511,0.537414,0.511,0.510517
3,1.5959,1.761552,0.5319,0.542242,0.5319,0.525183
4,1.4969,1.808148,0.5223,0.547862,0.5223,0.51889
5,1.4333,1.725935,0.5375,0.551451,0.5375,0.532127
6,1.3802,1.678763,0.5507,0.55672,0.5507,0.546465
7,1.3414,1.669942,0.5496,0.558873,0.5496,0.548521
8,1.3131,1.709761,0.5435,0.548113,0.5435,0.539023
9,1.2874,1.72198,0.5422,0.552907,0.5422,0.538995
10,1.2748,1.704554,0.5424,0.55141,0.5424,0.541162


[I 2025-03-26 22:41:28,724] Trial 28 finished with value: 0.5411622436531168 and parameters: {'learning_rate': 0.0008680205092932948, 'weight_decay': 0.008, 'warmup_steps': 11}. Best is trial 26 with value: 0.5423560118383949.


Trial 29 with params: {'learning_rate': 0.0012954019726683075, 'weight_decay': 0.003, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4991,1.964544,0.4924,0.542796,0.4924,0.482682
2,1.6941,1.813404,0.5181,0.547909,0.5181,0.518193
3,1.5283,1.749293,0.5339,0.551049,0.5339,0.527593
4,1.4336,1.81177,0.5205,0.549768,0.5205,0.518002
5,1.3702,1.730179,0.5383,0.556829,0.5383,0.533963
6,1.3144,1.673005,0.5485,0.554227,0.5485,0.544364
7,1.2721,1.671592,0.5484,0.558082,0.5484,0.547243
8,1.2401,1.716055,0.5447,0.551161,0.5447,0.540331
9,1.2095,1.730448,0.5427,0.55434,0.5427,0.539508
10,1.193,1.7085,0.5428,0.551139,0.5428,0.54151


[I 2025-03-26 22:49:24,529] Trial 29 finished with value: 0.5415095609574747 and parameters: {'learning_rate': 0.0012954019726683075, 'weight_decay': 0.003, 'warmup_steps': 14}. Best is trial 26 with value: 0.5423560118383949.


Trial 30 with params: {'learning_rate': 0.0006298670817356332, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8981,2.186168,0.4742,0.510156,0.4742,0.462522
2,1.8838,1.900396,0.5056,0.530464,0.5056,0.505214
3,1.6763,1.798548,0.5281,0.535526,0.5281,0.520994
4,1.5694,1.829519,0.5201,0.543549,0.5201,0.516565
5,1.5026,1.745029,0.5365,0.547088,0.5365,0.530469
6,1.4496,1.699594,0.5539,0.559911,0.5539,0.549548
7,1.4122,1.684761,0.5472,0.556305,0.5472,0.546084
8,1.3854,1.722683,0.5427,0.546111,0.5427,0.538111
9,1.3622,1.731854,0.5385,0.549494,0.5385,0.53522
10,1.3518,1.717135,0.5425,0.551492,0.5425,0.541017


[I 2025-03-26 22:57:14,871] Trial 30 finished with value: 0.5410170555723007 and parameters: {'learning_rate': 0.0006298670817356332, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 26 with value: 0.5423560118383949.


Trial 31 with params: {'learning_rate': 0.0011989402603761162, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5071,1.975143,0.492,0.541518,0.492,0.482091
2,1.7047,1.816312,0.5181,0.546301,0.5181,0.517786
3,1.5379,1.748602,0.5333,0.548608,0.5333,0.526971
4,1.4429,1.8082,0.5208,0.549309,0.5208,0.518193
5,1.3801,1.726733,0.5379,0.554456,0.5379,0.533142
6,1.3251,1.672378,0.5491,0.554887,0.5491,0.544895
7,1.2836,1.669568,0.5492,0.558884,0.5492,0.54821
8,1.2526,1.712905,0.5454,0.550911,0.5454,0.540865
9,1.2232,1.726971,0.5441,0.555619,0.5441,0.541023
10,1.2076,1.70618,0.5441,0.552512,0.5441,0.542875


[I 2025-03-26 23:05:02,346] Trial 31 finished with value: 0.5428747065867819 and parameters: {'learning_rate': 0.0011989402603761162, 'weight_decay': 0.004, 'warmup_steps': 7}. Best is trial 31 with value: 0.5428747065867819.


Trial 32 with params: {'learning_rate': 0.0014232907566822306, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4373,1.949483,0.4933,0.547347,0.4933,0.483866
2,1.6793,1.811268,0.5182,0.550016,0.5182,0.518539
3,1.5168,1.751539,0.5348,0.552443,0.5348,0.528444
4,1.4227,1.817587,0.5189,0.549411,0.5189,0.516713
5,1.3593,1.735845,0.5368,0.556303,0.5368,0.532801
6,1.3026,1.675097,0.548,0.554133,0.548,0.5439


[I 2025-03-26 23:09:43,359] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0007975378542990936, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7261,2.083627,0.4825,0.522567,0.4825,0.471713
2,1.801,1.857314,0.5101,0.536186,0.5101,0.509669
3,1.6145,1.768828,0.5324,0.542119,0.5324,0.525564
4,1.514,1.811613,0.5234,0.548172,0.5234,0.52001
5,1.4499,1.729054,0.5375,0.550766,0.5375,0.531961
6,1.3971,1.682809,0.5525,0.558325,0.5525,0.548269
7,1.3588,1.672391,0.5488,0.558119,0.5488,0.5478
8,1.3311,1.711584,0.5447,0.548874,0.5447,0.540151
9,1.3062,1.723132,0.5413,0.552179,0.5413,0.538155
10,1.2942,1.706493,0.5434,0.552731,0.5434,0.542237


[I 2025-03-26 23:17:33,234] Trial 33 finished with value: 0.5422369603025289 and parameters: {'learning_rate': 0.0007975378542990936, 'weight_decay': 0.004, 'warmup_steps': 9}. Best is trial 31 with value: 0.5428747065867819.


Trial 34 with params: {'learning_rate': 0.0008802681549221086, 'weight_decay': 0.002, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6655,2.050476,0.4863,0.52957,0.4863,0.475897
2,1.7728,1.843761,0.5104,0.536793,0.5104,0.510034
3,1.5926,1.760382,0.5326,0.542924,0.5326,0.525851
4,1.494,1.807622,0.5223,0.547893,0.5223,0.518881
5,1.4305,1.72552,0.5373,0.551622,0.5373,0.531927
6,1.3774,1.678186,0.5505,0.556631,0.5505,0.5463
7,1.3385,1.669651,0.55,0.559372,0.55,0.548944
8,1.3103,1.709551,0.5433,0.547935,0.5433,0.538898
9,1.2844,1.721887,0.5423,0.552848,0.5423,0.539055
10,1.2717,1.704364,0.5424,0.551621,0.5424,0.541247


[I 2025-03-26 23:25:27,156] Trial 34 finished with value: 0.5412471197997692 and parameters: {'learning_rate': 0.0008802681549221086, 'weight_decay': 0.002, 'warmup_steps': 8}. Best is trial 31 with value: 0.5428747065867819.


Trial 35 with params: {'learning_rate': 0.0008484156538546054, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6972,2.063695,0.4852,0.527689,0.4852,0.4746
2,1.7838,1.84879,0.5109,0.537256,0.5109,0.510436
3,1.6009,1.763393,0.5322,0.541952,0.5322,0.525338
4,1.5014,1.808968,0.5226,0.54761,0.5226,0.519148
5,1.4377,1.726639,0.5369,0.550443,0.5369,0.531374
6,1.3847,1.679782,0.5509,0.556905,0.5509,0.546666
7,1.346,1.670496,0.5493,0.558675,0.5493,0.548309
8,1.3179,1.71014,0.5442,0.548675,0.5442,0.539719
9,1.2924,1.722197,0.5414,0.552188,0.5414,0.538278
10,1.28,1.704971,0.5426,0.551818,0.5426,0.541432


[I 2025-03-26 23:33:18,416] Trial 35 finished with value: 0.5414315460507296 and parameters: {'learning_rate': 0.0008484156538546054, 'weight_decay': 0.004, 'warmup_steps': 11}. Best is trial 31 with value: 0.5428747065867819.


Trial 36 with params: {'learning_rate': 5.370203809578854e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3459,4.072112,0.2044,0.233773,0.2044,0.196774
2,3.8204,3.610804,0.3358,0.355859,0.3358,0.325814
3,3.4431,3.315487,0.3914,0.394519,0.3914,0.374015


[I 2025-03-26 23:35:40,840] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0013537177696110007, 'weight_decay': 0.0, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4888,1.958841,0.4914,0.544973,0.4914,0.481754
2,1.6884,1.812339,0.5185,0.549087,0.5185,0.518725
3,1.5233,1.750228,0.5346,0.552008,0.5346,0.528157
4,1.4287,1.814371,0.52,0.55001,0.52,0.517644
5,1.3651,1.732637,0.5374,0.55692,0.5374,0.533422
6,1.3088,1.673794,0.5472,0.55313,0.5472,0.543027
7,1.2659,1.673131,0.5466,0.556529,0.5466,0.545441
8,1.2334,1.718303,0.5443,0.551106,0.5443,0.539972
9,1.2021,1.732835,0.5434,0.554866,0.5434,0.540067
10,1.185,1.710184,0.5429,0.550903,0.5429,0.541495


[I 2025-03-26 23:43:27,005] Trial 37 finished with value: 0.541494602114532 and parameters: {'learning_rate': 0.0013537177696110007, 'weight_decay': 0.0, 'warmup_steps': 16}. Best is trial 31 with value: 0.5428747065867819.


Trial 38 with params: {'learning_rate': 0.004049231430661466, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3092,2.002449,0.4834,0.557609,0.4834,0.479602
2,1.7502,2.039943,0.4877,0.555897,0.4877,0.488709
3,1.5889,1.963116,0.5094,0.547567,0.5094,0.500011
4,1.4817,2.04335,0.4991,0.54272,0.4991,0.49703
5,1.4013,1.94564,0.511,0.551509,0.511,0.50817
6,1.3188,1.829658,0.531,0.548035,0.531,0.527481


[I 2025-03-26 23:48:08,493] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0010475348879951107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6659,2.015302,0.4874,0.534311,0.4874,0.476782
2,1.7396,1.827693,0.5148,0.542154,0.5148,0.514646
3,1.5629,1.751857,0.5326,0.544834,0.5326,0.526048
4,1.4652,1.806069,0.5214,0.548177,0.5214,0.51853
5,1.4014,1.724155,0.538,0.553681,0.538,0.533216
6,1.347,1.673326,0.5495,0.555062,0.5495,0.545035
7,1.3065,1.668055,0.5494,0.559497,0.5494,0.548544
8,1.2766,1.709774,0.5446,0.549446,0.5446,0.54001
9,1.2487,1.723094,0.5426,0.553837,0.5426,0.539333
10,1.2343,1.703684,0.5426,0.551192,0.5426,0.541403


[I 2025-03-26 23:55:56,899] Trial 39 finished with value: 0.5414034044915106 and parameters: {'learning_rate': 0.0010475348879951107, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32}. Best is trial 31 with value: 0.5428747065867819.


Trial 40 with params: {'learning_rate': 0.0006683134293570091, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8403,2.15496,0.4767,0.511852,0.4767,0.465124
2,1.8596,1.887634,0.5076,0.53263,0.5076,0.507181
3,1.659,1.789624,0.5294,0.53761,0.5294,0.522466
4,1.5542,1.823789,0.5217,0.545169,0.5217,0.518081
5,1.4884,1.740005,0.5369,0.547833,0.5369,0.530898
6,1.4356,1.694558,0.5533,0.559044,0.5533,0.54898
7,1.3981,1.680855,0.5477,0.556744,0.5477,0.546568
8,1.3711,1.719024,0.5426,0.546308,0.5426,0.53809
9,1.3476,1.728891,0.54,0.550795,0.54,0.536699
10,1.3368,1.713704,0.544,0.552772,0.544,0.542537


[I 2025-03-27 00:03:45,807] Trial 40 finished with value: 0.5425370178361741 and parameters: {'learning_rate': 0.0006683134293570091, 'weight_decay': 0.003, 'warmup_steps': 10}. Best is trial 31 with value: 0.5428747065867819.


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2826,3.966994,0.2472,0.278616,0.2472,0.237352
2,3.679,3.449211,0.3667,0.379851,0.3667,0.356016
3,3.2661,3.13552,0.4134,0.416393,0.4134,0.396251


[I 2025-03-27 00:06:07,243] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0002902527241920383, 'weight_decay': 0.001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4582,2.722933,0.4273,0.470105,0.4273,0.411735
2,2.3168,2.176416,0.4801,0.500152,0.4801,0.476485
3,1.9846,2.002598,0.5065,0.510145,0.5065,0.497358
4,1.8311,1.978208,0.5046,0.523616,0.5046,0.499187
5,1.7423,1.880035,0.5219,0.526962,0.5219,0.513821
6,1.6799,1.825795,0.5357,0.537799,0.5357,0.530806


[I 2025-03-27 00:10:47,720] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00035478309803482067, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2895,2.547289,0.4412,0.480166,0.4412,0.427803
2,2.1721,2.078361,0.4887,0.5094,0.4887,0.486647
3,1.8837,1.930166,0.5135,0.517488,0.5135,0.50499


[I 2025-03-27 00:13:09,076] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 7.012112975444019e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2788,3.937189,0.2563,0.288301,0.2563,0.245199
2,3.6295,3.387128,0.376,0.389243,0.376,0.365004
3,3.1955,3.062423,0.4206,0.423529,0.4206,0.403334
4,2.904,2.859772,0.4254,0.445749,0.4254,0.411604
5,2.7086,2.704102,0.4534,0.458071,0.4534,0.436631
6,2.573,2.596633,0.4708,0.466535,0.4708,0.456051


[I 2025-03-27 00:17:51,767] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.001084688049409519, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5533,1.995239,0.4912,0.53765,0.4912,0.480736
2,1.7236,1.822801,0.5156,0.542596,0.5156,0.515318
3,1.5533,1.749969,0.5328,0.545747,0.5328,0.5265
4,1.4575,1.805797,0.5208,0.547722,0.5208,0.517868
5,1.3947,1.724246,0.5382,0.553909,0.5382,0.533362
6,1.3405,1.672819,0.549,0.554727,0.549,0.544672
7,1.2999,1.66818,0.5494,0.559356,0.5494,0.548636
8,1.27,1.710214,0.5447,0.549804,0.5447,0.540289
9,1.2419,1.72382,0.5435,0.554667,0.5435,0.540346
10,1.2273,1.704195,0.543,0.551123,0.543,0.541692


[I 2025-03-27 00:25:41,155] Trial 45 finished with value: 0.541692433016335 and parameters: {'learning_rate': 0.001084688049409519, 'weight_decay': 0.004, 'warmup_steps': 7}. Best is trial 31 with value: 0.5428747065867819.


Trial 46 with params: {'learning_rate': 0.0007617906184996216, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7648,2.102227,0.4808,0.519512,0.4808,0.469722
2,1.8162,1.864661,0.51,0.535409,0.51,0.50937
3,1.6258,1.773642,0.531,0.540614,0.531,0.524359
4,1.5241,1.814224,0.5224,0.54671,0.5224,0.518968
5,1.4595,1.731375,0.537,0.54948,0.537,0.531353
6,1.4067,1.685436,0.5526,0.558307,0.5526,0.548238
7,1.3686,1.674147,0.5484,0.55776,0.5484,0.547279
8,1.3411,1.713078,0.5443,0.548577,0.5443,0.539868
9,1.3166,1.724233,0.5404,0.551318,0.5404,0.537174
10,1.305,1.707936,0.5434,0.552666,0.5434,0.542178


[I 2025-03-27 00:33:32,738] Trial 46 finished with value: 0.5421780523736526 and parameters: {'learning_rate': 0.0007617906184996216, 'weight_decay': 0.003, 'warmup_steps': 12}. Best is trial 31 with value: 0.5428747065867819.


Trial 47 with params: {'learning_rate': 0.001462799825535077, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4101,1.945628,0.4932,0.547093,0.4932,0.484117
2,1.6751,1.811197,0.5191,0.550941,0.5191,0.519285
3,1.5137,1.752586,0.5351,0.553369,0.5351,0.528861
4,1.4199,1.81959,0.519,0.549533,0.519,0.516558
5,1.3566,1.737752,0.5361,0.556168,0.5361,0.532196
6,1.2995,1.675958,0.547,0.55295,0.547,0.542986


[I 2025-03-27 00:38:15,335] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00026126050000721833, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5281,2.813787,0.4211,0.467085,0.4211,0.404573
2,2.3968,2.234536,0.4761,0.494956,0.4761,0.471553
3,2.043,2.046429,0.5027,0.504551,0.5027,0.492862


[I 2025-03-27 00:40:37,175] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0013211339849780776, 'weight_decay': 0.005, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4696,1.959541,0.4926,0.544781,0.4926,0.482915
2,1.6895,1.812492,0.5183,0.548883,0.5183,0.518643
3,1.5252,1.749503,0.5345,0.551754,0.5345,0.528123
4,1.4308,1.812725,0.5213,0.550597,0.5213,0.518832
5,1.3677,1.731136,0.538,0.557482,0.538,0.533805
6,1.3117,1.673329,0.5477,0.553536,0.5477,0.543581
7,1.2692,1.672249,0.5483,0.55795,0.5483,0.547149
8,1.2371,1.716976,0.5443,0.550969,0.5443,0.540013
9,1.2062,1.731448,0.5433,0.554838,0.5433,0.539985
10,1.1895,1.709277,0.5428,0.551034,0.5428,0.541477


[I 2025-03-27 00:48:29,255] Trial 49 finished with value: 0.5414774079354706 and parameters: {'learning_rate': 0.0013211339849780776, 'weight_decay': 0.005, 'warmup_steps': 8}. Best is trial 31 with value: 0.5428747065867819.


Trial 50 with params: {'learning_rate': 0.0003144555766003982, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4328,2.667913,0.4295,0.470387,0.4295,0.414371
2,2.2663,2.138987,0.4823,0.502883,0.4823,0.479324
3,1.9457,1.973534,0.51,0.514371,0.51,0.501416
4,1.7982,1.956151,0.5074,0.526502,0.5074,0.502193
5,1.7125,1.859474,0.5237,0.528935,0.5237,0.515952
6,1.6516,1.806713,0.5396,0.542262,0.5396,0.534852


[I 2025-03-27 00:53:10,302] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0007217634778158603, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7873,2.121531,0.4801,0.518417,0.4801,0.469128
2,1.8325,1.873336,0.509,0.534137,0.509,0.508455
3,1.6387,1.779625,0.5308,0.540018,0.5308,0.523945
4,1.536,1.817667,0.522,0.545947,0.522,0.518324
5,1.4711,1.734518,0.5373,0.54891,0.5373,0.531391
6,1.4183,1.68888,0.553,0.558897,0.553,0.5487
7,1.3805,1.676604,0.5489,0.558417,0.5489,0.547874
8,1.3533,1.715177,0.5438,0.548001,0.5438,0.539386
9,1.3292,1.72584,0.5402,0.551092,0.5402,0.537042
10,1.3179,1.710027,0.5442,0.553212,0.5442,0.542907


[I 2025-03-27 01:01:02,569] Trial 51 finished with value: 0.5429071787902532 and parameters: {'learning_rate': 0.0007217634778158603, 'weight_decay': 0.002, 'warmup_steps': 9}. Best is trial 51 with value: 0.5429071787902532.


Trial 52 with params: {'learning_rate': 0.0008538160237018203, 'weight_decay': 0.002, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6723,2.05867,0.4854,0.528876,0.4854,0.47509
2,1.7802,1.847452,0.5114,0.537575,0.5114,0.510849
3,1.5988,1.762612,0.5319,0.541864,0.5319,0.525073
4,1.4998,1.808604,0.5224,0.547672,0.5224,0.518969
5,1.4363,1.726364,0.5367,0.550336,0.5367,0.531286
6,1.3833,1.679463,0.5506,0.556652,0.5506,0.546413
7,1.3447,1.670334,0.5494,0.558892,0.5494,0.548466
8,1.3166,1.710026,0.5434,0.547879,0.5434,0.538938
9,1.2911,1.722132,0.5421,0.55273,0.5421,0.538858
10,1.2787,1.704896,0.5424,0.55169,0.5424,0.541226


[I 2025-03-27 01:08:55,445] Trial 52 finished with value: 0.5412263045904896 and parameters: {'learning_rate': 0.0008538160237018203, 'weight_decay': 0.002, 'warmup_steps': 5}. Best is trial 51 with value: 0.5429071787902532.


Trial 53 with params: {'learning_rate': 0.0007167369272358462, 'weight_decay': 0.001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7917,2.124382,0.48,0.517514,0.48,0.469043
2,1.8348,1.874553,0.5083,0.53315,0.5083,0.507689
3,1.6405,1.780501,0.5307,0.539827,0.5307,0.523825
4,1.5376,1.818187,0.5225,0.546075,0.5225,0.518743
5,1.4726,1.734969,0.5371,0.548666,0.5371,0.531154
6,1.4199,1.689358,0.5532,0.559314,0.5532,0.54897
7,1.3821,1.67694,0.5489,0.558372,0.5489,0.547846
8,1.3549,1.715492,0.5436,0.547782,0.5436,0.539215
9,1.3308,1.726082,0.5403,0.551252,0.5403,0.53712
10,1.3196,1.710319,0.5444,0.553309,0.5444,0.543117


[I 2025-03-27 01:16:44,102] Trial 53 finished with value: 0.5431165419813282 and parameters: {'learning_rate': 0.0007167369272358462, 'weight_decay': 0.001, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 54 with params: {'learning_rate': 0.0004966602557481673, 'weight_decay': 0.001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0429,2.31139,0.463,0.498742,0.463,0.451434
2,1.9837,1.958822,0.4995,0.521605,0.4995,0.498914
3,1.7502,1.841524,0.5222,0.529277,0.5222,0.514904
4,1.6346,1.859349,0.5172,0.539482,0.5172,0.513597
5,1.5641,1.771788,0.5346,0.543197,0.5346,0.528167
6,1.51,1.725383,0.5484,0.553192,0.5484,0.543931
7,1.4728,1.706024,0.5459,0.554002,0.5459,0.544535
8,1.4462,1.74305,0.5408,0.542476,0.5408,0.535412
9,1.4244,1.749093,0.5361,0.546708,0.5361,0.532702
10,1.4151,1.736438,0.5409,0.54939,0.5409,0.539086


[I 2025-03-27 01:24:32,939] Trial 54 finished with value: 0.5390863656724156 and parameters: {'learning_rate': 0.0004966602557481673, 'weight_decay': 0.001, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 55 with params: {'learning_rate': 0.001244307650631205, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4691,1.967125,0.4924,0.542885,0.4924,0.482696
2,1.6972,1.814421,0.5172,0.545676,0.5172,0.516975
3,1.5323,1.748622,0.5332,0.548586,0.5332,0.526771
4,1.4378,1.809326,0.5207,0.549211,0.5207,0.518055
5,1.3752,1.728021,0.5384,0.556641,0.5384,0.533921
6,1.3199,1.672529,0.5489,0.554682,0.5489,0.544753
7,1.278,1.670433,0.5498,0.559234,0.5498,0.548692
8,1.2467,1.714185,0.5451,0.551079,0.5451,0.540643
9,1.2167,1.728403,0.544,0.55551,0.544,0.540835
10,1.2008,1.707191,0.5434,0.551894,0.5434,0.542176


[I 2025-03-27 01:32:24,071] Trial 55 finished with value: 0.5421760862314936 and parameters: {'learning_rate': 0.001244307650631205, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 53 with value: 0.5431165419813282.


Trial 56 with params: {'learning_rate': 0.001577357454509515, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4111,1.940257,0.4944,0.549527,0.4944,0.485889
2,1.6689,1.812764,0.5175,0.549475,0.5175,0.517299
3,1.5078,1.756355,0.5335,0.551657,0.5335,0.526847
4,1.4138,1.826721,0.519,0.550755,0.519,0.516711
5,1.3498,1.744332,0.5351,0.55659,0.5351,0.531242
6,1.2917,1.679029,0.5474,0.554227,0.5474,0.543576
7,1.2469,1.680745,0.5471,0.557948,0.5471,0.546094
8,1.2122,1.728321,0.542,0.549831,0.542,0.537785
9,1.1784,1.743297,0.5435,0.555303,0.5435,0.540258
10,1.1593,1.718194,0.5419,0.55004,0.5419,0.540597


[I 2025-03-27 01:40:09,760] Trial 56 finished with value: 0.5405967900659759 and parameters: {'learning_rate': 0.001577357454509515, 'weight_decay': 0.001, 'warmup_steps': 10}. Best is trial 53 with value: 0.5431165419813282.


Trial 57 with params: {'learning_rate': 0.0003482263668999834, 'weight_decay': 0.001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.307,2.563408,0.4397,0.479333,0.4397,0.426137
2,2.1849,2.086725,0.4873,0.507869,0.4873,0.485052
3,1.8925,1.936299,0.5135,0.517317,0.5135,0.504942


[I 2025-03-27 01:42:31,645] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0024300018867834676, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2902,1.928931,0.4931,0.55673,0.4931,0.48707
2,1.658,1.861719,0.5101,0.55249,0.5101,0.509857
3,1.5007,1.804551,0.5291,0.555179,0.5291,0.520696


[I 2025-03-27 01:44:53,170] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0005422147120050799, 'weight_decay': 0.001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.02,2.269474,0.4652,0.500722,0.4652,0.453435
2,1.9486,1.936749,0.5016,0.524248,0.5016,0.50093
3,1.7228,1.824617,0.5231,0.530325,0.5231,0.515864
4,1.6101,1.847191,0.5186,0.541211,0.5186,0.51511
5,1.5408,1.760749,0.5354,0.545232,0.5354,0.529271
6,1.487,1.714837,0.5497,0.5549,0.5497,0.545278
7,1.4498,1.69715,0.5465,0.55573,0.5465,0.545459
8,1.4231,1.734462,0.5405,0.54263,0.5405,0.535389
9,1.4007,1.741733,0.5375,0.548136,0.5375,0.534077
10,1.3911,1.728248,0.5428,0.551108,0.5428,0.540993


[I 2025-03-27 01:52:39,169] Trial 59 finished with value: 0.5409925745851695 and parameters: {'learning_rate': 0.0005422147120050799, 'weight_decay': 0.001, 'warmup_steps': 20}. Best is trial 53 with value: 0.5431165419813282.


Trial 60 with params: {'learning_rate': 0.0005026444332318829, 'weight_decay': 0.002, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0484,2.30775,0.463,0.499498,0.463,0.451404
2,1.98,1.956114,0.5006,0.522803,0.5006,0.499978
3,1.7467,1.839311,0.5223,0.529375,0.5223,0.514998


[I 2025-03-27 01:54:59,761] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0009265476292252998, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6444,2.036253,0.4878,0.53294,0.4878,0.477922
2,1.7602,1.83787,0.5131,0.539916,0.5131,0.51279
3,1.5823,1.756959,0.533,0.543535,0.533,0.526245
4,1.4844,1.80642,0.5225,0.548455,0.5225,0.519279
5,1.4211,1.724436,0.5373,0.552307,0.5373,0.53212
6,1.3678,1.676348,0.5511,0.557308,0.5511,0.546991
7,1.3285,1.668741,0.5498,0.55971,0.5498,0.548945
8,1.2999,1.709106,0.543,0.547572,0.543,0.538461
9,1.2735,1.721788,0.5428,0.553306,0.5428,0.539426
10,1.2604,1.703777,0.5436,0.55278,0.5436,0.542425


[I 2025-03-27 02:02:53,705] Trial 61 finished with value: 0.542425310949524 and parameters: {'learning_rate': 0.0009265476292252998, 'weight_decay': 0.004, 'warmup_steps': 10}. Best is trial 53 with value: 0.5431165419813282.


Trial 62 with params: {'learning_rate': 0.001053681375200332, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5743,2.002582,0.4897,0.535937,0.4897,0.479198
2,1.7301,1.825234,0.5143,0.541728,0.5143,0.514229
3,1.5584,1.750845,0.5333,0.545384,0.5333,0.526711
4,1.4622,1.805556,0.5209,0.547589,0.5209,0.51786
5,1.3992,1.723965,0.5379,0.553537,0.5379,0.533095
6,1.3452,1.673207,0.5495,0.555172,0.5495,0.545073
7,1.3049,1.668028,0.5492,0.559362,0.5492,0.548466
8,1.2753,1.709737,0.5446,0.549561,0.5446,0.540079
9,1.2475,1.723206,0.5435,0.554387,0.5435,0.540321
10,1.2331,1.703872,0.5429,0.551231,0.5429,0.541611


[I 2025-03-27 02:10:44,082] Trial 62 finished with value: 0.5416106883255607 and parameters: {'learning_rate': 0.001053681375200332, 'weight_decay': 0.0, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 63 with params: {'learning_rate': 0.0030015470453135062, 'weight_decay': 0.005, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.303,1.939398,0.4928,0.561597,0.4928,0.487186
2,1.6811,1.913549,0.5009,0.55206,0.5009,0.500937
3,1.5221,1.854114,0.5216,0.553311,0.5216,0.512624
4,1.423,1.944899,0.5029,0.542621,0.5029,0.500837
5,1.3511,1.852422,0.5178,0.551469,0.5178,0.515619
6,1.2794,1.755881,0.538,0.550933,0.538,0.534195


[I 2025-03-27 02:15:24,749] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0009791087569692406, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5855,2.017957,0.488,0.533937,0.488,0.478052
2,1.7445,1.831532,0.5148,0.541959,0.5148,0.51477
3,1.5708,1.753696,0.5334,0.544707,0.5334,0.52672
4,1.4741,1.80548,0.5221,0.548555,0.5221,0.518931
5,1.4114,1.72371,0.5379,0.552795,0.5379,0.532801
6,1.3578,1.674754,0.55,0.556242,0.55,0.545779
7,1.3182,1.668161,0.5503,0.560044,0.5503,0.549333
8,1.2892,1.709016,0.5437,0.548303,0.5437,0.539027
9,1.2623,1.72208,0.5435,0.55393,0.5435,0.540142
10,1.2487,1.703592,0.5431,0.551954,0.5431,0.541883


[I 2025-03-27 02:23:12,568] Trial 64 finished with value: 0.5418831747943047 and parameters: {'learning_rate': 0.0009791087569692406, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 53 with value: 0.5431165419813282.


Trial 65 with params: {'learning_rate': 0.000709971861392446, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8014,2.128914,0.4797,0.517506,0.4797,0.468716
2,1.8384,1.87636,0.5079,0.532856,0.5079,0.507297
3,1.6431,1.781693,0.5303,0.539138,0.5303,0.523358
4,1.5399,1.818911,0.5225,0.546056,0.5225,0.518615
5,1.4747,1.735595,0.5372,0.548463,0.5372,0.531159
6,1.422,1.690034,0.5534,0.559398,0.5534,0.549164
7,1.3842,1.677432,0.5486,0.558019,0.5486,0.547529
8,1.357,1.715933,0.5435,0.547511,0.5435,0.539102
9,1.3331,1.726401,0.5403,0.551158,0.5403,0.537125
10,1.3219,1.710736,0.5443,0.553103,0.5443,0.542957


[I 2025-03-27 02:31:00,602] Trial 65 finished with value: 0.5429566861727393 and parameters: {'learning_rate': 0.000709971861392446, 'weight_decay': 0.003, 'warmup_steps': 10}. Best is trial 53 with value: 0.5431165419813282.


Trial 66 with params: {'learning_rate': 0.0006666746738282045, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8493,2.157465,0.4766,0.512874,0.4766,0.465242
2,1.8613,1.888368,0.5074,0.532179,0.5074,0.506865
3,1.66,1.790055,0.5289,0.537116,0.5289,0.521996
4,1.555,1.824075,0.5215,0.545002,0.5215,0.51791
5,1.4891,1.740224,0.5371,0.547992,0.5371,0.531076
6,1.4362,1.694782,0.5532,0.558872,0.5532,0.548867
7,1.3987,1.681023,0.5477,0.556638,0.5477,0.546543
8,1.3717,1.719192,0.5427,0.54637,0.5427,0.538168
9,1.3482,1.729018,0.5401,0.550787,0.5401,0.536782
10,1.3374,1.713836,0.5439,0.552806,0.5439,0.542476


[I 2025-03-27 02:38:56,814] Trial 66 finished with value: 0.542476224678547 and parameters: {'learning_rate': 0.0006666746738282045, 'weight_decay': 0.003, 'warmup_steps': 12}. Best is trial 53 with value: 0.5431165419813282.


Trial 67 with params: {'learning_rate': 0.0009305192845903592, 'weight_decay': 0.003, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.657,2.037071,0.4876,0.533217,0.4876,0.477789
2,1.7605,1.837782,0.5123,0.53923,0.5123,0.511967
3,1.582,1.756874,0.5329,0.543202,0.5329,0.526099
4,1.484,1.806417,0.5223,0.548331,0.5223,0.519077
5,1.4205,1.724385,0.5373,0.552017,0.5373,0.532116
6,1.3671,1.676206,0.5514,0.557565,0.5514,0.547278
7,1.3278,1.668686,0.5495,0.559375,0.5495,0.548586
8,1.299,1.709105,0.5433,0.5479,0.5433,0.53878
9,1.2726,1.721802,0.5425,0.553164,0.5425,0.539189
10,1.2594,1.703704,0.5438,0.552761,0.5438,0.542573


[I 2025-03-27 02:46:50,577] Trial 67 finished with value: 0.5425732756051956 and parameters: {'learning_rate': 0.0009305192845903592, 'weight_decay': 0.003, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 68 with params: {'learning_rate': 0.0009260367862049292, 'weight_decay': 0.002, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6789,2.041191,0.4865,0.532488,0.4865,0.476485
2,1.7635,1.838814,0.5118,0.53872,0.5118,0.511443
3,1.5838,1.757356,0.5327,0.543079,0.5327,0.525889
4,1.4853,1.806622,0.5225,0.54817,0.5225,0.519227
5,1.4216,1.724544,0.5374,0.552052,0.5374,0.532235
6,1.3681,1.676378,0.5511,0.557074,0.5511,0.546889
7,1.3288,1.668749,0.5494,0.559086,0.5494,0.548431
8,1.3001,1.709138,0.5434,0.548251,0.5434,0.538943
9,1.2736,1.721792,0.5423,0.55299,0.5423,0.538973
10,1.2605,1.703716,0.5437,0.552767,0.5437,0.542503


[I 2025-03-27 02:54:39,371] Trial 68 finished with value: 0.5425032820506784 and parameters: {'learning_rate': 0.0009260367862049292, 'weight_decay': 0.002, 'warmup_steps': 19}. Best is trial 53 with value: 0.5431165419813282.


Trial 69 with params: {'learning_rate': 0.001191055617710326, 'weight_decay': 0.001, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5857,1.984745,0.4896,0.538559,0.4896,0.479089
2,1.7124,1.818034,0.5176,0.545954,0.5176,0.517257
3,1.5417,1.749201,0.5336,0.54851,0.5336,0.527037
4,1.4456,1.808456,0.5209,0.54938,0.5209,0.51808
5,1.3819,1.726753,0.5383,0.555014,0.5383,0.533732
6,1.3266,1.672371,0.549,0.554612,0.549,0.544656


[I 2025-03-27 02:59:21,009] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0012591271232436807, 'weight_decay': 0.003, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5379,1.972012,0.4918,0.542113,0.4918,0.481911
2,1.7008,1.814914,0.5171,0.546295,0.5171,0.517031
3,1.533,1.749124,0.5337,0.55009,0.5337,0.527347
4,1.4377,1.810532,0.5212,0.550371,0.5212,0.51864
5,1.3742,1.728878,0.5372,0.554812,0.5372,0.532704
6,1.3185,1.672666,0.5486,0.554166,0.5486,0.544415


[I 2025-03-27 03:04:02,734] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.001156277862798058, 'weight_decay': 0.003, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5561,1.98556,0.4903,0.538924,0.4903,0.480062
2,1.7139,1.818969,0.5173,0.545166,0.5173,0.516902
3,1.5443,1.74907,0.5332,0.547396,0.5332,0.5267
4,1.4487,1.807276,0.5207,0.548459,0.5207,0.517891
5,1.3855,1.725709,0.5384,0.554658,0.5384,0.533663
6,1.3306,1.672375,0.5495,0.555061,0.5495,0.545158
7,1.2895,1.668923,0.5482,0.558226,0.5482,0.547283
8,1.2588,1.711775,0.5452,0.550621,0.5452,0.540685
9,1.2298,1.725692,0.5436,0.555117,0.5436,0.540517
10,1.2145,1.70521,0.5438,0.552119,0.5438,0.542579


[I 2025-03-27 03:11:55,661] Trial 71 finished with value: 0.5425789704749038 and parameters: {'learning_rate': 0.001156277862798058, 'weight_decay': 0.003, 'warmup_steps': 16}. Best is trial 53 with value: 0.5431165419813282.


Trial 72 with params: {'learning_rate': 0.00094951184733998, 'weight_decay': 0.004, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6694,2.034602,0.4861,0.533281,0.4861,0.47634
2,1.7577,1.83614,0.5126,0.539729,0.5126,0.512326
3,1.579,1.755904,0.5322,0.542693,0.5322,0.525381
4,1.4808,1.806228,0.5228,0.549076,0.5228,0.519728
5,1.4172,1.724213,0.5371,0.551997,0.5371,0.531985
6,1.3636,1.675578,0.5507,0.556806,0.5507,0.546475
7,1.3241,1.668443,0.5496,0.559409,0.5496,0.548604
8,1.2951,1.70907,0.5436,0.548345,0.5436,0.539086
9,1.2684,1.721893,0.5426,0.553427,0.5426,0.539344
10,1.255,1.70355,0.5436,0.552885,0.5436,0.542495


[I 2025-03-27 03:19:56,357] Trial 72 finished with value: 0.5424945578331113 and parameters: {'learning_rate': 0.00094951184733998, 'weight_decay': 0.004, 'warmup_steps': 20}. Best is trial 53 with value: 0.5431165419813282.


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.334,4.033273,0.2212,0.252997,0.2212,0.212513
2,3.759,3.534473,0.3504,0.369155,0.3504,0.339903
3,3.3557,3.223543,0.4027,0.406212,0.4027,0.385294
4,3.0714,3.015764,0.4088,0.430121,0.4088,0.393849
5,2.8738,2.858835,0.4399,0.449089,0.4399,0.422264
6,2.7332,2.747231,0.4562,0.452651,0.4562,0.439889


[I 2025-03-27 03:24:37,080] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0010178611363031746, 'weight_decay': 0.003, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6062,2.012629,0.4893,0.536318,0.4893,0.478982
2,1.7389,1.828556,0.5146,0.541655,0.5146,0.514428
3,1.565,1.752221,0.5326,0.544141,0.5326,0.525963
4,1.4682,1.805548,0.5218,0.548249,0.5218,0.518631
5,1.405,1.723796,0.5378,0.553271,0.5378,0.532876
6,1.3511,1.673826,0.5495,0.555513,0.5495,0.545085
7,1.3111,1.667992,0.5496,0.559277,0.5496,0.548701
8,1.2817,1.709342,0.5441,0.548804,0.5441,0.539421
9,1.2543,1.722582,0.5433,0.554157,0.5433,0.540086
10,1.2403,1.703589,0.5423,0.551039,0.5423,0.541103


[I 2025-03-27 03:32:28,006] Trial 74 finished with value: 0.5411028837658546 and parameters: {'learning_rate': 0.0010178611363031746, 'weight_decay': 0.003, 'warmup_steps': 13}. Best is trial 53 with value: 0.5431165419813282.


Trial 75 with params: {'learning_rate': 0.0003289057583632567, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3797,2.621669,0.4338,0.473674,0.4338,0.419116
2,2.2299,2.115348,0.4846,0.505767,0.4846,0.482087
3,1.9218,1.956676,0.5112,0.515349,0.5112,0.502607
4,1.779,1.943764,0.5082,0.527901,0.5082,0.503113
5,1.6956,1.84829,0.5254,0.53102,0.5254,0.517854
6,1.6358,1.79644,0.5404,0.543287,0.5404,0.535592


[I 2025-03-27 03:37:11,455] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0022792789040751984, 'weight_decay': 0.002, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3425,1.930317,0.491,0.556348,0.491,0.484827
2,1.6583,1.850032,0.5108,0.550688,0.5108,0.510146
3,1.4992,1.79405,0.5305,0.555729,0.5305,0.522287


[I 2025-03-27 03:39:33,765] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.000982448155054577, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6514,2.025264,0.4869,0.53416,0.4869,0.476731
2,1.7495,1.832591,0.5136,0.541171,0.5136,0.513528
3,1.5725,1.754079,0.5326,0.544139,0.5326,0.526039
4,1.4748,1.805843,0.5221,0.548143,0.5221,0.518939
5,1.4113,1.723922,0.5373,0.55251,0.5373,0.532395
6,1.3575,1.674647,0.5504,0.55654,0.5504,0.546148
7,1.3177,1.668151,0.5495,0.559166,0.5495,0.548483
8,1.2885,1.709103,0.5436,0.548124,0.5436,0.538934
9,1.2614,1.722171,0.5424,0.553267,0.5424,0.539141
10,1.2477,1.703455,0.5429,0.552124,0.5429,0.541784


[I 2025-03-27 03:47:22,302] Trial 77 finished with value: 0.5417843152471044 and parameters: {'learning_rate': 0.000982448155054577, 'weight_decay': 0.002, 'warmup_steps': 20}. Best is trial 53 with value: 0.5431165419813282.


Trial 78 with params: {'learning_rate': 0.00083683203926323, 'weight_decay': 0.002, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7202,2.07052,0.4842,0.526587,0.4842,0.47352
2,1.7892,1.851077,0.5108,0.537267,0.5108,0.510367
3,1.6045,1.764715,0.5321,0.541593,0.5321,0.525154
4,1.5045,1.809582,0.5224,0.547399,0.5224,0.518908
5,1.4405,1.727173,0.5371,0.5506,0.5371,0.5316
6,1.3875,1.680417,0.5515,0.557414,0.5515,0.547192
7,1.3489,1.6709,0.5493,0.558662,0.5493,0.548316
8,1.3208,1.710451,0.544,0.548482,0.544,0.53953
9,1.2954,1.722367,0.5415,0.552125,0.5415,0.538252
10,1.2831,1.70524,0.5427,0.551964,0.5427,0.541536


[I 2025-03-27 03:55:15,571] Trial 78 finished with value: 0.5415358345737539 and parameters: {'learning_rate': 0.00083683203926323, 'weight_decay': 0.002, 'warmup_steps': 15}. Best is trial 53 with value: 0.5431165419813282.


Trial 79 with params: {'learning_rate': 0.001114049520635619, 'weight_decay': 0.002, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.544,1.989871,0.4913,0.538392,0.4913,0.481015
2,1.7185,1.820907,0.5165,0.543976,0.5165,0.516157
3,1.5491,1.749384,0.5318,0.545363,0.5318,0.525395
4,1.4535,1.806248,0.52,0.546866,0.52,0.517012
5,1.3906,1.724752,0.5383,0.554003,0.5383,0.533475
6,1.3362,1.672566,0.5495,0.555206,0.5495,0.545255
7,1.2955,1.668413,0.5489,0.558838,0.5489,0.548038
8,1.2652,1.710772,0.5451,0.550388,0.5451,0.540751
9,1.2368,1.724533,0.5442,0.555672,0.5442,0.541104
10,1.2219,1.704585,0.5433,0.551477,0.5433,0.542039


[I 2025-03-27 04:03:05,203] Trial 79 finished with value: 0.5420391129831015 and parameters: {'learning_rate': 0.001114049520635619, 'weight_decay': 0.002, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 80 with params: {'learning_rate': 0.0006950832607648327, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.826,2.139852,0.4784,0.514771,0.4784,0.467251
2,1.8469,1.880551,0.5075,0.53229,0.5075,0.506919
3,1.649,1.784552,0.5298,0.538653,0.5298,0.522926
4,1.5451,1.820616,0.5222,0.545922,0.5222,0.518493
5,1.4796,1.737085,0.5368,0.547856,0.5368,0.530756
6,1.4268,1.69158,0.5534,0.559247,0.5534,0.549054
7,1.3891,1.678577,0.5484,0.557456,0.5484,0.547184
8,1.3619,1.716958,0.5436,0.547529,0.5436,0.53916
9,1.3381,1.727229,0.5403,0.551122,0.5403,0.537112
10,1.3271,1.711689,0.5445,0.553166,0.5445,0.54311


[I 2025-03-27 04:10:54,652] Trial 80 finished with value: 0.5431100399061425 and parameters: {'learning_rate': 0.0006950832607648327, 'weight_decay': 0.001, 'warmup_steps': 13}. Best is trial 53 with value: 0.5431165419813282.


Trial 81 with params: {'learning_rate': 0.0011510959704894446, 'weight_decay': 0.002, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5659,1.987332,0.49,0.5386,0.49,0.47977
2,1.7154,1.819403,0.5171,0.544782,0.5171,0.516708
3,1.5453,1.749199,0.5328,0.546882,0.5328,0.52632
4,1.4495,1.807215,0.5207,0.548253,0.5207,0.517828
5,1.3862,1.72561,0.5384,0.554424,0.5384,0.533585
6,1.3314,1.67238,0.5495,0.555019,0.5495,0.545181
7,1.2902,1.668828,0.5485,0.558462,0.5485,0.547562
8,1.2596,1.711654,0.5453,0.550719,0.5453,0.540803
9,1.2306,1.725559,0.5435,0.554969,0.5435,0.54042
10,1.2153,1.705084,0.5434,0.551784,0.5434,0.5422


[I 2025-03-27 04:18:53,725] Trial 81 finished with value: 0.5422001374713306 and parameters: {'learning_rate': 0.0011510959704894446, 'weight_decay': 0.002, 'warmup_steps': 18}. Best is trial 53 with value: 0.5431165419813282.


Trial 82 with params: {'learning_rate': 0.001120945738705142, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5518,1.989734,0.49,0.537127,0.49,0.479761
2,1.7182,1.820654,0.5166,0.543881,0.5166,0.516196
3,1.5484,1.749365,0.532,0.545687,0.532,0.525606
4,1.4528,1.806437,0.5202,0.547278,0.5202,0.517237
5,1.3898,1.72491,0.5383,0.554048,0.5383,0.53345
6,1.3353,1.6725,0.5499,0.555549,0.5499,0.5456
7,1.2944,1.668473,0.5487,0.558653,0.5487,0.547844
8,1.2641,1.710934,0.545,0.550303,0.545,0.540634
9,1.2356,1.724718,0.544,0.555457,0.544,0.540877
10,1.2206,1.704652,0.5429,0.551158,0.5429,0.541666


[I 2025-03-27 04:26:46,580] Trial 82 finished with value: 0.5416655463417341 and parameters: {'learning_rate': 0.001120945738705142, 'weight_decay': 0.002, 'warmup_steps': 11}. Best is trial 53 with value: 0.5431165419813282.


Trial 83 with params: {'learning_rate': 0.0006120818993246993, 'weight_decay': 0.0, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.899,2.196344,0.4727,0.507661,0.4727,0.461002
2,1.8928,1.905857,0.5058,0.530305,0.5058,0.50533
3,1.6838,1.802699,0.5274,0.535119,0.5274,0.520471


[I 2025-03-27 04:29:08,799] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.000674552220757532, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8491,2.153625,0.4773,0.513955,0.4773,0.466045
2,1.8579,1.886363,0.5076,0.53226,0.5076,0.507049
3,1.6572,1.788581,0.5294,0.5377,0.5294,0.522456
4,1.5523,1.823115,0.5216,0.545105,0.5216,0.51796
5,1.4865,1.739327,0.5371,0.547725,0.5371,0.530991
6,1.4336,1.69385,0.5533,0.559081,0.5533,0.54896
7,1.396,1.680317,0.5482,0.557332,0.5482,0.547046
8,1.3689,1.718552,0.5426,0.546339,0.5426,0.538116
9,1.3453,1.728488,0.54,0.550769,0.54,0.536718
10,1.3345,1.713194,0.5443,0.553171,0.5443,0.54288


[I 2025-03-27 04:37:03,307] Trial 84 finished with value: 0.5428803050697599 and parameters: {'learning_rate': 0.000674552220757532, 'weight_decay': 0.002, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 85 with params: {'learning_rate': 0.0005358786474528965, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0024,2.269846,0.4658,0.50176,0.4658,0.454137
2,1.9503,1.938532,0.5015,0.524228,0.5015,0.500881
3,1.7253,1.826352,0.5235,0.530791,0.5235,0.516379


[I 2025-03-27 04:39:26,138] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.00042062570512767656, 'weight_decay': 0.002, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1698,2.422329,0.4527,0.490384,0.4527,0.440126
2,2.0709,2.012477,0.4943,0.515776,0.4943,0.493176
3,1.812,1.881091,0.5189,0.524886,0.5189,0.511072
4,1.6876,1.88804,0.5142,0.535522,0.5142,0.510063
5,1.613,1.79764,0.5324,0.540124,0.5324,0.525605
6,1.5572,1.749592,0.5456,0.549316,0.5456,0.540919


[I 2025-03-27 04:44:08,906] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0004191411023553298, 'weight_decay': 0.004, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2492,2.449729,0.4503,0.489208,0.4503,0.437799
2,2.0869,2.019058,0.4925,0.514478,0.4925,0.491304
3,1.8182,1.884182,0.5189,0.52451,0.5189,0.51087


[I 2025-03-27 04:46:30,526] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005004376561176635, 'weight_decay': 0.004, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0625,2.313296,0.4628,0.499367,0.4628,0.451313
2,1.9837,1.957992,0.5003,0.522395,0.5003,0.499638
3,1.7489,1.840468,0.5224,0.529379,0.5224,0.515011


[I 2025-03-27 04:48:51,648] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0006175278065880498, 'weight_decay': 0.002, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8753,2.188513,0.4738,0.508983,0.4738,0.462223
2,1.8873,1.903312,0.5057,0.530093,0.5057,0.505307
3,1.6805,1.801016,0.528,0.535418,0.528,0.520997
4,1.5737,1.83122,0.52,0.543513,0.52,0.516533
5,1.5071,1.746698,0.5363,0.546715,0.5363,0.530338
6,1.4542,1.701319,0.5534,0.559194,0.5534,0.549005
7,1.4169,1.68615,0.5474,0.556247,0.5474,0.546277
8,1.3902,1.723972,0.5415,0.544839,0.5415,0.536907
9,1.3672,1.732946,0.5392,0.549834,0.5392,0.535809
10,1.3569,1.718458,0.5427,0.551427,0.5427,0.541152


[I 2025-03-27 04:56:45,359] Trial 89 finished with value: 0.5411515848667535 and parameters: {'learning_rate': 0.0006175278065880498, 'weight_decay': 0.002, 'warmup_steps': 5}. Best is trial 53 with value: 0.5431165419813282.


Trial 90 with params: {'learning_rate': 0.003112351489892698, 'weight_decay': 0.002, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2939,1.942998,0.4911,0.558747,0.4911,0.485837
2,1.6862,1.924958,0.5002,0.552654,0.5002,0.500477
3,1.5273,1.864682,0.5199,0.552781,0.5199,0.511182
4,1.4276,1.954946,0.5017,0.542179,0.5017,0.499416
5,1.355,1.861645,0.5168,0.551102,0.5168,0.514609
6,1.2822,1.763291,0.5375,0.550799,0.5375,0.533623


[I 2025-03-27 05:01:28,413] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0005658002598835893, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9827,2.24384,0.4675,0.502637,0.4675,0.455842
2,1.9289,1.925537,0.5024,0.525034,0.5024,0.501523
3,1.7089,1.816573,0.5248,0.531691,0.5248,0.517481


[I 2025-03-27 05:03:49,520] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0005869035074065617, 'weight_decay': 0.003, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9424,2.220861,0.4707,0.506413,0.4707,0.459079
2,1.9115,1.916039,0.5045,0.527692,0.5045,0.503713
3,1.6969,1.809888,0.5265,0.533564,0.5265,0.519347
4,1.5877,1.837097,0.519,0.541863,0.519,0.515416
5,1.52,1.751845,0.5371,0.547064,0.5371,0.531037
6,1.4668,1.706289,0.5522,0.557703,0.5522,0.547728
7,1.4295,1.690124,0.5465,0.555539,0.5465,0.545469
8,1.4028,1.727731,0.5411,0.543926,0.5411,0.536303
9,1.3801,1.73608,0.538,0.54857,0.538,0.53461
10,1.37,1.721949,0.5425,0.551326,0.5425,0.540981


[I 2025-03-27 05:11:43,147] Trial 92 finished with value: 0.5409805313291968 and parameters: {'learning_rate': 0.0005869035074065617, 'weight_decay': 0.003, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 93 with params: {'learning_rate': 0.0017941333312675633, 'weight_decay': 0.004, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3857,1.934105,0.492,0.551824,0.492,0.484278
2,1.6606,1.820001,0.516,0.550232,0.516,0.515636
3,1.5005,1.765523,0.5323,0.552775,0.5323,0.525316
4,1.4063,1.841972,0.5148,0.548962,0.5148,0.512424
5,1.3411,1.75818,0.5312,0.553756,0.5312,0.527188
6,1.281,1.686829,0.5466,0.554027,0.5466,0.542633


[I 2025-03-27 05:16:27,250] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0008184572778045913, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7255,2.076881,0.483,0.523898,0.483,0.472321
2,1.7948,1.853952,0.5101,0.536388,0.5101,0.509717
3,1.6091,1.766584,0.5323,0.542021,0.5323,0.5255
4,1.5089,1.81049,0.5226,0.547481,0.5226,0.519051
5,1.4448,1.728001,0.5372,0.550649,0.5372,0.531726
6,1.3919,1.681494,0.552,0.55782,0.552,0.547741
7,1.3534,1.671553,0.5491,0.558391,0.5491,0.54811
8,1.3255,1.710933,0.5444,0.548606,0.5444,0.539838
9,1.3003,1.722679,0.5416,0.552431,0.5416,0.538431
10,1.2882,1.705765,0.5429,0.552209,0.5429,0.541736


[I 2025-03-27 05:24:21,471] Trial 94 finished with value: 0.5417363461944461 and parameters: {'learning_rate': 0.0008184572778045913, 'weight_decay': 0.001, 'warmup_steps': 13}. Best is trial 53 with value: 0.5431165419813282.


Trial 95 with params: {'learning_rate': 0.00041993227783222053, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1609,2.420384,0.4527,0.489348,0.4527,0.440026
2,2.0702,2.012408,0.4947,0.516235,0.4947,0.493623
3,1.812,1.881238,0.5191,0.525171,0.5191,0.51136


[I 2025-03-27 05:26:42,034] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0006410824446655911, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9015,2.180846,0.4741,0.510029,0.4741,0.462378
2,1.8789,1.897312,0.5058,0.530836,0.5058,0.505356
3,1.672,1.796129,0.5285,0.536251,0.5285,0.52145
4,1.5653,1.827903,0.5201,0.543643,0.5201,0.516554
5,1.4986,1.743547,0.5364,0.547137,0.5364,0.53035
6,1.4456,1.69811,0.553,0.558684,0.553,0.548678
7,1.4081,1.683586,0.5473,0.556512,0.5473,0.546183
8,1.3811,1.721573,0.5422,0.545455,0.5422,0.537561
9,1.3578,1.730941,0.5389,0.549886,0.5389,0.535607
10,1.3473,1.716038,0.5434,0.552474,0.5434,0.5419


[I 2025-03-27 05:34:32,384] Trial 96 finished with value: 0.5419000191611286 and parameters: {'learning_rate': 0.0006410824446655911, 'weight_decay': 0.003, 'warmup_steps': 19}. Best is trial 53 with value: 0.5431165419813282.


Trial 97 with params: {'learning_rate': 0.0009735386197879122, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6217,2.023091,0.488,0.534363,0.488,0.478008
2,1.7484,1.832686,0.5142,0.541485,0.5142,0.514103
3,1.5729,1.754204,0.5329,0.544039,0.5329,0.526233
4,1.4756,1.80572,0.5223,0.548529,0.5223,0.519142
5,1.4124,1.723899,0.5373,0.552317,0.5373,0.532241
6,1.3588,1.674848,0.5505,0.556789,0.5505,0.54634
7,1.3192,1.66819,0.5497,0.559526,0.5497,0.548778
8,1.2902,1.709057,0.5436,0.548047,0.5436,0.538973
9,1.2633,1.722066,0.5428,0.553581,0.5428,0.539524
10,1.2497,1.703535,0.5432,0.552178,0.5432,0.541951


[I 2025-03-27 05:42:26,285] Trial 97 finished with value: 0.5419506951137826 and parameters: {'learning_rate': 0.0009735386197879122, 'weight_decay': 0.002, 'warmup_steps': 11}. Best is trial 53 with value: 0.5431165419813282.


Trial 98 with params: {'learning_rate': 0.0006794428438392101, 'weight_decay': 0.002, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8558,2.152617,0.4777,0.514503,0.4777,0.466354
2,1.8566,1.885396,0.5073,0.531978,0.5073,0.506678
3,1.6557,1.78778,0.5299,0.538526,0.5299,0.523013
4,1.5509,1.822578,0.5216,0.545145,0.5216,0.517973
5,1.4849,1.738815,0.537,0.547859,0.537,0.530916
6,1.432,1.693331,0.5531,0.558815,0.5531,0.548668
7,1.3944,1.679906,0.5484,0.557517,0.5484,0.547187
8,1.3673,1.718171,0.543,0.546844,0.543,0.538535
9,1.3436,1.728181,0.5399,0.550525,0.5399,0.536614
10,1.3327,1.712827,0.5441,0.553077,0.5441,0.542703


[I 2025-03-27 05:50:17,969] Trial 98 finished with value: 0.5427033324044128 and parameters: {'learning_rate': 0.0006794428438392101, 'weight_decay': 0.002, 'warmup_steps': 17}. Best is trial 53 with value: 0.5431165419813282.


Trial 99 with params: {'learning_rate': 0.0005440209150499404, 'weight_decay': 0.002, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9991,2.263019,0.4661,0.501789,0.4661,0.454372
2,1.9445,1.934924,0.5017,0.524457,0.5017,0.501082
3,1.7208,1.823609,0.5234,0.530354,0.5234,0.516157
4,1.6087,1.846548,0.5185,0.541182,0.5185,0.514999
5,1.5396,1.760255,0.5353,0.545147,0.5353,0.529198
6,1.486,1.714387,0.5498,0.554931,0.5498,0.545339
7,1.4488,1.696782,0.5468,0.556024,0.5468,0.545759
8,1.4221,1.734124,0.5408,0.543182,0.5408,0.535805
9,1.3998,1.741465,0.5376,0.548175,0.5376,0.534188
10,1.3902,1.727977,0.5423,0.550734,0.5423,0.540552


[I 2025-03-27 05:58:09,573] Trial 99 finished with value: 0.5405520630840941 and parameters: {'learning_rate': 0.0005440209150499404, 'weight_decay': 0.002, 'warmup_steps': 15}. Best is trial 53 with value: 0.5431165419813282.


Trial 100 with params: {'learning_rate': 0.000749880935222329, 'weight_decay': 0.003, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7599,2.105827,0.4808,0.518796,0.4808,0.46979
2,1.8197,1.866782,0.5098,0.535253,0.5098,0.509149
3,1.6291,1.775165,0.531,0.540572,0.531,0.524302
4,1.5273,1.815087,0.5227,0.546954,0.5227,0.519197
5,1.4628,1.732204,0.537,0.549279,0.537,0.531305
6,1.41,1.686396,0.5531,0.558851,0.5531,0.548759
7,1.372,1.674797,0.5482,0.557647,0.5482,0.547161
8,1.3446,1.713641,0.5442,0.548346,0.5442,0.539728
9,1.3203,1.724629,0.5409,0.551635,0.5409,0.537625
10,1.3088,1.708512,0.5438,0.552979,0.5438,0.542539


[I 2025-03-27 06:06:02,868] Trial 100 finished with value: 0.5425386039961169 and parameters: {'learning_rate': 0.000749880935222329, 'weight_decay': 0.003, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 101 with params: {'learning_rate': 0.0004339347213889704, 'weight_decay': 0.003, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1429,2.399488,0.4554,0.491652,0.4554,0.442853
2,2.0531,2.001404,0.4953,0.516935,0.4953,0.494228
3,1.7996,1.872959,0.5192,0.525094,0.5192,0.511472


[I 2025-03-27 06:08:24,897] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0025496631466011997, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3472,1.932109,0.4917,0.558063,0.4917,0.485466
2,1.6658,1.871648,0.508,0.551169,0.508,0.50767
3,1.5061,1.814684,0.5281,0.554323,0.5281,0.51913
4,1.4092,1.904801,0.5092,0.546926,0.5092,0.506633
5,1.3396,1.81551,0.5214,0.552323,0.5214,0.51873
6,1.2722,1.727074,0.5404,0.550992,0.5404,0.536614


[I 2025-03-27 06:13:21,831] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0008713882993462967, 'weight_decay': 0.002, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6608,2.052251,0.4857,0.529544,0.4857,0.475606
2,1.7746,1.84484,0.5111,0.53737,0.5111,0.510618
3,1.5944,1.761041,0.5324,0.542863,0.5324,0.525742
4,1.4958,1.807896,0.5225,0.547944,0.5225,0.51902
5,1.4324,1.725763,0.5371,0.551198,0.5371,0.531677
6,1.3794,1.678603,0.5505,0.55666,0.5505,0.54631
7,1.3406,1.66987,0.5499,0.559283,0.5499,0.548848
8,1.3124,1.709682,0.5435,0.548165,0.5435,0.53911
9,1.2867,1.721957,0.5421,0.552663,0.5421,0.538836
10,1.2741,1.704512,0.5423,0.55148,0.5423,0.541129


[I 2025-03-27 06:21:14,941] Trial 103 finished with value: 0.5411287317009876 and parameters: {'learning_rate': 0.0008713882993462967, 'weight_decay': 0.002, 'warmup_steps': 5}. Best is trial 53 with value: 0.5431165419813282.


Trial 104 with params: {'learning_rate': 0.0011031499802265311, 'weight_decay': 0.005, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5667,1.993988,0.4904,0.536764,0.4904,0.48008
2,1.7219,1.821934,0.5158,0.54293,0.5158,0.515507
3,1.5513,1.749733,0.5323,0.545484,0.5323,0.525873
4,1.4553,1.806174,0.5207,0.547695,0.5207,0.517751
5,1.3923,1.724618,0.5382,0.553801,0.5382,0.533311
6,1.3378,1.67265,0.5494,0.555001,0.5494,0.545059


[I 2025-03-27 06:26:04,066] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.000761557420760376, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7469,2.099395,0.4809,0.518862,0.4809,0.469941
2,1.8145,1.864174,0.51,0.535337,0.51,0.509291
3,1.6252,1.773449,0.5309,0.540471,0.5309,0.524241
4,1.5238,1.814091,0.5233,0.547862,0.5233,0.519942
5,1.4595,1.731315,0.5374,0.55003,0.5374,0.531676
6,1.4067,1.685411,0.5526,0.558291,0.5526,0.548233
7,1.3687,1.674151,0.5487,0.558082,0.5487,0.547607
8,1.3412,1.713061,0.5443,0.548554,0.5443,0.539855
9,1.3167,1.72423,0.5405,0.551463,0.5405,0.537255
10,1.3051,1.707987,0.544,0.553428,0.544,0.542856


[I 2025-03-27 06:33:54,417] Trial 105 finished with value: 0.5428561691229284 and parameters: {'learning_rate': 0.000761557420760376, 'weight_decay': 0.004, 'warmup_steps': 7}. Best is trial 53 with value: 0.5431165419813282.


Trial 106 with params: {'learning_rate': 0.00043820390039899925, 'weight_decay': 0.006, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1356,2.392748,0.4562,0.49229,0.4562,0.443791
2,2.0478,1.998078,0.4955,0.51703,0.4955,0.49443
3,1.7958,1.870511,0.519,0.525028,0.519,0.511269
4,1.6739,1.880319,0.5145,0.535859,0.5145,0.510371
5,1.6004,1.790674,0.5319,0.539828,0.5319,0.525162
6,1.5452,1.743094,0.5457,0.549216,0.5457,0.54089


[I 2025-03-27 06:38:34,673] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.001004809658751685, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5877,2.012833,0.4891,0.535167,0.4891,0.478879
2,1.7397,1.829226,0.5143,0.541709,0.5143,0.514334
3,1.5664,1.752537,0.5327,0.544353,0.5327,0.526054
4,1.4698,1.805417,0.5221,0.548668,0.5221,0.519017
5,1.4069,1.723711,0.5378,0.553058,0.5378,0.532851
6,1.3532,1.674093,0.5496,0.555635,0.5496,0.545246
7,1.3134,1.668022,0.5498,0.55962,0.5498,0.548926
8,1.2841,1.709191,0.5443,0.548986,0.5443,0.539626
9,1.2569,1.722401,0.5433,0.554082,0.5433,0.540051
10,1.2431,1.703578,0.5427,0.551319,0.5427,0.541433


[I 2025-03-27 06:46:24,249] Trial 107 finished with value: 0.5414333228779177 and parameters: {'learning_rate': 0.001004809658751685, 'weight_decay': 0.003, 'warmup_steps': 6}. Best is trial 53 with value: 0.5431165419813282.


Trial 108 with params: {'learning_rate': 0.0017049612762309162, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3768,1.934597,0.4937,0.551842,0.4937,0.485856
2,1.662,1.816144,0.5177,0.551384,0.5177,0.517527
3,1.5023,1.761289,0.5328,0.552365,0.5328,0.526144
4,1.4084,1.835236,0.5159,0.548584,0.5159,0.513573
5,1.3439,1.752113,0.5334,0.555048,0.5334,0.529416
6,1.2847,1.68333,0.5465,0.553859,0.5465,0.542695


[I 2025-03-27 06:51:05,343] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.00020414501567401895, 'weight_decay': 0.001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7234,3.055827,0.3979,0.442827,0.3979,0.378669
2,2.6169,2.401837,0.4659,0.482752,0.4659,0.458747
3,2.2056,2.17223,0.494,0.495724,0.494,0.483086
4,2.0104,2.106246,0.4876,0.505054,0.4876,0.480585
5,1.8996,1.997494,0.5079,0.512234,0.5079,0.498476
6,1.8255,1.934019,0.523,0.522436,0.523,0.516604


[I 2025-03-27 06:55:47,713] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0015364403780285263, 'weight_decay': 0.003, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4306,1.943335,0.4935,0.54716,0.4935,0.484591
2,1.6723,1.812201,0.5176,0.550012,0.5176,0.517586
3,1.5103,1.755029,0.5335,0.55141,0.5335,0.526877
4,1.4162,1.824304,0.5184,0.550043,0.5184,0.516408
5,1.3521,1.741978,0.5359,0.55673,0.5359,0.532036
6,1.2943,1.677842,0.5469,0.55342,0.5469,0.543005


[I 2025-03-27 07:00:28,982] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0005906908330834227, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9024,2.210051,0.4712,0.506397,0.4712,0.459428
2,1.9046,1.913145,0.5048,0.528333,0.5048,0.504197
3,1.6935,1.808174,0.5264,0.533382,0.5264,0.519249


[I 2025-03-27 07:02:50,744] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00047971761056357626, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0648,2.332115,0.4616,0.4983,0.4616,0.44992
2,2.0002,1.968924,0.4991,0.521247,0.4991,0.498507
3,1.7622,1.849028,0.5212,0.527898,0.5212,0.513708
4,1.6451,1.864737,0.5172,0.539662,0.5172,0.513603
5,1.5738,1.776669,0.5344,0.542578,0.5344,0.527994
6,1.5195,1.729962,0.5472,0.551603,0.5472,0.542587


[I 2025-03-27 07:07:31,844] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0006108367192502101, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9004,2.197361,0.4727,0.507715,0.4727,0.461023
2,1.8936,1.906311,0.5057,0.530042,0.5057,0.505229
3,1.6844,1.803007,0.5271,0.534825,0.5271,0.520175


[I 2025-03-27 07:09:56,292] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0005665590587384506, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9382,2.233223,0.4693,0.505529,0.4693,0.457957
2,1.9227,1.923263,0.5037,0.526517,0.5037,0.502989
3,1.7065,1.815509,0.5249,0.531878,0.5249,0.51772
4,1.5966,1.841041,0.5187,0.54134,0.5187,0.515114
5,1.5287,1.755464,0.5367,0.54617,0.5367,0.530485
6,1.4754,1.709827,0.5503,0.555834,0.5503,0.545862
7,1.4383,1.693054,0.5472,0.55637,0.5472,0.546158
8,1.4117,1.730539,0.5408,0.543433,0.5408,0.535864
9,1.3892,1.738453,0.5384,0.549114,0.5384,0.535091
10,1.3794,1.72468,0.5429,0.551788,0.5429,0.54133


[I 2025-03-27 07:17:52,379] Trial 114 finished with value: 0.5413304703230396 and parameters: {'learning_rate': 0.0005665590587384506, 'weight_decay': 0.0, 'warmup_steps': 6}. Best is trial 53 with value: 0.5431165419813282.


Trial 115 with params: {'learning_rate': 0.0009793405608477534, 'weight_decay': 0.001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6298,2.023014,0.4874,0.533602,0.4874,0.477308
2,1.7481,1.832337,0.5139,0.541311,0.5139,0.513865
3,1.5722,1.754022,0.5332,0.544559,0.5332,0.526564
4,1.4748,1.805724,0.5222,0.54846,0.5222,0.51907
5,1.4115,1.723881,0.5372,0.552248,0.5372,0.532229
6,1.3579,1.674716,0.5503,0.556453,0.5503,0.546034
7,1.3181,1.668161,0.5498,0.559419,0.5498,0.5488
8,1.289,1.709081,0.5437,0.548229,0.5437,0.53906
9,1.2621,1.722111,0.5427,0.553451,0.5427,0.539423
10,1.2484,1.703516,0.543,0.552151,0.543,0.541856


[I 2025-03-27 07:25:46,347] Trial 115 finished with value: 0.5418562531015337 and parameters: {'learning_rate': 0.0009793405608477534, 'weight_decay': 0.001, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 116 with params: {'learning_rate': 0.0007487915576425608, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7608,2.106392,0.4809,0.518936,0.4809,0.469918
2,1.8202,1.866995,0.5098,0.535243,0.5098,0.509139
3,1.6294,1.775327,0.531,0.540534,0.531,0.524298
4,1.5276,1.815191,0.5228,0.547009,0.5228,0.519292
5,1.4631,1.732274,0.5371,0.549394,0.5371,0.531411
6,1.4103,1.686474,0.5531,0.558799,0.5531,0.54874
7,1.3724,1.674874,0.5484,0.557854,0.5484,0.547375
8,1.345,1.713679,0.5442,0.548357,0.5442,0.539738
9,1.3206,1.724696,0.5409,0.551644,0.5409,0.537624
10,1.3091,1.708558,0.544,0.553148,0.544,0.542743


[I 2025-03-27 07:33:38,083] Trial 116 finished with value: 0.5427432288154459 and parameters: {'learning_rate': 0.0007487915576425608, 'weight_decay': 0.004, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 117 with params: {'learning_rate': 0.00012486032116326294, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.034,3.516353,0.3446,0.384582,0.3446,0.325881
2,3.1014,2.83027,0.4382,0.451138,0.4382,0.427532
3,2.621,2.521075,0.468,0.469986,0.468,0.45381


[I 2025-03-27 07:36:00,843] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0011442523167400499, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5351,1.984752,0.4913,0.538694,0.4913,0.48105
2,1.7137,1.819175,0.5173,0.544995,0.5173,0.516919
3,1.545,1.748981,0.5322,0.546134,0.5322,0.525729
4,1.4496,1.806862,0.5202,0.547712,0.5202,0.517399
5,1.3867,1.725379,0.5378,0.5539,0.5378,0.532982
6,1.3321,1.672398,0.5496,0.555211,0.5496,0.545324
7,1.2911,1.668744,0.5488,0.558804,0.5488,0.547918
8,1.2605,1.711471,0.5454,0.551004,0.5454,0.541036
9,1.2317,1.725338,0.5441,0.555673,0.5441,0.541033
10,1.2166,1.705072,0.5433,0.551553,0.5433,0.54206


[I 2025-03-27 07:43:55,094] Trial 118 finished with value: 0.5420600409827896 and parameters: {'learning_rate': 0.0011442523167400499, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 119 with params: {'learning_rate': 0.0007047155739188424, 'weight_decay': 0.005, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8061,2.131993,0.4789,0.516335,0.4789,0.467767
2,1.8409,1.877671,0.5077,0.532515,0.5077,0.507079
3,1.645,1.782623,0.5301,0.538881,0.5301,0.523168
4,1.5416,1.819472,0.5227,0.546341,0.5227,0.518888
5,1.4764,1.736107,0.5371,0.548328,0.5371,0.531059
6,1.4236,1.690572,0.5532,0.559125,0.5532,0.548955
7,1.3859,1.677812,0.5485,0.557891,0.5485,0.547447
8,1.3587,1.716288,0.5434,0.547268,0.5434,0.538962
9,1.3348,1.726684,0.5404,0.551325,0.5404,0.537264
10,1.3237,1.711066,0.5444,0.553163,0.5444,0.543042


[I 2025-03-27 07:51:46,650] Trial 119 finished with value: 0.543042418146909 and parameters: {'learning_rate': 0.0007047155739188424, 'weight_decay': 0.005, 'warmup_steps': 10}. Best is trial 53 with value: 0.5431165419813282.


Trial 120 with params: {'learning_rate': 0.0009801820835306787, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6218,2.02178,0.4883,0.534701,0.4883,0.478318
2,1.7472,1.832075,0.5141,0.541521,0.5141,0.514015
3,1.5717,1.753903,0.5333,0.54454,0.5333,0.526627
4,1.4745,1.805665,0.5222,0.548417,0.5222,0.519055
5,1.4113,1.723853,0.5373,0.552335,0.5373,0.532348
6,1.3577,1.674697,0.5501,0.556335,0.5501,0.54583
7,1.318,1.66816,0.5498,0.559472,0.5498,0.548801
8,1.2889,1.709093,0.5437,0.548225,0.5437,0.539023
9,1.2619,1.722115,0.5429,0.553614,0.5429,0.539604
10,1.2482,1.70351,0.5431,0.552128,0.5431,0.541859


[I 2025-03-27 07:59:49,885] Trial 120 finished with value: 0.5418590074169033 and parameters: {'learning_rate': 0.0009801820835306787, 'weight_decay': 0.005, 'warmup_steps': 12}. Best is trial 53 with value: 0.5431165419813282.


Trial 121 with params: {'learning_rate': 0.0003968141929951229, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2206,2.466821,0.4486,0.486175,0.4486,0.435596
2,2.1058,2.034377,0.4922,0.513866,0.4922,0.490902
3,1.836,1.89713,0.5182,0.523586,0.5182,0.510183
4,1.7079,1.899781,0.5133,0.534275,0.5133,0.508886
5,1.6314,1.808269,0.5315,0.538606,0.5315,0.524484
6,1.5749,1.759447,0.5454,0.548987,0.5454,0.540691


[I 2025-03-27 08:04:34,171] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.001076307344625229, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5441,1.9958,0.4903,0.537267,0.4903,0.479983
2,1.7242,1.823269,0.5154,0.542822,0.5154,0.515288
3,1.5542,1.750113,0.5333,0.546185,0.5333,0.52697
4,1.4585,1.805652,0.5211,0.547546,0.5211,0.51806
5,1.3958,1.724112,0.5378,0.553645,0.5378,0.53305
6,1.3417,1.672913,0.5494,0.555138,0.5494,0.545118


[I 2025-03-27 08:09:18,647] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0006230789685131226, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8797,2.186174,0.474,0.509398,0.474,0.462494
2,1.885,1.901729,0.506,0.530791,0.506,0.505695
3,1.6783,1.799766,0.5282,0.535665,0.5282,0.521141
4,1.5716,1.830375,0.5202,0.543512,0.5202,0.516594
5,1.505,1.745911,0.5357,0.546158,0.5357,0.529736
6,1.452,1.700515,0.5537,0.559517,0.5537,0.549299
7,1.4147,1.685488,0.5477,0.556504,0.5477,0.54656
8,1.388,1.723361,0.5421,0.545678,0.5421,0.53756
9,1.3649,1.732447,0.5388,0.549565,0.5388,0.535474
10,1.3546,1.717862,0.5429,0.551789,0.5429,0.541422


[I 2025-03-27 08:17:10,930] Trial 123 finished with value: 0.5414222549049778 and parameters: {'learning_rate': 0.0006230789685131226, 'weight_decay': 0.006, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 124 with params: {'learning_rate': 0.0038399418051914304, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3156,1.98647,0.4844,0.555836,0.4844,0.480251
2,1.7349,2.012311,0.4892,0.555845,0.4892,0.490028
3,1.5738,1.940552,0.5104,0.549098,0.5104,0.501129


[I 2025-03-27 08:19:32,878] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0011632024291385393, 'weight_decay': 0.005, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5241,1.981159,0.4917,0.53898,0.4917,0.481505
2,1.7104,1.818087,0.5181,0.545907,0.5181,0.517749
3,1.5424,1.748786,0.5328,0.547337,0.5328,0.526413
4,1.4472,1.807254,0.5203,0.548078,0.5203,0.517614
5,1.3843,1.725807,0.5383,0.554782,0.5383,0.533583
6,1.3296,1.672363,0.5496,0.555217,0.5496,0.545334
7,1.2884,1.66898,0.5489,0.55883,0.5489,0.548032
8,1.2577,1.711907,0.5452,0.550718,0.5452,0.54076
9,1.2287,1.725915,0.5436,0.555155,0.5436,0.540579
10,1.2134,1.705416,0.544,0.552206,0.544,0.542685


[I 2025-03-27 08:27:35,820] Trial 125 finished with value: 0.542684970999386 and parameters: {'learning_rate': 0.0011632024291385393, 'weight_decay': 0.005, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 126 with params: {'learning_rate': 0.0030544790673010355, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2859,1.940649,0.4927,0.559763,0.4927,0.487216
2,1.6826,1.91887,0.5011,0.552785,0.5011,0.501231
3,1.5241,1.858917,0.5204,0.552393,0.5204,0.511605


[I 2025-03-27 08:29:58,643] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0009549037217930623, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6282,2.027823,0.4876,0.534087,0.4876,0.477848
2,1.7527,1.834594,0.5135,0.540494,0.5135,0.5133
3,1.5764,1.755176,0.5327,0.543516,0.5327,0.525906
4,1.479,1.805912,0.5223,0.548858,0.5223,0.519265
5,1.4158,1.72406,0.5373,0.552086,0.5373,0.532196
6,1.3623,1.675406,0.5502,0.556529,0.5502,0.546119
7,1.3228,1.668366,0.5496,0.559547,0.5496,0.548637
8,1.2939,1.70902,0.5436,0.548152,0.5436,0.539054
9,1.2672,1.721906,0.5425,0.553035,0.5425,0.539156


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--recall/11f90e583db35601050aed380d48e83202a896976b9608432fba9244fb447f24 (last modified on Fri Jan 10 23:14:00 2025) since it couldn't be found locally at evaluate-metric--recall, or remotely on the Hugging Face Hub.
[I 2025-03-27 08:45:54,753] Trial 128 finished with value: 0.5418911785430032 and parameters: {'learning_rate': 0.000897368629012954, 'weight_decay': 0.005, 'warmup_steps': 18}. Best is trial 53 with value: 0.5431165419813282.


Trial 129 with params: {'learning_rate': 0.0005378047232046017, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9855,2.264342,0.4655,0.501499,0.4655,0.453952
2,1.9467,1.936879,0.5019,0.524832,0.5019,0.501392
3,1.7235,1.825361,0.5236,0.530986,0.5236,0.516543


[I 2025-03-27 08:48:18,217] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0008027687508333916, 'weight_decay': 0.01, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7717,2.089328,0.482,0.52315,0.482,0.471224
2,1.8041,1.857828,0.5094,0.535614,0.5094,0.508897
3,1.615,1.76881,0.5321,0.541997,0.5321,0.525267
4,1.5138,1.811635,0.5226,0.54719,0.5226,0.519078
5,1.4492,1.72893,0.5372,0.550322,0.5372,0.531509
6,1.3961,1.682536,0.552,0.557787,0.552,0.547686
7,1.3576,1.672202,0.5489,0.558107,0.5489,0.547839
8,1.3297,1.711477,0.5449,0.549188,0.5449,0.540349
9,1.3047,1.723037,0.5409,0.551894,0.5409,0.537745
10,1.2927,1.706235,0.5432,0.552634,0.5432,0.542027


[I 2025-03-27 08:56:06,998] Trial 130 finished with value: 0.5420266494144621 and parameters: {'learning_rate': 0.0008027687508333916, 'weight_decay': 0.01, 'warmup_steps': 22}. Best is trial 53 with value: 0.5431165419813282.


Trial 131 with params: {'learning_rate': 0.0010412517011412466, 'weight_decay': 0.004, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5802,2.005374,0.4896,0.535616,0.4896,0.478966
2,1.7327,1.826248,0.5144,0.541834,0.5144,0.514367
3,1.5604,1.751241,0.5331,0.544994,0.5331,0.526503
4,1.4641,1.805492,0.521,0.547793,0.521,0.51799
5,1.4011,1.723857,0.5379,0.553403,0.5379,0.533092
6,1.3472,1.673397,0.5495,0.555356,0.5495,0.545065


[I 2025-03-27 09:00:53,450] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0009275611171579508, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6402,2.035443,0.488,0.533131,0.488,0.478106
2,1.7596,1.837673,0.5129,0.539634,0.5129,0.512586
3,1.582,1.75686,0.5331,0.54385,0.5331,0.526453
4,1.4842,1.806371,0.5225,0.548508,0.5225,0.519326
5,1.4209,1.724419,0.5373,0.552289,0.5373,0.532126
6,1.3676,1.676285,0.5511,0.55739,0.5511,0.547013
7,1.3283,1.66871,0.5498,0.559686,0.5498,0.548932
8,1.2997,1.709104,0.543,0.547571,0.543,0.538487
9,1.2733,1.721783,0.5426,0.553104,0.5426,0.539247
10,1.2602,1.703765,0.5437,0.552909,0.5437,0.542548


[I 2025-03-27 09:08:51,460] Trial 132 finished with value: 0.542547615222496 and parameters: {'learning_rate': 0.0009275611171579508, 'weight_decay': 0.005, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 133 with params: {'learning_rate': 0.0015086904829558132, 'weight_decay': 0.006, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4072,1.942857,0.4942,0.548796,0.4942,0.485501
2,1.6721,1.811557,0.5183,0.549942,0.5183,0.518251
3,1.511,1.753894,0.5344,0.552915,0.5344,0.528031
4,1.4171,1.822305,0.5185,0.549616,0.5185,0.516205
5,1.3536,1.740252,0.5361,0.556571,0.5361,0.532219
6,1.2961,1.677089,0.5471,0.553595,0.5471,0.543212


[I 2025-03-27 09:13:35,765] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0007731204532210576, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7448,2.094897,0.4817,0.521216,0.4817,0.470829
2,1.8105,1.862009,0.5098,0.53532,0.5098,0.509239
3,1.6218,1.771941,0.531,0.541005,0.531,0.524387
4,1.5207,1.813283,0.5225,0.546794,0.5225,0.519063
5,1.4563,1.730566,0.5371,0.549798,0.5371,0.531367
6,1.4036,1.684553,0.5517,0.557592,0.5517,0.547414
7,1.3654,1.673551,0.5488,0.558291,0.5488,0.547775
8,1.3379,1.712538,0.544,0.548374,0.544,0.539577
9,1.3132,1.723831,0.5405,0.551385,0.5405,0.537268
10,1.3015,1.707439,0.5433,0.552552,0.5433,0.542118


[I 2025-03-27 09:21:29,461] Trial 134 finished with value: 0.5421179487097323 and parameters: {'learning_rate': 0.0007731204532210576, 'weight_decay': 0.005, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 135 with params: {'learning_rate': 0.0009180524138142814, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6387,2.037497,0.4873,0.531593,0.4873,0.477187
2,1.7616,1.8387,0.5128,0.539464,0.5128,0.512494
3,1.5838,1.757452,0.533,0.543881,0.533,0.526433
4,1.4859,1.806537,0.5224,0.548277,0.5224,0.519239
5,1.4227,1.724551,0.537,0.551785,0.537,0.531733
6,1.3695,1.676636,0.5511,0.557232,0.5511,0.54693
7,1.3303,1.668855,0.5498,0.559532,0.5498,0.548873
8,1.3017,1.709139,0.543,0.547574,0.543,0.538516
9,1.2755,1.721759,0.5424,0.55302,0.5424,0.539097
10,1.2624,1.703873,0.5436,0.552636,0.5436,0.542314


[I 2025-03-27 09:29:20,953] Trial 135 finished with value: 0.5423142624010093 and parameters: {'learning_rate': 0.0009180524138142814, 'weight_decay': 0.005, 'warmup_steps': 7}. Best is trial 53 with value: 0.5431165419813282.


Trial 136 with params: {'learning_rate': 8.251692766362866e-05, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2112,3.826159,0.2874,0.328658,0.2874,0.274294
2,3.4861,3.229832,0.4011,0.414199,0.4011,0.390689
3,3.0292,2.900081,0.4379,0.441837,0.4379,0.421562


[I 2025-03-27 09:31:41,862] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0015212048832530319, 'weight_decay': 0.005, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4234,1.943329,0.4944,0.548043,0.4944,0.485341
2,1.6725,1.811866,0.5179,0.549937,0.5179,0.517878
3,1.5108,1.754421,0.5339,0.551823,0.5339,0.52727
4,1.4167,1.823276,0.5184,0.549589,0.5184,0.516255
5,1.3529,1.741104,0.5362,0.55693,0.5362,0.53231
6,1.2953,1.677424,0.5469,0.553405,0.5469,0.543004


[I 2025-03-27 09:36:22,346] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0008080784719429661, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7368,2.081922,0.4828,0.523362,0.4828,0.472153
2,1.7989,1.855913,0.5095,0.535754,0.5095,0.509154
3,1.6122,1.767796,0.5328,0.542642,0.5328,0.526004
4,1.5117,1.811095,0.5224,0.547085,0.5224,0.518928
5,1.4474,1.728549,0.5374,0.550455,0.5374,0.531749
6,1.3945,1.682157,0.5521,0.557995,0.5521,0.547827
7,1.3561,1.67195,0.549,0.558272,0.549,0.547994
8,1.3283,1.711266,0.5444,0.548634,0.5444,0.539855
9,1.3032,1.722899,0.5413,0.552297,0.5413,0.538196
10,1.2911,1.706093,0.5433,0.552812,0.5433,0.542172


[I 2025-03-27 09:44:14,012] Trial 138 finished with value: 0.5421717200990623 and parameters: {'learning_rate': 0.0008080784719429661, 'weight_decay': 0.002, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 139 with params: {'learning_rate': 0.0006506569873610713, 'weight_decay': 0.002, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8879,2.173075,0.4756,0.511842,0.4756,0.464076
2,1.8729,1.894104,0.5068,0.531862,0.5068,0.506406
3,1.6677,1.79389,0.5287,0.536616,0.5287,0.521669


[I 2025-03-27 09:46:34,622] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.00011207516388168525, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0537,3.581161,0.3364,0.375393,0.3364,0.318653
2,3.1882,2.922336,0.4309,0.445718,0.4309,0.42005
3,2.7157,2.608085,0.4611,0.463882,0.4611,0.44613
4,2.4461,2.455689,0.4584,0.475445,0.4584,0.447379
5,2.2831,2.319593,0.4816,0.484618,0.4816,0.468177
6,2.1757,2.232714,0.499,0.494924,0.499,0.488684


[I 2025-03-27 09:51:14,786] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.001108117376619337, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5397,1.990313,0.4913,0.538442,0.4913,0.48086
2,1.7191,1.821153,0.5165,0.543917,0.5165,0.516188
3,1.5497,1.749425,0.5321,0.545602,0.5321,0.525727
4,1.4542,1.806127,0.5203,0.547151,0.5203,0.517333
5,1.3914,1.724602,0.5381,0.554171,0.5381,0.533255
6,1.337,1.672607,0.5496,0.555402,0.5496,0.545365
7,1.2963,1.668343,0.5492,0.559255,0.5492,0.548399
8,1.2662,1.710639,0.5447,0.549956,0.5447,0.54038
9,1.2378,1.724364,0.5443,0.555626,0.5443,0.541188
10,1.223,1.704508,0.5433,0.551356,0.5433,0.541992


[I 2025-03-27 09:59:06,000] Trial 141 finished with value: 0.5419922451535014 and parameters: {'learning_rate': 0.001108117376619337, 'weight_decay': 0.003, 'warmup_steps': 6}. Best is trial 53 with value: 0.5431165419813282.


Trial 142 with params: {'learning_rate': 0.001060518148362699, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5677,2.000703,0.4902,0.536789,0.4902,0.47973
2,1.7285,1.824657,0.5144,0.54204,0.5144,0.514383
3,1.5572,1.750612,0.5331,0.545407,0.5331,0.526596
4,1.4611,1.805604,0.5209,0.5476,0.5209,0.517858
5,1.3982,1.723996,0.5377,0.55346,0.5377,0.532929
6,1.3441,1.673105,0.5495,0.55501,0.5495,0.545068
7,1.3038,1.668049,0.5492,0.55931,0.5492,0.548482
8,1.2741,1.709827,0.5446,0.549544,0.5446,0.540071
9,1.2462,1.723304,0.5433,0.554234,0.5433,0.540128
10,1.2318,1.703933,0.5432,0.551476,0.5432,0.541903


[I 2025-03-27 10:06:54,835] Trial 142 finished with value: 0.541903431467526 and parameters: {'learning_rate': 0.001060518148362699, 'weight_decay': 0.004, 'warmup_steps': 8}. Best is trial 53 with value: 0.5431165419813282.


Trial 143 with params: {'learning_rate': 0.0006975261685678177, 'weight_decay': 0.002, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8126,2.136326,0.4789,0.515928,0.4789,0.467778
2,1.8445,1.879546,0.5075,0.532457,0.5075,0.506944
3,1.6476,1.783926,0.5298,0.538762,0.5298,0.523003


[I 2025-03-27 10:09:17,376] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0008200401116663501, 'weight_decay': 0.004, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7282,2.076803,0.4832,0.524391,0.4832,0.472561
2,1.7946,1.853782,0.51,0.536208,0.51,0.509615
3,1.6089,1.766459,0.5323,0.542039,0.5323,0.525496
4,1.5086,1.81042,0.5223,0.546974,0.5223,0.518716
5,1.4445,1.727943,0.5373,0.55071,0.5373,0.531795
6,1.3915,1.681396,0.5518,0.557685,0.5518,0.547556
7,1.353,1.671499,0.5491,0.558383,0.5491,0.548084
8,1.3251,1.710904,0.5443,0.548558,0.5443,0.539757
9,1.2999,1.722644,0.5416,0.55247,0.5416,0.538433
10,1.2878,1.70572,0.5428,0.552092,0.5428,0.541641


[I 2025-03-27 10:17:07,369] Trial 144 finished with value: 0.541640764979319 and parameters: {'learning_rate': 0.0008200401116663501, 'weight_decay': 0.004, 'warmup_steps': 14}. Best is trial 53 with value: 0.5431165419813282.


Trial 145 with params: {'learning_rate': 0.00031820337008017877, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3508,2.628218,0.4346,0.476239,0.4346,0.420163
2,2.241,2.126024,0.4832,0.503381,0.4832,0.480317
3,1.934,1.966236,0.5103,0.514849,0.5103,0.501658


[I 2025-03-27 10:19:27,892] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0006563846861062976, 'weight_decay': 0.003, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8486,2.1624,0.4763,0.511674,0.4763,0.464754
2,1.8657,1.891025,0.5073,0.532479,0.5073,0.506973
3,1.6638,1.792086,0.5289,0.537001,0.5289,0.521938
4,1.5586,1.825377,0.5213,0.544735,0.5213,0.517684
5,1.4926,1.741418,0.5364,0.547167,0.5364,0.53035
6,1.4398,1.696013,0.5528,0.558378,0.5528,0.548454
7,1.4023,1.681978,0.5479,0.557174,0.5479,0.546861
8,1.3754,1.720067,0.5424,0.545887,0.5424,0.537804
9,1.352,1.72972,0.5398,0.550755,0.5398,0.536472
10,1.3414,1.714701,0.5433,0.552121,0.5433,0.54179


[I 2025-03-27 10:27:16,366] Trial 146 finished with value: 0.541790433458555 and parameters: {'learning_rate': 0.0006563846861062976, 'weight_decay': 0.003, 'warmup_steps': 9}. Best is trial 53 with value: 0.5431165419813282.


Trial 147 with params: {'learning_rate': 0.0004109755218575589, 'weight_decay': 0.001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.201,2.443316,0.4509,0.488793,0.4509,0.438118
2,2.0866,2.021791,0.4929,0.514864,0.4929,0.491753
3,1.8221,1.887655,0.5185,0.524168,0.5185,0.510544


[I 2025-03-27 10:29:36,015] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0012654142752276925, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4942,1.966831,0.4922,0.542861,0.4922,0.482354
2,1.6966,1.814022,0.518,0.547152,0.518,0.517962
3,1.5308,1.748888,0.5337,0.550168,0.5337,0.527386
4,1.4361,1.810481,0.521,0.550129,0.521,0.51843
5,1.373,1.729006,0.5379,0.556396,0.5379,0.53345
6,1.3175,1.672747,0.5489,0.554458,0.5489,0.544677


[I 2025-03-27 10:34:19,828] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0013581673151824224, 'weight_decay': 0.003, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4689,1.95657,0.4923,0.545856,0.4923,0.482804
2,1.6864,1.812019,0.518,0.548718,0.518,0.518284
3,1.5223,1.750192,0.535,0.552113,0.535,0.528476
4,1.4279,1.814465,0.5203,0.550231,0.5203,0.517937
5,1.3645,1.732801,0.5371,0.556802,0.5371,0.533058
6,1.3082,1.673868,0.5476,0.553784,0.5476,0.543542


[I 2025-03-27 10:39:05,651] Trial 149 pruned. 


In [None]:
print(best_base_head)

BestRun(run_id='53', objective=0.5431165419813282, hyperparameters={'learning_rate': 0.0007167369272358462, 'weight_decay': 0.001, 'warmup_steps': 9}, run_summary=None)
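
The best run printed above can be carried over into a final training configuration. The following is only a minimal sketch: `BestRunSketch` is a hypothetical stand-in that mirrors the printed fields, and `apply_best` and the default values are illustrative names and numbers, not part of `base` or `transformers`.

```python
from typing import Any, NamedTuple, Optional


class BestRunSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as the printed BestRun."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


def apply_best(defaults: dict, best: BestRunSketch) -> dict:
    """Return a copy of `defaults` overridden by the best run's hyperparameters."""
    merged = dict(defaults)
    merged.update(best.hyperparameters)
    return merged


# Values copied from the output above.
best = BestRunSketch(
    run_id="53",
    objective=0.5431165419813282,
    hyperparameters={
        "learning_rate": 0.0007167369272358462,
        "weight_decay": 0.001,
        "warmup_steps": 9,
    },
)

# Illustrative defaults only (assumption); the notebook's real defaults live in `base`.
defaults = {"learning_rate": 5e-5, "weight_decay": 0.0, "warmup_steps": 0, "epochs": 10}
final_config = apply_best(defaults, best)
print(final_config)
```

Keeping the merge in a small helper leaves the defaults untouched, so the same defaults can be reused for the other model variants.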


In [None]:
base.reset_seed()

## Search with distillation, fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [None]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        # Learning rate is sampled on a logarithmic scale.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation hyperparameters, sampled on fixed-step grids.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
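
To make the search space concrete, the sketch below enumerates the two stepped grids and draws one log-uniform learning rate. It is an illustration in plain Python, not Optuna's internal code: `stepped_grid` and `sample_log_uniform` are hypothetical helpers. The grid is built as `low + k * step`, which is consistent with values such as `0.6000000000000001` appearing in the trial logs (ordinary floating-point rounding, not a different setting).

```python
import math
import random


def stepped_grid(low: float, high: float, step: float) -> list:
    """All values low + k * step that do not exceed high."""
    n = int(round((high - low) / step)) + 1
    return [low + k * step for k in range(n)]


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Grids matching the ranges in hp_space.
lambda_grid = stepped_grid(0, 1, 0.1)
temperature_grid = stepped_grid(2, 7, 0.5)
print(len(lambda_grid), len(temperature_grid))

# Floating-point artefact seen in the logs; round only for display.
print(lambda_grid[6], round(lambda_grid[6], 1))

# One learning-rate draw from the log-uniform range used in hp_space.
rng = random.Random(42)
lr = sample_log_uniform(5e-5, 5e-3, rng)
print(lr)
```

Sampling the learning rate in log space gives each order of magnitude between 5e-5 and 5e-3 the same probability mass, which a plain uniform draw would not.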

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
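
The pruning schedule implied by these settings can be sketched in plain Python. The bracket count and rung formulas follow the description in the Optuna documentation as best understood here, so treat this as an approximation of the library's behaviour and not its implementation. The values `min_resource=3` and `max_resource=10` are assumptions inferred from the logs (trials are pruned after epoch 3 or 6 and finish after epoch 10); the notebook's actual `min_r` and `max_r` are defined elsewhere. `hyperband_rungs` is a hypothetical helper.

```python
import math


def hyperband_rungs(min_resource: int, max_resource: int, reduction_factor: int) -> dict:
    """Map each bracket index to the steps at which a trial may be pruned."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    rungs = {}
    for bracket in range(n_brackets):
        steps = []
        rung = 0
        while True:
            # Bracket b behaves like successive halving that starts b rungs later.
            step = min_resource * reduction_factor ** (bracket + rung)
            if step > max_resource:
                break  # a trial never trains long enough to reach this rung
            steps.append(step)
            rung += 1
        rungs[bracket] = steps
    return rungs


# Assumed values (inferred from the logs, not read from the notebook).
schedule = hyperband_rungs(min_resource=3, max_resource=10, reduction_factor=2)
print(schedule)  # → {0: [3, 6], 1: [6]}
```

Under these assumptions a trial is either checked after epochs 3 and 6 or only after epoch 6, which matches the two pruning points visible in the output.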



Configuration of the distillation trainer for the individual training runs.

In [None]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # A fresh, frozen model is built for every trial.
    model_init=lambda: base.freeze_model(base.get_mobilenet(100)),
)
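
The internals of `base.DistilTrainer` are not shown in this notebook. The sketch below assumes it follows the common knowledge-distillation formulation, in which `lambda_param` blends a hard-label cross-entropy term with a temperature-softened KL term. Both the weighting direction and the `T^2` scaling are assumptions, and `softmax` and `distillation_loss` are hypothetical helpers written in plain Python for a single example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return (1.0 - lambda_param) * ce + lambda_param * (temperature ** 2) * kl


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.0, -2.0]
loss = distillation_loss(student, teacher, label=0, lambda_param=0.6, temperature=2.5)
print(loss)
```

Under this assumed form, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's soft targets alone.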

Search setup.

In [None]:
best_distill_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)
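
The objective is the `eval_f1` value returned by `base.compute_metrics`, whose implementation is not shown here. In the logs the Recall column equals Accuracy at every epoch, which is consistent with support-weighted averaging (or with macro averaging on a class-balanced evaluation set). The value reported when a trial finishes also matches the F1 of its last epoch, not of its best epoch. The sketch below is an assumption-based illustration of weighted metrics in plain Python; `weighted_scores` is a hypothetical helper, not the notebook's code.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall, weighted F1)."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Each class contributes in proportion to its number of true samples.
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = weighted_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(acc, prec, rec, f1)
```

Weighted recall sums the true positives of every class and divides by the total sample count, which is exactly the definition of accuracy; this explains the identical columns in the logs.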

[I 2025-03-27 10:39:06,404] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3604,2.799643,0.4216,0.48553,0.4216,0.405836
2,2.5178,2.410802,0.484,0.50074,0.484,0.477515
3,2.2884,2.284865,0.5061,0.510731,0.5061,0.493501


[I 2025-03-27 10:41:33,114] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8355,3.467639,0.3098,0.37437,0.3098,0.295461
2,3.1781,2.976775,0.4171,0.447151,0.4171,0.410129
3,2.8313,2.743736,0.4469,0.459956,0.4469,0.431014
4,2.6351,2.636091,0.4489,0.476255,0.4489,0.435896
5,2.5185,2.531711,0.4711,0.489057,0.4711,0.456564
6,2.4413,2.464776,0.4875,0.488012,0.4875,0.474211


[I 2025-03-27 10:46:14,100] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0157,3.779981,0.1983,0.254988,0.1983,0.188885
2,3.5734,3.404327,0.3275,0.376322,0.3275,0.325139
3,3.2753,3.175479,0.3856,0.404214,0.3856,0.370861


[I 2025-03-27 10:48:38,168] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7726,3.379902,0.3284,0.395512,0.3284,0.312616
2,3.0813,2.884003,0.4295,0.455578,0.4295,0.42174
3,2.7414,2.662977,0.4568,0.467833,0.4568,0.441326
4,2.5568,2.569484,0.4572,0.48197,0.4572,0.444695
5,2.4497,2.470266,0.4768,0.493389,0.4768,0.462171
6,2.3789,2.407724,0.495,0.495925,0.495,0.482908


[I 2025-03-27 10:53:19,210] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7768,2.322045,0.4859,0.52349,0.4859,0.471191
2,2.1579,2.181485,0.5196,0.537551,0.5196,0.517037
3,2.0477,2.11621,0.5305,0.537256,0.5305,0.521062
4,1.9914,2.146316,0.523,0.541893,0.523,0.517223
5,1.9584,2.086477,0.5365,0.548262,0.5365,0.529385
6,1.9268,2.046953,0.5467,0.548855,0.5467,0.540762


[I 2025-03-27 10:58:02,538] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5722,2.221004,0.5037,0.544409,0.5037,0.497262
2,2.1032,2.157735,0.518,0.551647,0.518,0.517357
3,2.0244,2.110773,0.5332,0.547131,0.5332,0.523623
4,1.9765,2.147383,0.522,0.547077,0.522,0.516671
5,1.9476,2.093175,0.5327,0.54978,0.5327,0.52663
6,1.9128,2.040677,0.5492,0.558593,0.5492,0.544619
7,1.8887,2.025416,0.5501,0.554127,0.5501,0.545469
8,1.8653,2.067495,0.5406,0.546059,0.5406,0.53385
9,1.8412,2.068886,0.5423,0.550838,0.5423,0.537199
10,1.8255,2.054191,0.5414,0.548007,0.5414,0.538008


[I 2025-03-27 11:05:55,986] Trial 5 finished with value: 0.538007698825947 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.538007698825947.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7673,2.324297,0.485,0.522178,0.485,0.470138
2,2.1597,2.182842,0.5196,0.537505,0.5196,0.517061
3,2.0493,2.117185,0.531,0.53778,0.531,0.521721
4,1.9927,2.146809,0.5233,0.542085,0.5233,0.517549
5,1.9598,2.087037,0.5361,0.547832,0.5361,0.528963
6,1.9281,2.047771,0.5468,0.548889,0.5468,0.540853


[I 2025-03-27 11:10:38,799] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5264,2.213896,0.5048,0.549118,0.5048,0.499187
2,2.1051,2.164373,0.5142,0.550415,0.5142,0.51341
3,2.03,2.116792,0.5321,0.548219,0.5321,0.523046
4,1.9828,2.149021,0.5216,0.544768,0.5216,0.515756
5,1.9538,2.097429,0.5314,0.549845,0.5314,0.525354
6,1.9181,2.044687,0.5481,0.558503,0.5481,0.543339
7,1.8927,2.028322,0.5495,0.55418,0.5495,0.544928
8,1.8678,2.068547,0.5399,0.545993,0.5399,0.533286
9,1.8418,2.070729,0.5413,0.550515,0.5413,0.536309
10,1.8244,2.054204,0.5413,0.54813,0.5413,0.53794


[I 2025-03-27 11:18:40,361] Trial 7 finished with value: 0.5379401380489027 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 5 with value: 0.538007698825947.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8575,3.534628,0.29,0.349615,0.29,0.276404
2,3.268,3.075189,0.3983,0.43116,0.3983,0.39168
3,2.9329,2.840691,0.4351,0.449278,0.4351,0.419266
4,2.7318,2.721439,0.4385,0.470521,0.4385,0.42507
5,2.6078,2.613294,0.4597,0.480261,0.4597,0.444884
6,2.5245,2.542192,0.4781,0.480687,0.4781,0.463671


[I 2025-03-27 11:23:26,513] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7145,2.281184,0.4944,0.536691,0.4944,0.481787
2,2.1284,2.163613,0.5184,0.53899,0.5184,0.516303
3,2.029,2.104565,0.5326,0.541919,0.5326,0.523617
4,1.9762,2.142502,0.5211,0.543857,0.5211,0.515652
5,1.9451,2.08146,0.5368,0.549169,0.5368,0.530246
6,1.9132,2.038695,0.548,0.551191,0.548,0.542293
7,1.8931,2.024607,0.552,0.55319,0.552,0.546672
8,1.8752,2.068021,0.54,0.543505,0.54,0.532996
9,1.8579,2.069016,0.5427,0.550292,0.5427,0.537587
10,1.8486,2.060781,0.5398,0.546785,0.5398,0.536244


[I 2025-03-27 11:31:18,706] Trial 9 finished with value: 0.5362444082996994 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 10 with params: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5491,2.23313,0.5094,0.550796,0.5094,0.503009
2,2.1782,2.195676,0.5125,0.548786,0.5125,0.510838
3,2.1097,2.189407,0.5246,0.559178,0.5246,0.518369
4,2.0587,2.221045,0.5107,0.54314,0.5107,0.505816
5,2.025,2.138141,0.5262,0.552264,0.5262,0.519985
6,1.9788,2.075727,0.5404,0.55009,0.5404,0.532945
7,1.942,2.058319,0.5421,0.55232,0.5421,0.538185
8,1.9044,2.081481,0.5392,0.547305,0.5392,0.532764
9,1.8638,2.087574,0.5401,0.551714,0.5401,0.534916
10,1.833,2.059893,0.5386,0.544435,0.5386,0.535201


[I 2025-03-27 11:39:12,049] Trial 10 finished with value: 0.5352005832920134 and parameters: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 5 with value: 0.538007698825947.


Trial 11 with params: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5116,2.210453,0.5058,0.549579,0.5058,0.50014
2,2.1146,2.174118,0.5102,0.550512,0.5102,0.510192
3,2.0425,2.12789,0.5318,0.552932,0.5318,0.523582
4,1.9954,2.15583,0.5212,0.542434,0.5212,0.515181
5,1.9657,2.104086,0.5314,0.554036,0.5314,0.526049
6,1.9284,2.051733,0.5474,0.559024,0.5474,0.542525
7,1.9007,2.033993,0.5478,0.554714,0.5478,0.543485
8,1.8734,2.07056,0.5392,0.545862,0.5392,0.532557
9,1.8446,2.07427,0.5402,0.550098,0.5402,0.535112
10,1.8246,2.054954,0.541,0.54777,0.541,0.537791


[I 2025-03-27 11:47:08,999] Trial 11 finished with value: 0.5377914403176043 and parameters: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 12 with params: {'learning_rate': 0.002376024890572026, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5334,2.212097,0.506,0.546906,0.506,0.500368
2,2.116,2.174897,0.5107,0.55077,0.5107,0.510525
3,2.0432,2.128299,0.5311,0.551828,0.5311,0.522748
4,1.996,2.156241,0.521,0.542304,0.521,0.515102
5,1.966,2.104338,0.531,0.55371,0.531,0.525635
6,1.9287,2.051983,0.5468,0.558381,0.5468,0.541945
7,1.901,2.034217,0.548,0.555132,0.548,0.543803
8,1.8736,2.070714,0.5386,0.545438,0.5386,0.532082
9,1.8448,2.074471,0.5401,0.550126,0.5401,0.535026
10,1.8246,2.055081,0.5407,0.547314,0.5407,0.537416


[I 2025-03-27 11:55:05,086] Trial 12 finished with value: 0.5374159522454527 and parameters: {'learning_rate': 0.002376024890572026, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 5 with value: 0.538007698825947.


Trial 13 with params: {'learning_rate': 0.003064104261670614, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5315,2.21513,0.5112,0.550038,0.5112,0.504821
2,2.1488,2.186102,0.5116,0.551317,0.5116,0.511886
3,2.0798,2.159104,0.5309,0.557583,0.5309,0.524468
4,2.0305,2.188634,0.5144,0.541225,0.5144,0.50914
5,1.9985,2.12167,0.5303,0.554813,0.5303,0.524511
6,1.9566,2.066877,0.5419,0.553107,0.5419,0.535575


[I 2025-03-27 11:59:52,545] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.003645100232010343, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5635,2.238581,0.5097,0.551448,0.5097,0.503357
2,2.1849,2.198797,0.5121,0.548485,0.5121,0.510209
3,2.1161,2.196207,0.5246,0.55965,0.5246,0.518352
4,2.0647,2.227881,0.5103,0.544216,0.5103,0.505574
5,2.0304,2.141858,0.5252,0.551307,0.5252,0.51879
6,1.9834,2.077591,0.5405,0.55022,0.5405,0.532862


[I 2025-03-27 12:04:34,485] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0003173012733215097, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2355,2.702993,0.436,0.48794,0.436,0.419981
2,2.4451,2.363348,0.4912,0.506939,0.4912,0.485257
3,2.2426,2.251251,0.51,0.51462,0.51,0.498021


[I 2025-03-27 12:06:58,572] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0007549727386624846, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8036,2.34468,0.4821,0.52137,0.4821,0.467041
2,2.1744,2.191613,0.5169,0.534182,0.5169,0.51411
3,2.059,2.123355,0.5297,0.536105,0.5297,0.520463
4,2.0006,2.15024,0.5239,0.541972,0.5239,0.517949
5,1.9668,2.090539,0.536,0.547473,0.536,0.528761
6,1.935,2.052063,0.5463,0.547913,0.5463,0.540128
7,1.9154,2.036384,0.5483,0.548568,0.5483,0.542728
8,1.8991,2.076468,0.5377,0.540028,0.5377,0.530575
9,1.8839,2.078648,0.5395,0.547229,0.5395,0.534375
10,1.877,2.07192,0.5384,0.545936,0.5384,0.534779


[I 2025-03-27 12:14:52,850] Trial 16 finished with value: 0.5347788143846123 and parameters: {'learning_rate': 0.0007549727386624846, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 17 with params: {'learning_rate': 0.00432979299982574, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5477,2.275984,0.5027,0.544894,0.5027,0.496701
2,2.2273,2.231897,0.511,0.548814,0.511,0.508061
3,2.1575,2.244281,0.5162,0.561419,0.5162,0.508697


[I 2025-03-27 12:17:15,099] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004952142492162866, 'weight_decay': 0.004, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.592,2.325087,0.4947,0.546579,0.4947,0.490427
2,2.2731,2.269258,0.5069,0.5489,0.5069,0.504131
3,2.1992,2.283733,0.5064,0.55701,0.5064,0.498953


[I 2025-03-27 12:19:38,677] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0024742711486919313, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5235,2.210864,0.5069,0.547662,0.5069,0.501219
2,2.1194,2.176828,0.5104,0.54994,0.5104,0.510369
3,2.0475,2.131968,0.5314,0.553177,0.5314,0.523327
4,2.0003,2.159365,0.5207,0.541925,0.5207,0.514681
5,1.9701,2.10646,0.532,0.554499,0.532,0.526626
6,1.9322,2.054239,0.5464,0.558138,0.5464,0.541366
7,1.9038,2.036227,0.5473,0.553818,0.5473,0.542851
8,1.8757,2.071433,0.5387,0.545729,0.5387,0.532195
9,1.8459,2.07562,0.5402,0.550235,0.5402,0.535057
10,1.8249,2.05537,0.5401,0.546573,0.5401,0.536778


[I 2025-03-27 12:27:36,188] Trial 19 finished with value: 0.5367782559264244 and parameters: {'learning_rate': 0.0024742711486919313, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 5 with value: 0.538007698825947.


Trial 20 with params: {'learning_rate': 0.00018075272535631178, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5959,3.098129,0.3772,0.447668,0.3772,0.361593
2,2.7826,2.613241,0.4586,0.476227,0.4586,0.449896
3,2.4815,2.438207,0.4871,0.49313,0.4871,0.472983


[I 2025-03-27 12:30:00,151] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0005462000041391629, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9545,2.445713,0.468,0.507978,0.468,0.452233
2,2.2484,2.235912,0.5092,0.52596,0.5092,0.505555
3,2.1088,2.156168,0.5234,0.529069,0.5234,0.513322
4,2.0412,2.172138,0.5179,0.534524,0.5179,0.511414
5,2.003,2.111154,0.5313,0.542121,0.5313,0.523173
6,1.97,2.074341,0.5412,0.541845,0.5412,0.53456


[I 2025-03-27 12:34:50,635] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.002311646729811649, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5234,2.211762,0.5061,0.549268,0.5061,0.500608
2,2.1129,2.172814,0.5105,0.550393,0.5105,0.510242
3,2.0399,2.125623,0.5314,0.55079,0.5314,0.523038
4,1.9928,2.154082,0.5216,0.543357,0.5216,0.515814
5,1.9632,2.102832,0.5316,0.553382,0.5316,0.526099
6,1.9262,2.05037,0.5482,0.560371,0.5482,0.54353
7,1.899,2.032799,0.5479,0.554459,0.5479,0.543642
8,1.8722,2.070186,0.5386,0.545064,0.5386,0.532004
9,1.844,2.073618,0.5404,0.550178,0.5404,0.535343
10,1.8245,2.054837,0.5413,0.547865,0.5413,0.53795


[I 2025-03-27 12:42:45,005] Trial 22 finished with value: 0.53794992433827 and parameters: {'learning_rate': 0.002311646729811649, 'weight_decay': 0.005, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 23 with params: {'learning_rate': 0.0022850225841205478, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5127,2.210798,0.5049,0.54817,0.5049,0.49925
2,2.1112,2.171581,0.5107,0.550264,0.5107,0.510407
3,2.0384,2.124261,0.5311,0.549571,0.5311,0.522582
4,1.9913,2.15296,0.5223,0.543962,0.5223,0.516351
5,1.9619,2.102079,0.5315,0.552968,0.5315,0.525958
6,1.9251,2.049459,0.5485,0.560333,0.5485,0.543831
7,1.8981,2.032133,0.5484,0.554759,0.5484,0.544039
8,1.8716,2.069855,0.5389,0.545419,0.5389,0.532336
9,1.8436,2.073117,0.5405,0.550453,0.5405,0.535442
10,1.8244,2.05467,0.5412,0.547997,0.5412,0.537961


[I 2025-03-27 12:50:34,437] Trial 23 finished with value: 0.537961362041645 and parameters: {'learning_rate': 0.0022850225841205478, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 24 with params: {'learning_rate': 0.0012813137553340846, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6244,2.252784,0.4991,0.540315,0.4991,0.48844
2,2.1094,2.154201,0.5176,0.54216,0.5176,0.516225
3,2.0195,2.099625,0.5351,0.546853,0.5351,0.526346
4,1.9692,2.143096,0.5222,0.547749,0.5222,0.517178
5,1.9396,2.081656,0.5365,0.549971,0.5365,0.530129
6,1.9073,2.0353,0.5494,0.553741,0.5494,0.543885
7,1.8865,2.02193,0.5516,0.553492,0.5516,0.546505
8,1.8673,2.066175,0.5402,0.544706,0.5402,0.533336
9,1.8482,2.066498,0.5426,0.549572,0.5426,0.537397
10,1.8373,2.056917,0.5414,0.548535,0.5414,0.537994


[I 2025-03-27 12:58:27,067] Trial 24 finished with value: 0.5379940331437115 and parameters: {'learning_rate': 0.0012813137553340846, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.538007698825947.


Trial 25 with params: {'learning_rate': 0.001187006474415616, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6509,2.262241,0.4972,0.5382,0.4972,0.485782
2,2.1153,2.157018,0.5178,0.540152,0.5178,0.515863
3,2.0222,2.100779,0.5346,0.54491,0.5346,0.525721
4,1.9711,2.142453,0.5224,0.546269,0.5224,0.517318
5,1.941,2.080986,0.5362,0.548633,0.5362,0.529641
6,1.9089,2.036145,0.5485,0.551817,0.5485,0.542783
7,1.8885,2.022625,0.5521,0.55361,0.5521,0.546907
8,1.8699,2.066658,0.5405,0.544663,0.5405,0.533627
9,1.8515,2.067174,0.5427,0.549977,0.5427,0.537564
10,1.8413,2.058209,0.5403,0.547199,0.5403,0.536689


[I 2025-03-27 13:06:20,608] Trial 25 finished with value: 0.5366887754445997 and parameters: {'learning_rate': 0.001187006474415616, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 5 with value: 0.538007698825947.


Trial 26 with params: {'learning_rate': 0.0037141661801770283, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5251,2.238225,0.5073,0.548848,0.5073,0.500961
2,2.1859,2.200113,0.5121,0.548023,0.5121,0.509814
3,2.1182,2.198679,0.5244,0.558897,0.5244,0.518025


[I 2025-03-27 13:08:41,546] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0036508087485551365, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5231,2.234794,0.5079,0.548007,0.5079,0.501052
2,2.1822,2.197838,0.5126,0.548673,0.5126,0.510564
3,2.1144,2.194713,0.5243,0.559251,0.5243,0.518008
4,2.0634,2.226277,0.5104,0.544096,0.5104,0.505709
5,2.0294,2.140979,0.5262,0.551973,0.5262,0.519763
6,1.9824,2.077157,0.5403,0.54985,0.5403,0.532623


[I 2025-03-27 13:13:25,132] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.00038989070394618964, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1335,2.594218,0.4499,0.495815,0.4499,0.434348
2,2.3596,2.305817,0.4993,0.515488,0.4993,0.494492
3,2.1838,2.20838,0.5169,0.522143,0.5169,0.505905
4,2.1019,2.210893,0.5123,0.528337,0.5123,0.504791
5,2.0566,2.146465,0.5267,0.537597,0.5267,0.517807
6,2.0208,2.108679,0.5365,0.536791,0.5365,0.529298


[I 2025-03-27 13:18:06,911] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.004518194788385949, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5543,2.289522,0.5,0.542609,0.5,0.494334
2,2.2401,2.243396,0.51,0.549398,0.51,0.507136
3,2.1694,2.257005,0.5136,0.560819,0.5136,0.506064


[I 2025-03-27 13:20:29,628] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0012988238377680513, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6778,2.256997,0.498,0.541653,0.498,0.487409
2,2.1126,2.155198,0.5175,0.542512,0.5175,0.516144
3,2.021,2.100489,0.5349,0.546521,0.5349,0.525946
4,1.9702,2.143943,0.5227,0.54865,0.5227,0.517731
5,1.9402,2.082407,0.5366,0.549809,0.5366,0.53018
6,1.9077,2.035582,0.549,0.553379,0.549,0.543517
7,1.8866,2.02217,0.5518,0.553948,0.5518,0.546798
8,1.8672,2.066431,0.5407,0.545211,0.5407,0.533767
9,1.8479,2.066735,0.5429,0.549897,0.5429,0.53765
10,1.8367,2.056865,0.5418,0.54886,0.5418,0.538354


[I 2025-03-27 13:28:25,052] Trial 30 finished with value: 0.5383544360502914 and parameters: {'learning_rate': 0.0012988238377680513, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 31 with params: {'learning_rate': 0.00274252687891618, 'weight_decay': 0.004, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5581,2.212842,0.5085,0.546538,0.5085,0.501799
2,2.134,2.182003,0.5095,0.549608,0.5095,0.50983
3,2.0627,2.144003,0.5328,0.556564,0.5328,0.525512
4,2.0144,2.171917,0.5184,0.541058,0.5184,0.512802
5,1.9831,2.113268,0.532,0.556037,0.532,0.52657
6,1.9435,2.060728,0.5443,0.555799,0.5443,0.538605
7,1.913,2.042161,0.5459,0.553357,0.5459,0.54157
8,1.8824,2.073651,0.5392,0.546055,0.5392,0.532732
9,1.8499,2.079021,0.5404,0.550742,0.5404,0.535093
10,1.8264,2.056438,0.5395,0.545806,0.5395,0.536236


[I 2025-03-27 13:36:18,359] Trial 31 finished with value: 0.536236153440743 and parameters: {'learning_rate': 0.00274252687891618, 'weight_decay': 0.004, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 32 with params: {'learning_rate': 0.0005894105340013413, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.983,2.432006,0.4693,0.509762,0.4693,0.453631
2,2.2358,2.226647,0.5107,0.528416,0.5107,0.507532
3,2.098,2.148399,0.5259,0.531625,0.5259,0.515978


[I 2025-03-27 13:38:40,736] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0006295332127232472, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9494,2.40898,0.4728,0.512289,0.4728,0.457336
2,2.2192,2.216797,0.512,0.530313,0.512,0.508956
3,2.0871,2.141095,0.5276,0.533154,0.5276,0.517833
4,2.0228,2.16157,0.5197,0.537023,0.5197,0.51359
5,1.9861,2.101127,0.5336,0.544458,0.5336,0.525956
6,1.9536,2.063776,0.5446,0.545951,0.5446,0.53826


[I 2025-03-27 13:43:22,334] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.001172701360050942, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7189,2.270824,0.4959,0.539608,0.4959,0.484205
2,2.1212,2.159131,0.5168,0.539789,0.5168,0.515
3,2.0249,2.101969,0.534,0.544162,0.534,0.52506
4,1.9728,2.143134,0.5225,0.546194,0.5225,0.517128
5,1.9422,2.081497,0.5361,0.549095,0.5361,0.529709
6,1.9099,2.036733,0.5488,0.552167,0.5488,0.543144
7,1.8894,2.023053,0.5513,0.552871,0.5513,0.546113
8,1.8707,2.067044,0.5411,0.545124,0.5411,0.534179
9,1.8523,2.067586,0.543,0.550359,0.543,0.537834
10,1.8421,2.058597,0.5407,0.547659,0.5407,0.537176


[I 2025-03-27 13:51:16,166] Trial 34 finished with value: 0.537175858172691 and parameters: {'learning_rate': 0.001172701360050942, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 35 with params: {'learning_rate': 0.0012466319680832965, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7064,2.263658,0.4972,0.538964,0.4972,0.485938
2,2.1168,2.156899,0.5166,0.540635,0.5166,0.51499
3,2.0228,2.10104,0.535,0.545693,0.535,0.525999
4,1.9713,2.14369,0.523,0.548544,0.523,0.517989
5,1.941,2.081991,0.5376,0.550703,0.5376,0.531227
6,1.9085,2.036007,0.5492,0.553302,0.5492,0.543678
7,1.8877,2.02247,0.5521,0.553958,0.5521,0.547003
8,1.8685,2.066654,0.5407,0.544862,0.5407,0.533734
9,1.8496,2.067029,0.5427,0.549814,0.5427,0.537436
10,1.8388,2.057535,0.541,0.54822,0.541,0.537544


[I 2025-03-27 13:59:06,983] Trial 35 finished with value: 0.5375435381500755 and parameters: {'learning_rate': 0.0012466319680832965, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.0, 'temperature': 5.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 36 with params: {'learning_rate': 0.0013063555693550205, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6127,2.250047,0.4998,0.540851,0.4998,0.489304
2,2.1077,2.153452,0.518,0.543338,0.518,0.516792
3,2.0188,2.099422,0.5344,0.546164,0.5344,0.525393
4,1.9688,2.14326,0.5218,0.547747,0.5218,0.516872
5,1.9394,2.081872,0.536,0.549149,0.536,0.529514
6,1.907,2.035112,0.5494,0.553908,0.5494,0.543917
7,1.886,2.021811,0.5517,0.553716,0.5517,0.546686
8,1.8667,2.066069,0.54,0.544443,0.54,0.533094
9,1.8474,2.066357,0.5429,0.549734,0.5429,0.537639
10,1.8363,2.056589,0.5409,0.548184,0.5409,0.53758


[I 2025-03-27 14:06:59,191] Trial 36 finished with value: 0.5375804700344253 and parameters: {'learning_rate': 0.0013063555693550205, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 37 with params: {'learning_rate': 0.000699572064359336, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8542,2.367989,0.4797,0.520217,0.4797,0.464236
2,2.1908,2.201085,0.5158,0.532963,0.5158,0.512599
3,2.0695,2.130039,0.5291,0.534827,0.5291,0.519637
4,2.0091,2.154361,0.5216,0.539856,0.5216,0.51582
5,1.9742,2.094466,0.5351,0.546068,0.5351,0.527562
6,1.9422,2.056584,0.5466,0.547843,0.5466,0.540255
7,1.9225,2.040492,0.5466,0.547166,0.5466,0.540995
8,1.9063,2.079714,0.5366,0.538773,0.5366,0.5295
9,1.8915,2.082016,0.5382,0.545948,0.5382,0.532993
10,1.8849,2.075456,0.5379,0.54541,0.5379,0.534256


[I 2025-03-27 14:14:48,767] Trial 37 finished with value: 0.5342557497442837 and parameters: {'learning_rate': 0.000699572064359336, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 30 with value: 0.5383544360502914.


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6287,3.187971,0.365,0.433129,0.365,0.348934
2,2.8818,2.703648,0.4512,0.47148,0.4512,0.442201
3,2.5697,2.513942,0.477,0.4822,0.477,0.462286
4,2.4129,2.449889,0.4738,0.494645,0.4738,0.462821
5,2.3252,2.361223,0.4911,0.50273,0.4911,0.477492
6,2.2666,2.3071,0.5088,0.510115,0.5088,0.498529


[I 2025-03-27 14:19:35,119] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.001395039612162253, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6281,2.246024,0.5002,0.541754,0.5002,0.490734
2,2.1064,2.152751,0.5185,0.545964,0.5185,0.517318
3,2.0188,2.100283,0.5347,0.547337,0.5347,0.525657
4,1.969,2.144529,0.522,0.547949,0.522,0.517077
5,1.9396,2.083528,0.5356,0.549959,0.5356,0.529257
6,1.9068,2.035248,0.5494,0.554677,0.5494,0.544145
7,1.8854,2.021935,0.5514,0.554079,0.5514,0.54657
8,1.8654,2.066167,0.5409,0.545848,0.5409,0.534079
9,1.8453,2.066396,0.5431,0.550392,0.5431,0.537918
10,1.8334,2.055857,0.5419,0.549017,0.5419,0.538538


[I 2025-03-27 14:27:29,108] Trial 39 finished with value: 0.5385378079463762 and parameters: {'learning_rate': 0.001395039612162253, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 40 with params: {'learning_rate': 0.0007007469718800717, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8617,2.368876,0.4797,0.519554,0.4797,0.464083
2,2.1913,2.201184,0.5152,0.532238,0.5152,0.511983
3,2.0696,2.130044,0.5292,0.534968,0.5292,0.519713
4,2.0091,2.154334,0.5215,0.539655,0.5215,0.515662
5,1.9741,2.094433,0.5352,0.546365,0.5352,0.527804
6,1.9421,2.05653,0.5467,0.548004,0.5467,0.540337
7,1.9224,2.040437,0.5468,0.547248,0.5468,0.541159
8,1.9061,2.079675,0.5368,0.539019,0.5368,0.52971
9,1.8914,2.081961,0.5381,0.545969,0.5381,0.532942
10,1.8847,2.075388,0.5379,0.545403,0.5379,0.534302


[I 2025-03-27 14:35:25,846] Trial 40 finished with value: 0.5343019033798392 and parameters: {'learning_rate': 0.0007007469718800717, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 41 with params: {'learning_rate': 0.0030716374169260418, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5337,2.215488,0.5108,0.549836,0.5108,0.504525
2,2.1494,2.186275,0.5112,0.550998,0.5112,0.511447
3,2.0803,2.159573,0.5308,0.557913,0.5308,0.524371
4,2.031,2.189165,0.5144,0.541522,0.5144,0.509249
5,1.999,2.121963,0.53,0.554392,0.53,0.524157
6,1.957,2.067051,0.5419,0.553047,0.5419,0.535563


[I 2025-03-27 14:40:09,562] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0028940667013327127, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5544,2.213849,0.5089,0.546164,0.5089,0.501828
2,2.1414,2.184127,0.5109,0.55145,0.5109,0.51124
3,2.071,2.150992,0.534,0.558399,0.534,0.527057
4,2.0221,2.179679,0.5163,0.541178,0.5163,0.510864
5,1.9905,2.117229,0.5309,0.555424,0.5309,0.525175
6,1.9498,2.06384,0.542,0.553411,0.542,0.535908


[I 2025-03-27 14:44:53,234] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.003261451852080609, 'weight_decay': 0.0, 'warmup_steps': 12, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5292,2.220157,0.5107,0.549934,0.5107,0.50415
2,2.1597,2.1891,0.5116,0.551108,0.5116,0.51144
3,2.0913,2.170159,0.5266,0.555617,0.5266,0.520388


[I 2025-03-27 14:47:15,423] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.00412004751998912, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5477,2.261839,0.5062,0.546993,0.5062,0.499741
2,2.2136,2.219682,0.5111,0.547719,0.5111,0.508329
3,2.1446,2.229371,0.5209,0.562591,0.5209,0.513958
4,2.0924,2.257451,0.508,0.548731,0.508,0.503443
5,2.0559,2.159248,0.5243,0.552062,0.5243,0.517453
6,2.0047,2.087632,0.5399,0.551787,0.5399,0.532178


[I 2025-03-27 14:51:58,453] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0005141318542230131, 'weight_decay': 0.0, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0219,2.477242,0.4644,0.504335,0.4644,0.448655
2,2.2701,2.24826,0.5076,0.524327,0.5076,0.503976
3,2.1221,2.164917,0.5222,0.528082,0.5222,0.511997


[I 2025-03-27 14:54:20,476] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00033681469935903786, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2412,2.682047,0.439,0.488629,0.439,0.423309
2,2.4254,2.348122,0.494,0.510062,0.494,0.488585
3,2.2266,2.239077,0.5122,0.517748,0.5122,0.500417
4,2.1357,2.234347,0.5082,0.523746,0.5082,0.500137
5,2.0859,2.167344,0.5227,0.533181,0.5227,0.513026
6,2.048,2.128221,0.5349,0.535974,0.5349,0.527741


[I 2025-03-27 14:59:03,883] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0009778054448887782, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7468,2.294669,0.4912,0.531451,0.4912,0.477956
2,2.138,2.168957,0.5182,0.537738,0.5182,0.515909
3,2.0346,2.107861,0.5313,0.539746,0.5313,0.522208
4,1.9806,2.143166,0.5216,0.542971,0.5216,0.516077
5,1.9488,2.082541,0.5373,0.549191,0.5373,0.530581
6,1.9171,2.040991,0.5478,0.550267,0.5478,0.541869


[I 2025-03-27 15:03:46,141] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0005792970945876458, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9521,2.429878,0.4705,0.509659,0.4705,0.454941
2,2.2356,2.227643,0.5106,0.527725,0.5106,0.507223
3,2.0994,2.149647,0.5258,0.531612,0.5258,0.515884


[I 2025-03-27 15:06:06,848] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0009472559228590378, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7439,2.298588,0.4899,0.52942,0.4899,0.476435
2,2.1409,2.171004,0.5188,0.537824,0.5188,0.516474
3,2.0367,2.109191,0.5307,0.53837,0.5307,0.521445
4,1.9823,2.143484,0.522,0.543165,0.522,0.516484
5,1.9504,2.08306,0.537,0.548546,0.537,0.530303
6,1.9187,2.041993,0.5478,0.55081,0.5478,0.541988
7,1.8989,2.027384,0.5513,0.552209,0.5513,0.546019
8,1.8816,2.06991,0.5396,0.542729,0.5396,0.532537
9,1.8651,2.071358,0.5419,0.549376,0.5419,0.536776
10,1.8566,2.063737,0.5396,0.546718,0.5396,0.536059


[I 2025-03-27 15:13:57,194] Trial 49 finished with value: 0.5360594242589777 and parameters: {'learning_rate': 0.0009472559228590378, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 50 with params: {'learning_rate': 0.0021133792752108674, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.552,2.215346,0.5038,0.547526,0.5038,0.498136
2,2.108,2.167091,0.5143,0.55243,0.5143,0.514159
3,2.0326,2.118872,0.5318,0.548394,0.5318,0.522876
4,1.9852,2.149988,0.522,0.544831,0.522,0.516159
5,1.9558,2.098708,0.5316,0.550841,0.5316,0.525654
6,1.9198,2.045967,0.5475,0.558104,0.5475,0.542521
7,1.894,2.029257,0.549,0.554173,0.549,0.544443
8,1.8687,2.068927,0.5391,0.545342,0.5391,0.532462
9,1.8423,2.071489,0.5408,0.550259,0.5408,0.53579
10,1.8244,2.054414,0.5415,0.54831,0.5415,0.538124


[I 2025-03-27 15:21:53,742] Trial 50 finished with value: 0.5381244948548762 and parameters: {'learning_rate': 0.0021133792752108674, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 51 with params: {'learning_rate': 0.0034522375504316056, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5515,2.229244,0.51,0.551055,0.51,0.503483
2,2.1724,2.193207,0.5116,0.548396,0.5116,0.510179
3,2.1037,2.182754,0.5252,0.558351,0.5252,0.518994


[I 2025-03-27 15:24:17,320] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0018515352593065028, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.557,2.219958,0.5038,0.54446,0.5038,0.497445
2,2.102,2.15695,0.5188,0.552154,0.5188,0.518156
3,2.0237,2.110245,0.5332,0.546995,0.5332,0.523596
4,1.9759,2.147152,0.5221,0.547784,0.5221,0.516839
5,1.9471,2.092813,0.5322,0.549517,0.5322,0.526231
6,1.9124,2.040413,0.5491,0.558249,0.5491,0.544445
7,1.8884,2.025219,0.5498,0.553798,0.5498,0.545183
8,1.8651,2.067402,0.5408,0.546218,0.5408,0.534023
9,1.8411,2.068717,0.5421,0.550519,0.5421,0.536966
10,1.8255,2.05416,0.5418,0.548372,0.5418,0.538382


[I 2025-03-27 15:32:12,901] Trial 52 finished with value: 0.5383819461093703 and parameters: {'learning_rate': 0.0018515352593065028, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 53 with params: {'learning_rate': 0.0009725034744865333, 'weight_decay': 0.004, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7289,2.293145,0.4919,0.531968,0.4919,0.478775
2,2.1371,2.1688,0.5186,0.538288,0.5186,0.516374
3,2.0344,2.107827,0.5313,0.539453,0.5313,0.522196
4,1.9805,2.143052,0.5219,0.543268,0.5219,0.516406
5,1.9489,2.082482,0.5369,0.548974,0.5369,0.530226
6,1.9172,2.041056,0.5476,0.550281,0.5476,0.541711


[I 2025-03-27 15:36:59,226] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.001455043596908556, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.607,2.240668,0.5016,0.541936,0.5016,0.492472
2,2.1039,2.151781,0.5192,0.547501,0.5192,0.51832
3,2.0182,2.100821,0.5339,0.546787,0.5339,0.524596
4,1.9688,2.144961,0.5221,0.549228,0.5221,0.517416
5,1.9397,2.084491,0.5345,0.54961,0.5345,0.528232
6,1.9068,2.035353,0.5488,0.554514,0.5488,0.543721
7,1.885,2.021991,0.5511,0.554237,0.5511,0.546438
8,1.8647,2.06614,0.5413,0.546285,0.5413,0.53446
9,1.8441,2.066379,0.5434,0.550749,0.5434,0.538211
10,1.8317,2.055386,0.5416,0.548438,0.5416,0.538136


[I 2025-03-27 15:44:58,214] Trial 54 finished with value: 0.5381361215133965 and parameters: {'learning_rate': 0.001455043596908556, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 55 with params: {'learning_rate': 0.0013311870865774802, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6485,2.251916,0.4997,0.543261,0.4997,0.489623
2,2.1094,2.15399,0.5184,0.544537,0.5184,0.517172
3,2.0197,2.100113,0.5346,0.546579,0.5346,0.525653
4,1.9694,2.144018,0.5217,0.547601,0.5217,0.516687
5,1.9398,2.082634,0.5366,0.550062,0.5366,0.530226
6,1.9072,2.03535,0.5491,0.55359,0.5491,0.543524
7,1.886,2.021991,0.5519,0.55424,0.5519,0.546998
8,1.8664,2.066279,0.5409,0.545536,0.5409,0.534022
9,1.8469,2.06652,0.5428,0.549803,0.5428,0.537595
10,1.8355,2.056454,0.5414,0.54863,0.5414,0.538044


[I 2025-03-27 15:52:52,064] Trial 55 finished with value: 0.5380437095747704 and parameters: {'learning_rate': 0.0013311870865774802, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 56 with params: {'learning_rate': 0.0007745909440547175, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8446,2.346017,0.4809,0.519412,0.4809,0.465727
2,2.1744,2.190587,0.5171,0.534563,0.5171,0.514147
3,2.0577,2.122218,0.531,0.537448,0.531,0.52156
4,1.9992,2.149709,0.5235,0.542053,0.5235,0.517647
5,1.9651,2.08977,0.5361,0.547661,0.5361,0.528898
6,1.9332,2.05095,0.5469,0.548681,0.5469,0.540699
7,1.9135,2.035346,0.5488,0.549426,0.5488,0.54318
8,1.897,2.075745,0.5379,0.540136,0.5379,0.530762
9,1.8817,2.077852,0.5394,0.546904,0.5394,0.534216
10,1.8745,2.070978,0.5391,0.546598,0.5391,0.535482


[I 2025-03-27 16:00:51,408] Trial 56 finished with value: 0.5354817426487339 and parameters: {'learning_rate': 0.0007745909440547175, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 57 with params: {'learning_rate': 0.003128218999614511, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5231,2.215801,0.5109,0.549607,0.5109,0.504467
2,2.1518,2.186961,0.5112,0.550777,0.5112,0.511215
3,2.0832,2.162303,0.5304,0.557283,0.5304,0.523938
4,2.0337,2.192102,0.5141,0.541942,0.5141,0.509014
5,2.0016,2.123472,0.5295,0.554259,0.5295,0.523798
6,1.9592,2.067966,0.5411,0.55196,0.5411,0.534649


[I 2025-03-27 16:05:32,250] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.002966222074345279, 'weight_decay': 0.007, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5136,2.211731,0.5093,0.546669,0.5093,0.5025
2,2.1424,2.184387,0.5106,0.550726,0.5106,0.511086
3,2.0734,2.153528,0.5332,0.558874,0.5332,0.526686
4,2.0247,2.182369,0.5157,0.541038,0.5157,0.510213
5,1.9931,2.118509,0.5306,0.554942,0.5306,0.52473
6,1.9521,2.064873,0.5417,0.553194,0.5417,0.535591


[I 2025-03-27 16:10:16,362] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0013422368749234546, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.638,2.250192,0.5,0.542226,0.5,0.490117
2,2.1084,2.153606,0.5183,0.544813,0.5183,0.517067
3,2.0193,2.100028,0.5346,0.546959,0.5346,0.525736
4,1.9692,2.144032,0.5214,0.547269,0.5214,0.516357
5,1.9396,2.082701,0.5365,0.550016,0.5365,0.530157
6,1.907,2.035269,0.549,0.553722,0.549,0.543504
7,1.8858,2.021935,0.5518,0.554123,0.5518,0.546875
8,1.8662,2.066234,0.5409,0.545576,0.5409,0.534046
9,1.8466,2.066473,0.543,0.550105,0.543,0.537833
10,1.8351,2.056326,0.5418,0.548953,0.5418,0.538436


[I 2025-03-27 16:18:24,570] Trial 59 finished with value: 0.5384356283962517 and parameters: {'learning_rate': 0.0013422368749234546, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 60 with params: {'learning_rate': 0.0005835626570272373, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9272,2.423041,0.4717,0.512271,0.4717,0.45641
2,2.2314,2.225537,0.5105,0.527848,0.5105,0.507145
3,2.0972,2.148309,0.5255,0.531164,0.5255,0.515545


[I 2025-03-27 16:20:51,854] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0025254122882335515, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5322,2.211244,0.5071,0.547762,0.5071,0.501215
2,2.1222,2.178087,0.5099,0.549849,0.5099,0.510198
3,2.0504,2.134192,0.5313,0.553082,0.5313,0.52325
4,2.0029,2.161561,0.5194,0.540829,0.5194,0.513327
5,1.9725,2.107751,0.5325,0.555228,0.5325,0.527093
6,1.9343,2.055532,0.546,0.557796,0.546,0.540909
7,1.9055,2.037357,0.5476,0.554122,0.5476,0.543137
8,1.8769,2.071842,0.5391,0.546151,0.5391,0.532655
9,1.8466,2.076257,0.5406,0.550543,0.5406,0.535394
10,1.8252,2.055577,0.5405,0.546963,0.5405,0.537198


[I 2025-03-27 16:29:00,950] Trial 61 finished with value: 0.5371979983376367 and parameters: {'learning_rate': 0.0025254122882335515, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 62 with params: {'learning_rate': 0.0010359558555573614, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7155,2.283613,0.4944,0.535221,0.4944,0.481658
2,2.1302,2.164674,0.5192,0.539701,0.5192,0.517107
3,2.0301,2.105232,0.5324,0.54119,0.5324,0.523339
4,1.9771,2.142542,0.5211,0.543685,0.5211,0.51566
5,1.9459,2.081637,0.5368,0.549327,0.5368,0.530236
6,1.9141,2.039191,0.5481,0.551058,0.5481,0.542331
7,1.894,2.025025,0.5516,0.552566,0.5516,0.546247
8,1.8763,2.068307,0.5395,0.542705,0.5395,0.532374
9,1.8591,2.069366,0.5431,0.55068,0.5431,0.538004
10,1.85,2.061262,0.5396,0.546582,0.5396,0.536024


[I 2025-03-27 16:36:55,891] Trial 62 finished with value: 0.5360241189291747 and parameters: {'learning_rate': 0.0010359558555573614, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 63 with params: {'learning_rate': 0.0041957438721127805, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5749,2.268958,0.5046,0.546521,0.5046,0.498153
2,2.2208,2.225373,0.5113,0.548497,0.5113,0.508353
3,2.1507,2.236298,0.518,0.560169,0.518,0.510697
4,2.0983,2.262755,0.5072,0.549123,0.5072,0.502936
5,2.0611,2.162977,0.5239,0.552824,0.5239,0.517222
6,2.0092,2.089894,0.5384,0.550887,0.5384,0.53079


[I 2025-03-27 16:41:49,297] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00043221244686844786, 'weight_decay': 0.0, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0957,2.548743,0.4551,0.498643,0.4551,0.43915
2,2.3242,2.282446,0.5013,0.517737,0.5013,0.496963
3,2.159,2.190694,0.5192,0.52479,0.5192,0.50843


[I 2025-03-27 16:44:14,677] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0011130250756642862, 'weight_decay': 0.007, 'warmup_steps': 18, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7006,2.27398,0.4947,0.537034,0.4947,0.482479
2,2.1233,2.160786,0.5189,0.54066,0.5189,0.516904
3,2.0262,2.102858,0.5329,0.542259,0.5329,0.52389
4,1.974,2.142481,0.522,0.545385,0.522,0.516812
5,1.9432,2.081165,0.5364,0.548702,0.5364,0.529856
6,1.9112,2.037492,0.549,0.552452,0.549,0.54331
7,1.891,2.02364,0.552,0.553446,0.552,0.546737
8,1.8727,2.067383,0.5402,0.544121,0.5402,0.533267
9,1.8549,2.06814,0.5424,0.549886,0.5424,0.537223
10,1.8452,2.059582,0.5407,0.547404,0.5407,0.537073


[I 2025-03-27 16:52:16,140] Trial 65 finished with value: 0.5370734404818664 and parameters: {'learning_rate': 0.0011130250756642862, 'weight_decay': 0.007, 'warmup_steps': 18, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 66 with params: {'learning_rate': 0.000922776490657557, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7451,2.30244,0.489,0.528101,0.489,0.475308
2,2.1438,2.172863,0.5187,0.537558,0.5187,0.516371
3,2.0386,2.110448,0.5312,0.539125,0.5312,0.521997
4,1.9839,2.143861,0.5217,0.542306,0.5217,0.516053
5,1.9518,2.083577,0.5374,0.549062,0.5374,0.530611
6,1.9202,2.04289,0.5479,0.550561,0.5479,0.541981


[I 2025-03-27 16:57:01,312] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0010031894839810209, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7265,2.288839,0.4919,0.532331,0.4919,0.479085
2,2.1339,2.166804,0.5191,0.538879,0.5191,0.516815
3,2.0323,2.106564,0.5321,0.540817,0.5321,0.523089
4,1.9789,2.142776,0.5212,0.543091,0.5212,0.515658
5,1.9474,2.082032,0.5366,0.54878,0.5366,0.53004
6,1.9156,2.040117,0.5477,0.550282,0.5477,0.541821
7,1.8957,2.025787,0.5516,0.552523,0.5516,0.546221
8,1.8781,2.068823,0.5394,0.542684,0.5394,0.532341
9,1.8611,2.070009,0.5429,0.550718,0.5429,0.537863
10,1.8523,2.062107,0.5396,0.546346,0.5396,0.535976


[I 2025-03-27 17:04:58,905] Trial 67 finished with value: 0.5359761530203321 and parameters: {'learning_rate': 0.0010031894839810209, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 68 with params: {'learning_rate': 0.001746626572556993, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5881,2.225675,0.503,0.543865,0.503,0.496153
2,2.1026,2.154563,0.5194,0.551127,0.5194,0.51847
3,2.022,2.107642,0.5333,0.546977,0.5333,0.523614
4,1.9736,2.146888,0.5225,0.549705,0.5225,0.517579
5,1.9447,2.090721,0.534,0.550434,0.534,0.527878
6,1.9104,2.038741,0.5487,0.557281,0.5487,0.544047
7,1.887,2.024128,0.5507,0.554556,0.5507,0.546147
8,1.8645,2.067031,0.5406,0.546237,0.5406,0.533835
9,1.8414,2.067961,0.5425,0.550718,0.5425,0.537429
10,1.8266,2.054316,0.5417,0.548504,0.5417,0.538318


[I 2025-03-27 17:13:04,577] Trial 68 finished with value: 0.5383184466062507 and parameters: {'learning_rate': 0.001746626572556993, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 69 with params: {'learning_rate': 0.0015177679588309065, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6109,2.237681,0.5025,0.542654,0.5025,0.494028
2,2.1036,2.151853,0.5192,0.548075,0.5192,0.51832
3,2.0187,2.102057,0.5339,0.54692,0.5339,0.524588
4,1.9695,2.145534,0.5219,0.549276,0.5219,0.517244
5,1.9405,2.085789,0.5351,0.550398,0.5351,0.528911
6,1.9072,2.035859,0.5493,0.555736,0.5493,0.544292
7,1.8851,2.022281,0.5505,0.553695,0.5505,0.545789
8,1.8643,2.066291,0.5414,0.546478,0.5414,0.53459
9,1.8432,2.066596,0.5424,0.55018,0.5424,0.537298
10,1.8303,2.055038,0.5419,0.548843,0.5419,0.538433


[I 2025-03-27 17:21:08,046] Trial 69 finished with value: 0.5384326754471498 and parameters: {'learning_rate': 0.0015177679588309065, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 70 with params: {'learning_rate': 0.0008363023184333678, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7937,2.324552,0.4848,0.52284,0.4848,0.469873
2,2.1594,2.182085,0.5193,0.537363,0.5193,0.516749
3,2.0484,2.116509,0.5307,0.537367,0.5307,0.521209


[I 2025-03-27 17:23:28,437] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0021304336698284, 'weight_decay': 0.004, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5408,2.214306,0.5048,0.549076,0.5048,0.499045
2,2.1078,2.167421,0.5142,0.55224,0.5142,0.514001
3,2.0328,2.119295,0.5316,0.547906,0.5316,0.522601
4,1.9856,2.150198,0.5217,0.544411,0.5217,0.515818
5,1.9563,2.098954,0.5314,0.550459,0.5314,0.525429
6,1.9202,2.046271,0.5476,0.558617,0.5476,0.542724
7,1.8943,2.029488,0.5491,0.554397,0.5491,0.544593
8,1.8689,2.069,0.5394,0.545578,0.5394,0.532703
9,1.8423,2.071625,0.541,0.550649,0.541,0.53607
10,1.8243,2.054403,0.5414,0.548294,0.5414,0.538062


[I 2025-03-27 17:31:28,253] Trial 71 finished with value: 0.5380623261682448 and parameters: {'learning_rate': 0.0021304336698284, 'weight_decay': 0.004, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 72 with params: {'learning_rate': 0.0009039750229095922, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7753,2.30948,0.4877,0.526924,0.4877,0.473796
2,2.1486,2.175273,0.5183,0.536956,0.5183,0.515805
3,2.0412,2.111935,0.5301,0.537672,0.5301,0.520796


[I 2025-03-27 17:33:54,017] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.004019650849577295, 'weight_decay': 0.0, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5597,2.256979,0.5066,0.546431,0.5066,0.500021
2,2.2081,2.214948,0.5111,0.54712,0.5111,0.508371
3,2.1389,2.22282,0.5224,0.562301,0.5224,0.515381
4,2.0869,2.252093,0.5089,0.54896,0.5089,0.504457
5,2.0509,2.155663,0.5231,0.549915,0.5231,0.516145
6,2.0005,2.085457,0.5409,0.552033,0.5409,0.533162


[I 2025-03-27 17:38:46,981] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 6.24006692401181e-05, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9719,3.715448,0.2277,0.283019,0.2277,0.216476
2,3.4925,3.315663,0.3515,0.391005,0.3515,0.347664
3,3.1816,3.082042,0.3999,0.418385,0.3999,0.384804


[I 2025-03-27 17:41:12,892] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.003408621162658522, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5486,2.227304,0.5102,0.55117,0.5102,0.503717
2,2.1696,2.192222,0.5112,0.547866,0.5112,0.50978
3,2.101,2.179915,0.5247,0.556421,0.5247,0.518465
4,2.0504,2.211179,0.5125,0.543601,0.5125,0.507532
5,2.0171,2.133095,0.5273,0.553225,0.5273,0.521321
6,1.9722,2.073154,0.5408,0.550633,0.5408,0.533581


[I 2025-03-27 17:45:57,084] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0025512711636125176, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5084,2.209457,0.5075,0.547182,0.5075,0.501638
2,2.1217,2.177785,0.5106,0.550438,0.5106,0.510925
3,2.0507,2.134627,0.5311,0.553133,0.5311,0.523114
4,2.0034,2.161972,0.5192,0.540743,0.5192,0.51322
5,1.9732,2.107977,0.5319,0.554395,0.5319,0.526453
6,1.9349,2.055828,0.5453,0.556884,0.5453,0.540157


[I 2025-03-27 17:50:37,616] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0020606360072553545, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5396,2.214919,0.5042,0.548239,0.5042,0.498472
2,2.1057,2.164647,0.5149,0.551276,0.5149,0.514255
3,2.0301,2.116779,0.5326,0.54853,0.5326,0.523556
4,1.9828,2.149033,0.5211,0.544063,0.5211,0.515164
5,1.9537,2.097406,0.5317,0.550036,0.5317,0.52568
6,1.918,2.044645,0.5479,0.558013,0.5479,0.542953
7,1.8926,2.028252,0.5492,0.554066,0.5492,0.544657
8,1.8677,2.068588,0.5394,0.545575,0.5394,0.532789
9,1.8418,2.070806,0.5413,0.550514,0.5413,0.53633
10,1.8245,2.054306,0.5415,0.54833,0.5415,0.538153


[I 2025-03-27 17:58:47,762] Trial 77 finished with value: 0.5381530753019279 and parameters: {'learning_rate': 0.0020606360072553545, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 78 with params: {'learning_rate': 0.0017436404617602068, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5939,2.226268,0.5029,0.543912,0.5029,0.496138
2,2.103,2.154631,0.5194,0.551348,0.5194,0.518518
3,2.0221,2.107661,0.5331,0.546712,0.5331,0.523374
4,1.9736,2.146895,0.5224,0.549607,0.5224,0.517441
5,1.9447,2.09073,0.534,0.55022,0.534,0.527821
6,1.9104,2.038755,0.5487,0.557208,0.5487,0.544009
7,1.887,2.024126,0.5509,0.554742,0.5509,0.546333
8,1.8645,2.067039,0.5408,0.546389,0.5408,0.534021
9,1.8414,2.067966,0.5424,0.550683,0.5424,0.537347
10,1.8266,2.054328,0.5416,0.548425,0.5416,0.538223


[I 2025-03-27 18:06:57,634] Trial 78 finished with value: 0.5382230603310273 and parameters: {'learning_rate': 0.0017436404617602068, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 79 with params: {'learning_rate': 0.0011896958688755504, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6916,2.266347,0.4966,0.538221,0.4966,0.485013
2,2.1182,2.157996,0.5169,0.539959,0.5169,0.515202
3,2.0235,2.101353,0.5345,0.544455,0.5345,0.525568
4,1.9719,2.142965,0.5227,0.547152,0.5227,0.517493
5,1.9415,2.081384,0.5365,0.549114,0.5365,0.529951
6,1.9093,2.036391,0.5485,0.551966,0.5485,0.542827
7,1.8887,2.02277,0.5514,0.552976,0.5514,0.546244
8,1.87,2.066832,0.5409,0.545077,0.5409,0.534027
9,1.8515,2.067348,0.5431,0.550299,0.5431,0.537873
10,1.8412,2.058273,0.5405,0.547351,0.5405,0.536894


[I 2025-03-27 18:14:52,763] Trial 79 finished with value: 0.5368942032088826 and parameters: {'learning_rate': 0.0011896958688755504, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 80 with params: {'learning_rate': 0.003784916146058649, 'weight_decay': 0.004, 'warmup_steps': 22, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5608,2.244895,0.5086,0.549681,0.5086,0.502219
2,2.1933,2.203932,0.511,0.54643,0.511,0.508425
3,2.1245,2.205938,0.5229,0.559371,0.5229,0.516274
4,2.0729,2.237321,0.5093,0.545366,0.5093,0.504555
5,2.038,2.146836,0.5245,0.551583,0.5245,0.518162
6,1.9897,2.080353,0.5409,0.550976,0.5409,0.533354


[I 2025-03-27 18:19:39,359] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.004661118798123189, 'weight_decay': 0.003, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5725,2.300618,0.4992,0.546752,0.4992,0.494096
2,2.2515,2.252205,0.5085,0.547721,0.5085,0.50534
3,2.1797,2.267034,0.5113,0.560711,0.5113,0.504276


[I 2025-03-27 18:22:07,292] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.002075788206286668, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5594,2.216295,0.5043,0.548122,0.5043,0.498488
2,2.1074,2.165885,0.5144,0.551815,0.5144,0.514024
3,2.0314,2.117685,0.5325,0.548084,0.5325,0.523443
4,1.9839,2.149487,0.5212,0.544176,0.5212,0.515306
5,1.9546,2.097946,0.5314,0.550297,0.5314,0.525437
6,1.9187,2.045173,0.5475,0.55739,0.5475,0.542522
7,1.8932,2.028645,0.5491,0.553919,0.5491,0.544495
8,1.8681,2.068739,0.5396,0.545934,0.5396,0.532999
9,1.842,2.071137,0.5413,0.550624,0.5413,0.536335
10,1.8245,2.054351,0.5418,0.548681,0.5418,0.538458


[I 2025-03-27 18:30:14,949] Trial 82 finished with value: 0.5384578659313053 and parameters: {'learning_rate': 0.002075788206286668, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 83 with params: {'learning_rate': 0.0031118798247196724, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.543,2.217249,0.51,0.548807,0.51,0.503611
2,2.1523,2.187085,0.5107,0.550096,0.5107,0.510858
3,2.0831,2.162081,0.5301,0.557434,0.5301,0.523665
4,2.0335,2.191861,0.5144,0.542049,0.5144,0.509347
5,2.0013,2.123347,0.5293,0.553937,0.5293,0.52363
6,1.959,2.067854,0.5415,0.552589,0.5415,0.535118


[I 2025-03-27 18:34:59,780] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.002050093666485464, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5557,2.216314,0.5042,0.547755,0.5042,0.498215
2,2.1065,2.164777,0.5151,0.551666,0.5151,0.514646
3,2.0303,2.116745,0.5329,0.548663,0.5329,0.523815
4,1.9828,2.149076,0.5214,0.544353,0.5214,0.515505
5,1.9536,2.09734,0.5311,0.549641,0.5311,0.525091
6,1.9179,2.044554,0.5476,0.55739,0.5476,0.542611
7,1.8925,2.028219,0.5492,0.554069,0.5492,0.544619
8,1.8677,2.068578,0.5399,0.546129,0.5399,0.533354
9,1.8419,2.070828,0.5414,0.550575,0.5414,0.536409
10,1.8245,2.054306,0.5417,0.548505,0.5417,0.538356


[I 2025-03-27 18:42:59,884] Trial 84 finished with value: 0.5383559455194667 and parameters: {'learning_rate': 0.002050093666485464, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 85 with params: {'learning_rate': 0.0026357478409213785, 'weight_decay': 0.004, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5424,2.21167,0.5079,0.547716,0.5079,0.501474
2,2.1278,2.180121,0.5093,0.549861,0.5093,0.509774
3,2.0564,2.138961,0.5319,0.555021,0.5319,0.524257
4,2.0086,2.166505,0.5187,0.540506,0.5187,0.512878
5,1.9777,2.110451,0.5325,0.556153,0.5325,0.527059
6,1.9389,2.0582,0.5445,0.556277,0.5445,0.53918


[I 2025-03-27 18:47:50,702] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0013483014986645477, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6206,2.24811,0.4997,0.541294,0.4997,0.489905
2,2.107,2.153068,0.5188,0.545049,0.5188,0.517517
3,2.0187,2.099781,0.535,0.547464,0.535,0.526153
4,1.9688,2.143881,0.5217,0.547845,0.5217,0.5167
5,1.9394,2.082643,0.5366,0.55014,0.5366,0.530235
6,1.9068,2.035163,0.5495,0.55439,0.5495,0.544042
7,1.8856,2.021848,0.5515,0.553782,0.5515,0.546572
8,1.866,2.066144,0.5406,0.545347,0.5406,0.533768
9,1.8463,2.066386,0.543,0.550084,0.543,0.537795
10,1.8348,2.05622,0.5416,0.548831,0.5416,0.538251


[I 2025-03-27 18:55:54,561] Trial 86 finished with value: 0.538251253867042 and parameters: {'learning_rate': 0.0013483014986645477, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 87 with params: {'learning_rate': 0.0017671638238727714, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5622,2.222811,0.5044,0.544138,0.5044,0.497571
2,2.1011,2.154377,0.5194,0.551618,0.5194,0.518698
3,2.0216,2.107729,0.5332,0.547324,0.5332,0.523676
4,1.9736,2.146722,0.5218,0.548515,0.5218,0.516758
5,1.9448,2.090914,0.5338,0.550241,0.5338,0.527626
6,1.9106,2.038861,0.5487,0.557383,0.5487,0.544045
7,1.8871,2.024205,0.5503,0.554236,0.5503,0.545772
8,1.8644,2.067013,0.5403,0.545728,0.5403,0.533479
9,1.8412,2.067958,0.5421,0.550426,0.5421,0.537065
10,1.8263,2.054193,0.5418,0.548655,0.5418,0.53845


[I 2025-03-27 19:03:53,179] Trial 87 finished with value: 0.5384499252507436 and parameters: {'learning_rate': 0.0017671638238727714, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 88 with params: {'learning_rate': 0.0014733336641652578, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5893,2.238105,0.5029,0.543266,0.5029,0.493878
2,2.1025,2.15128,0.5196,0.548162,0.5196,0.518767
3,2.0177,2.100778,0.534,0.547025,0.534,0.524778
4,1.9686,2.144876,0.5221,0.549474,0.5221,0.517478
5,1.9397,2.084626,0.5352,0.550397,0.5352,0.529017
6,1.9067,2.035323,0.5494,0.55516,0.5494,0.544348
7,1.8849,2.021936,0.5512,0.554431,0.5512,0.546576
8,1.8644,2.066052,0.5417,0.546743,0.5417,0.534808
9,1.8437,2.066324,0.5429,0.550356,0.5429,0.537708
10,1.8313,2.055204,0.5414,0.548353,0.5414,0.538013


[I 2025-03-27 19:12:22,157] Trial 88 finished with value: 0.5380125517089104 and parameters: {'learning_rate': 0.0014733336641652578, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 89 with params: {'learning_rate': 0.001046636234871508, 'weight_decay': 0.005, 'warmup_steps': 14, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7093,2.281699,0.4944,0.53581,0.4944,0.481846
2,2.1288,2.163955,0.5192,0.539587,0.5192,0.517065
3,2.0294,2.104792,0.5326,0.541659,0.5326,0.523601
4,1.9765,2.142448,0.5211,0.543902,0.5211,0.515664
5,1.9454,2.08149,0.5367,0.549111,0.5367,0.530147
6,1.9136,2.03891,0.5479,0.550953,0.5479,0.54215
7,1.8935,2.024788,0.552,0.553141,0.552,0.546644
8,1.8757,2.068135,0.5399,0.54338,0.5399,0.532887
9,1.8584,2.069159,0.5431,0.550693,0.5431,0.538008
10,1.8492,2.061007,0.5397,0.546723,0.5397,0.536138


[I 2025-03-27 19:20:33,180] Trial 89 finished with value: 0.5361377070226852 and parameters: {'learning_rate': 0.001046636234871508, 'weight_decay': 0.005, 'warmup_steps': 14, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 90 with params: {'learning_rate': 0.001577477710108579, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5839,2.232339,0.5035,0.543598,0.5035,0.495501
2,2.1015,2.151394,0.5198,0.548718,0.5198,0.5188
3,2.0185,2.102917,0.534,0.54704,0.534,0.524491
4,1.9698,2.145752,0.5227,0.550013,0.5227,0.518
5,1.941,2.086786,0.535,0.550512,0.535,0.528745
6,1.9076,2.036255,0.5499,0.556983,0.5499,0.545102
7,1.8852,2.022503,0.5507,0.554008,0.5507,0.546008
8,1.864,2.066352,0.5412,0.546705,0.5412,0.534452
9,1.8424,2.06676,0.5425,0.550445,0.5425,0.537388
10,1.8291,2.054724,0.5416,0.548758,0.5416,0.53816


[I 2025-03-27 19:28:28,327] Trial 90 finished with value: 0.5381596438867509 and parameters: {'learning_rate': 0.001577477710108579, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 91 with params: {'learning_rate': 0.0018335006959639471, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5638,2.220956,0.5041,0.544571,0.5041,0.49758
2,2.1022,2.156549,0.5186,0.552136,0.5186,0.518084
3,2.0234,2.109817,0.5333,0.54706,0.5333,0.523709
4,1.9755,2.147136,0.5221,0.548168,0.5221,0.516901
5,1.9467,2.092484,0.5322,0.549044,0.5322,0.526084
6,1.9121,2.04012,0.5491,0.558078,0.5491,0.544453
7,1.8882,2.025018,0.5503,0.554294,0.5503,0.545764
8,1.865,2.067344,0.5404,0.545894,0.5404,0.533635
9,1.8411,2.068588,0.5421,0.550447,0.5421,0.536992
10,1.8257,2.054156,0.5419,0.548442,0.5419,0.538475


[I 2025-03-27 19:36:27,438] Trial 91 finished with value: 0.5384754624765731 and parameters: {'learning_rate': 0.0018335006959639471, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 92 with params: {'learning_rate': 0.0031748464934260082, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5105,2.215739,0.5113,0.549929,0.5113,0.504875
2,2.1533,2.187245,0.5106,0.549795,0.5106,0.510462
3,2.0852,2.164282,0.53,0.558614,0.53,0.523789
4,2.0357,2.194287,0.5145,0.542207,0.5145,0.509235
5,2.0036,2.124489,0.5285,0.553312,0.5285,0.522774
6,1.9609,2.068602,0.5413,0.552389,0.5413,0.534873


[I 2025-03-27 19:41:13,446] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.004594476301666744, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5679,2.295419,0.4997,0.546514,0.4997,0.49426
2,2.2467,2.248153,0.5082,0.54703,0.5082,0.505128
3,2.1752,2.262798,0.5121,0.561327,0.5121,0.504943


[I 2025-03-27 19:43:36,548] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0009418737283766865, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7516,2.300373,0.4894,0.529424,0.4894,0.475934
2,2.1422,2.171607,0.5187,0.537662,0.5187,0.516371
3,2.0373,2.109585,0.5304,0.538075,0.5304,0.521037
4,1.9828,2.143637,0.5222,0.543299,0.5222,0.516677
5,1.9508,2.083222,0.5369,0.548376,0.5369,0.530164
6,1.9191,2.042215,0.548,0.550917,0.548,0.542174
7,1.8993,2.027587,0.551,0.551874,0.551,0.545706
8,1.882,2.070059,0.5395,0.542492,0.5395,0.53241
9,1.8655,2.071517,0.5421,0.549614,0.5421,0.536981
10,1.8571,2.06392,0.5399,0.546997,0.5399,0.536362


[I 2025-03-27 19:51:31,638] Trial 94 finished with value: 0.5363618412450297 and parameters: {'learning_rate': 0.0009418737283766865, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 95 with params: {'learning_rate': 0.0015117711565467257, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6118,2.238071,0.5023,0.542445,0.5023,0.49375
2,2.1037,2.151859,0.5192,0.548011,0.5192,0.518366
3,2.0187,2.101949,0.5339,0.546833,0.5339,0.524551
4,1.9695,2.145499,0.5218,0.549129,0.5218,0.517151
5,1.9404,2.085694,0.535,0.550277,0.535,0.528812
6,1.9072,2.035821,0.5494,0.555644,0.5494,0.544375
7,1.8851,2.022252,0.5503,0.553389,0.5503,0.545556
8,1.8643,2.066276,0.5414,0.546515,0.5414,0.534607
9,1.8433,2.066609,0.5424,0.550104,0.5424,0.537261
10,1.8304,2.055071,0.5419,0.548828,0.5419,0.538423


[I 2025-03-27 19:59:26,592] Trial 95 finished with value: 0.5384232614911761 and parameters: {'learning_rate': 0.0015117711565467257, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 96 with params: {'learning_rate': 0.0017641364381392815, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5918,2.22542,0.5035,0.544628,0.5035,0.496719
2,2.1031,2.155134,0.5187,0.550926,0.5187,0.51795
3,2.0225,2.108179,0.5334,0.547506,0.5334,0.523777
4,1.9741,2.146977,0.5223,0.54933,0.5223,0.517298
5,1.9452,2.091181,0.5333,0.549665,0.5333,0.527078
6,1.9108,2.039081,0.5484,0.557068,0.5484,0.543726
7,1.8873,2.024355,0.5506,0.554524,0.5506,0.54606
8,1.8646,2.067119,0.5401,0.545675,0.5401,0.533366
9,1.8413,2.068121,0.5424,0.550612,0.5424,0.537357
10,1.8264,2.054296,0.5419,0.548671,0.5419,0.538495


[I 2025-03-27 20:07:25,640] Trial 96 finished with value: 0.538495381478311 and parameters: {'learning_rate': 0.0017641364381392815, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 97 with params: {'learning_rate': 0.0014912446416033504, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6232,2.240199,0.503,0.542864,0.503,0.494126
2,2.1046,2.152113,0.5193,0.548122,0.5193,0.51845
3,2.0189,2.101729,0.534,0.54689,0.534,0.524661
4,1.9695,2.145463,0.5219,0.54909,0.5219,0.517194
5,1.9403,2.085386,0.5346,0.549641,0.5346,0.528361
6,1.9071,2.035733,0.5493,0.555258,0.5493,0.544163
7,1.8852,2.022198,0.5506,0.553649,0.5506,0.545868
8,1.8645,2.066298,0.5413,0.546371,0.5413,0.534495
9,1.8436,2.066579,0.5427,0.550332,0.5427,0.537603
10,1.8309,2.055203,0.542,0.548902,0.542,0.538506


[I 2025-03-27 20:15:20,673] Trial 97 finished with value: 0.538506402844418 and parameters: {'learning_rate': 0.0014912446416033504, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.5}. Best is trial 39 with value: 0.5385378079463762.


Trial 98 with params: {'learning_rate': 0.0036040991507329157, 'weight_decay': 0.01, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5544,2.235831,0.5093,0.550461,0.5093,0.502967
2,2.1817,2.197302,0.5113,0.547496,0.5113,0.509424
3,2.1131,2.193048,0.5249,0.560369,0.5249,0.51872


[I 2025-03-27 20:17:44,141] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0017139720648368928, 'weight_decay': 0.008, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6298,2.230456,0.5023,0.542748,0.5023,0.495197
2,2.1053,2.154767,0.5194,0.550938,0.5194,0.5186
3,2.0227,2.107432,0.533,0.546357,0.533,0.523313
4,1.9737,2.14702,0.5226,0.550047,0.5226,0.517677
5,1.9445,2.090381,0.5338,0.550009,0.5338,0.527464
6,1.9103,2.038551,0.549,0.557554,0.549,0.544367
7,1.887,2.023995,0.5508,0.55445,0.5508,0.546226
8,1.8646,2.067053,0.5406,0.54631,0.5406,0.533836
9,1.8417,2.067892,0.5422,0.550522,0.5422,0.537199
10,1.8271,2.054465,0.5419,0.5487,0.5419,0.538479


[I 2025-03-27 20:25:37,616] Trial 99 finished with value: 0.5384789993989948 and parameters: {'learning_rate': 0.0017139720648368928, 'weight_decay': 0.008, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 39 with value: 0.5385378079463762.


Trial 100 with params: {'learning_rate': 0.0018015420304320652, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.618,2.226466,0.5028,0.544001,0.5028,0.496161
2,2.1055,2.15698,0.5175,0.550907,0.5175,0.517001
3,2.0244,2.109829,0.5331,0.547043,0.5331,0.523542
4,1.9758,2.147384,0.5223,0.548449,0.5223,0.516968
5,1.9466,2.092265,0.5329,0.549693,0.5329,0.52672
6,1.912,2.039984,0.5489,0.557768,0.5489,0.544329
7,1.8882,2.024937,0.5502,0.553957,0.5502,0.545575
8,1.8651,2.067401,0.5403,0.546088,0.5403,0.533615
9,1.8414,2.068558,0.542,0.550384,0.542,0.536856
10,1.8261,2.054311,0.5422,0.548862,0.5422,0.53882


[I 2025-03-27 20:33:26,767] Trial 100 finished with value: 0.5388203850987133 and parameters: {'learning_rate': 0.0018015420304320652, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 101 with params: {'learning_rate': 0.0005932056661118013, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9883,2.431536,0.469,0.50937,0.469,0.453274
2,2.2351,2.226012,0.5105,0.528276,0.5105,0.507371
3,2.0973,2.147814,0.5258,0.531609,0.5258,0.515934


[I 2025-03-27 20:35:48,001] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.001014481770627113, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7514,2.290566,0.4928,0.534862,0.4928,0.479986
2,2.1349,2.166795,0.5192,0.539522,0.5192,0.517054
3,2.0326,2.106496,0.5318,0.540684,0.5318,0.52276
4,1.9788,2.14298,0.5209,0.543124,0.5209,0.515333
5,1.9472,2.082117,0.5371,0.549506,0.5371,0.530573
6,1.9153,2.039948,0.548,0.550836,0.548,0.542182
7,1.8953,2.025632,0.5518,0.552901,0.5518,0.546461
8,1.8776,2.068766,0.5396,0.542878,0.5396,0.532528
9,1.8605,2.069885,0.5427,0.550457,0.5427,0.537629
10,1.8515,2.061867,0.5399,0.546818,0.5399,0.536308


[I 2025-03-27 20:43:37,615] Trial 102 finished with value: 0.53630787763058 and parameters: {'learning_rate': 0.001014481770627113, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 6.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 103 with params: {'learning_rate': 0.0025686096516417518, 'weight_decay': 0.009000000000000001, 'warmup_steps': 28, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5686,2.213473,0.5069,0.54614,0.5069,0.500618
2,2.1266,2.179561,0.5097,0.549373,0.5097,0.510057
3,2.054,2.136666,0.5317,0.554178,0.5317,0.523871
4,2.006,2.164118,0.5191,0.54061,0.5191,0.513002
5,1.9752,2.109176,0.5325,0.556408,0.5325,0.527103
6,1.9367,2.056922,0.5454,0.556967,0.5454,0.540157


[I 2025-03-27 20:48:20,327] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.004047052334640676, 'weight_decay': 0.007, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5834,2.26133,0.5054,0.545433,0.5054,0.498635
2,2.2119,2.217595,0.5117,0.548184,0.5117,0.508933
3,2.142,2.225985,0.5221,0.562809,0.5221,0.51512
4,2.0897,2.25471,0.5086,0.548717,0.5086,0.503966
5,2.0533,2.157482,0.5231,0.549973,0.5231,0.51599
6,2.0025,2.086431,0.5402,0.551756,0.5402,0.53244


[I 2025-03-27 20:53:05,162] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0010759316703475488, 'weight_decay': 0.007, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7489,2.28333,0.4932,0.53527,0.4932,0.480616
2,2.1298,2.163529,0.5183,0.53948,0.5183,0.51631
3,2.0294,2.104454,0.5329,0.542127,0.5329,0.523828
4,1.9762,2.142874,0.5221,0.545242,0.5221,0.516709
5,1.9449,2.08162,0.5368,0.549223,0.5368,0.530166
6,1.9129,2.038463,0.5481,0.55135,0.5481,0.542331


[I 2025-03-27 20:57:51,660] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.003668883915025111, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5783,2.24106,0.5084,0.550316,0.5084,0.502153
2,2.1876,2.200026,0.512,0.547847,0.512,0.509954
3,2.1183,2.198545,0.5237,0.558848,0.5237,0.517334
4,2.0668,2.230101,0.51,0.544514,0.51,0.505323
5,2.0322,2.143089,0.525,0.550629,0.525,0.518432
6,1.9849,2.078196,0.5406,0.550458,0.5406,0.533071


[I 2025-03-27 21:02:38,189] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.001466052915780771, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6412,2.243198,0.5027,0.543833,0.5027,0.49381
2,2.1062,2.15258,0.5191,0.547427,0.5191,0.518054
3,2.0194,2.101544,0.534,0.546884,0.534,0.524629
4,1.9697,2.1454,0.5219,0.548721,0.5219,0.517156
5,1.9403,2.085055,0.5343,0.549348,0.5343,0.528047
6,1.9072,2.03568,0.5488,0.554676,0.5488,0.543714
7,1.8853,2.022199,0.5512,0.55439,0.5512,0.546529
8,1.8648,2.066328,0.5415,0.546467,0.5415,0.534713
9,1.844,2.066556,0.5429,0.550305,0.5429,0.537709
10,1.8315,2.055388,0.5419,0.548877,0.5419,0.538443


[I 2025-03-27 21:10:33,998] Trial 107 finished with value: 0.5384426675208958 and parameters: {'learning_rate': 0.001466052915780771, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 108 with params: {'learning_rate': 0.0032579391246546933, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5554,2.222495,0.5093,0.548306,0.5093,0.502621
2,2.1614,2.189499,0.5111,0.550385,0.5111,0.510856
3,2.0923,2.170871,0.5267,0.55599,0.5267,0.520393


[I 2025-03-27 21:12:57,308] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0006428653247261816, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9115,2.396757,0.4749,0.51606,0.4749,0.459706
2,2.2112,2.212759,0.5134,0.531713,0.5134,0.510608
3,2.0826,2.138458,0.5281,0.533454,0.5281,0.518414
4,2.0195,2.159803,0.5196,0.53691,0.5196,0.513608
5,1.9834,2.099585,0.5343,0.545273,0.5343,0.526663
6,1.951,2.062164,0.5454,0.546944,0.5454,0.539138


[I 2025-03-27 21:17:47,947] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0024958595710213, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5654,2.213557,0.5062,0.546294,0.5062,0.500429
2,2.1232,2.178223,0.5092,0.549018,0.5092,0.509436
3,2.0502,2.133668,0.5301,0.551816,0.5301,0.522055
4,2.0024,2.161067,0.5193,0.540475,0.5193,0.513178
5,1.9718,2.107438,0.5327,0.555442,0.5327,0.52728
6,1.9337,2.055164,0.5461,0.557758,0.5461,0.54091


[I 2025-03-27 21:22:35,790] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0009020055155903746, 'weight_decay': 0.008, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7905,2.311908,0.4868,0.525245,0.4868,0.472766
2,2.1502,2.175859,0.518,0.536662,0.518,0.515483
3,2.0419,2.112278,0.5299,0.537437,0.5299,0.520543


[I 2025-03-27 21:24:59,492] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0015661831855920572, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6262,2.236596,0.502,0.542421,0.502,0.493835
2,2.1044,2.15234,0.5195,0.54819,0.5195,0.518372
3,2.0198,2.103418,0.5338,0.546944,0.5338,0.524304
4,1.9706,2.146112,0.523,0.550849,0.523,0.518459
5,1.9414,2.087003,0.5353,0.550832,0.5353,0.529044
6,1.9079,2.036473,0.5494,0.556397,0.5494,0.544533
7,1.8855,2.022677,0.5503,0.553696,0.5503,0.545631
8,1.8642,2.066501,0.5411,0.546536,0.5411,0.534373
9,1.8427,2.066907,0.5424,0.550456,0.5424,0.537299
10,1.8294,2.054856,0.542,0.548992,0.542,0.538533


[I 2025-03-27 21:32:58,204] Trial 112 finished with value: 0.5385325970981695 and parameters: {'learning_rate': 0.0015661831855920572, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 113 with params: {'learning_rate': 0.0008593086616472507, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7945,2.32012,0.4852,0.522937,0.4852,0.470504
2,2.1562,2.179866,0.5188,0.53731,0.5188,0.516354
3,2.046,2.114967,0.5307,0.537735,0.5307,0.521384


[I 2025-03-27 21:35:22,443] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0012045210743316383, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7165,2.267977,0.4964,0.538691,0.4964,0.484822
2,2.1194,2.158151,0.5163,0.539251,0.5163,0.514488
3,2.024,2.10153,0.5348,0.545608,0.5348,0.525924
4,1.9722,2.143356,0.5225,0.547345,0.5225,0.517309
5,1.9416,2.081683,0.5373,0.549783,0.5373,0.530777
6,1.9093,2.036392,0.5485,0.552046,0.5485,0.542839
7,1.8886,2.022768,0.5515,0.553161,0.5515,0.546332
8,1.8697,2.066867,0.541,0.54518,0.541,0.534062
9,1.8511,2.067328,0.5432,0.55028,0.5432,0.537952
10,1.8406,2.058132,0.5406,0.547621,0.5406,0.537098


[I 2025-03-27 21:43:25,637] Trial 114 finished with value: 0.5370975941378692 and parameters: {'learning_rate': 0.0012045210743316383, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 115 with params: {'learning_rate': 0.0006695529600301519, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9243,2.389955,0.4749,0.514363,0.4749,0.459224
2,2.2054,2.208503,0.5139,0.531666,0.5139,0.510893
3,2.0778,2.134997,0.5286,0.534585,0.5286,0.519098


[I 2025-03-27 21:45:50,279] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.001770954296413646, 'weight_decay': 0.009000000000000001, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6157,2.227191,0.5029,0.54442,0.5029,0.496171
2,2.1049,2.155946,0.5183,0.550781,0.5183,0.517636
3,2.0235,2.108844,0.5333,0.547478,0.5333,0.523819
4,1.9749,2.147199,0.522,0.54896,0.522,0.516943
5,1.9457,2.091529,0.5329,0.549397,0.5329,0.526663
6,1.9113,2.039414,0.5485,0.557252,0.5485,0.543896
7,1.8877,2.024538,0.5506,0.554457,0.5506,0.546058
8,1.8648,2.067241,0.54,0.545656,0.54,0.533286
9,1.8414,2.068294,0.5423,0.550497,0.5423,0.53719
10,1.8264,2.054331,0.5421,0.548677,0.5421,0.538644


[I 2025-03-27 21:53:49,025] Trial 116 finished with value: 0.5386442555729268 and parameters: {'learning_rate': 0.001770954296413646, 'weight_decay': 0.009000000000000001, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 117 with params: {'learning_rate': 0.0017552394061247688, 'weight_decay': 0.008, 'warmup_steps': 28, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6145,2.227624,0.5032,0.544844,0.5032,0.496473
2,2.1046,2.155486,0.5187,0.55103,0.5187,0.517985
3,2.0231,2.108355,0.5333,0.547136,0.5333,0.523765
4,1.9744,2.147123,0.5228,0.550006,0.5228,0.51777
5,1.9453,2.091156,0.5336,0.549987,0.5336,0.527406
6,1.9109,2.039138,0.5486,0.557286,0.5486,0.543935
7,1.8874,2.024358,0.5506,0.554438,0.5506,0.546052
8,1.8647,2.067184,0.5403,0.545989,0.5403,0.533528
9,1.8415,2.068164,0.5423,0.550524,0.5423,0.537232
10,1.8265,2.054353,0.5418,0.548649,0.5418,0.538407


[I 2025-03-27 22:01:49,790] Trial 117 finished with value: 0.5384068029796503 and parameters: {'learning_rate': 0.0017552394061247688, 'weight_decay': 0.008, 'warmup_steps': 28, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 118 with params: {'learning_rate': 0.0027178207318209863, 'weight_decay': 0.01, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5706,2.213589,0.5084,0.546555,0.5084,0.501528
2,2.1338,2.181827,0.5098,0.549921,0.5098,0.51014
3,2.0619,2.143083,0.5323,0.555885,0.5323,0.524933
4,2.0135,2.170952,0.518,0.540349,0.518,0.512249
5,1.9822,2.112787,0.532,0.556293,0.532,0.526631
6,1.9427,2.060328,0.5447,0.556082,0.5447,0.53904


[I 2025-03-27 22:06:36,878] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0036141356625163518, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5546,2.23627,0.5093,0.550572,0.5093,0.502863
2,2.1823,2.197613,0.5112,0.547457,0.5112,0.509239
3,2.1137,2.193746,0.5248,0.560199,0.5248,0.518675
4,2.0625,2.225365,0.5101,0.543139,0.5101,0.505224
5,2.0285,2.140526,0.5259,0.551725,0.5259,0.519494
6,1.9817,2.076938,0.5404,0.550255,0.5404,0.532814


[I 2025-03-27 22:11:26,273] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00016104904333464902, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.629,3.16433,0.368,0.435574,0.368,0.352193
2,2.8521,2.674534,0.4537,0.473559,0.4537,0.445103
3,2.5406,2.488266,0.4792,0.484669,0.4792,0.464661


[I 2025-03-27 22:13:50,803] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.003390165466264586, 'weight_decay': 0.006, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5677,2.228146,0.5089,0.549087,0.5089,0.502196
2,2.17,2.192121,0.5106,0.548198,0.5106,0.509508
3,2.1008,2.17946,0.5254,0.557987,0.5254,0.519325
4,2.05,2.210642,0.5126,0.543329,0.5126,0.507589
5,2.0167,2.132901,0.527,0.552816,0.527,0.521004
6,1.9719,2.072957,0.5406,0.550443,0.5406,0.533335


[I 2025-03-27 22:18:39,288] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0031628051951182404, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5478,2.219004,0.5098,0.548612,0.5098,0.50343
2,2.1554,2.187937,0.5106,0.54953,0.5106,0.510415
3,2.0863,2.16507,0.5293,0.556894,0.5293,0.522924
4,2.0365,2.19517,0.5141,0.542252,0.5141,0.509025
5,2.004,2.125011,0.5283,0.55338,0.5283,0.522725
6,1.9613,2.068793,0.5418,0.552691,0.5418,0.535305


[I 2025-03-27 22:23:30,490] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.00030195866130251276, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3254,2.755929,0.428,0.485033,0.428,0.41237
2,2.4818,2.385249,0.4873,0.503449,0.4873,0.481114
3,2.2632,2.265786,0.5087,0.514179,0.5087,0.496577
4,2.1643,2.254833,0.5032,0.518975,0.5032,0.494638
5,2.1105,2.185471,0.5191,0.528873,0.5191,0.5088
6,2.0707,2.144955,0.533,0.53437,0.533,0.525444


[I 2025-03-27 22:28:15,228] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0013832327421661737, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6752,2.251352,0.5,0.541906,0.5,0.490481
2,2.1102,2.153924,0.5185,0.545933,0.5185,0.517289
3,2.0204,2.10089,0.5344,0.546731,0.5344,0.52536
4,1.97,2.144866,0.5218,0.548012,0.5218,0.516848
5,1.9402,2.083741,0.5354,0.549738,0.5354,0.528918
6,1.9073,2.035518,0.549,0.554077,0.549,0.54365
7,1.8858,2.022113,0.5514,0.554086,0.5514,0.546558
8,1.8658,2.066362,0.541,0.545776,0.541,0.534111
9,1.8457,2.066573,0.5433,0.550659,0.5433,0.538147
10,1.8339,2.056055,0.5418,0.548639,0.5418,0.538332


[I 2025-03-27 22:36:16,572] Trial 124 finished with value: 0.53833219984921 and parameters: {'learning_rate': 0.0013832327421661737, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 125 with params: {'learning_rate': 0.0013374423499311365, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6417,2.250854,0.4997,0.54331,0.4997,0.489773
2,2.1088,2.153742,0.5184,0.544645,0.5184,0.517153
3,2.0194,2.100055,0.5346,0.546822,0.5346,0.525751
4,1.9693,2.144031,0.5216,0.547536,0.5216,0.516636
5,1.9397,2.082653,0.5364,0.549887,0.5364,0.530017
6,1.9071,2.035296,0.5491,0.553647,0.5491,0.543547
7,1.8859,2.02195,0.5518,0.554167,0.5518,0.546884
8,1.8663,2.066247,0.5407,0.545327,0.5407,0.533843
9,1.8467,2.066489,0.5427,0.549748,0.5427,0.537508
10,1.8352,2.056378,0.5416,0.548734,0.5416,0.538228


[I 2025-03-27 22:44:16,054] Trial 125 finished with value: 0.5382280355415704 and parameters: {'learning_rate': 0.0013374423499311365, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 126 with params: {'learning_rate': 0.00011718167106187655, 'weight_decay': 0.007, 'warmup_steps': 22, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.777,3.380327,0.328,0.395355,0.328,0.312411
2,3.0799,2.88141,0.4297,0.455922,0.4297,0.422001
3,2.7383,2.659839,0.4574,0.468672,0.4574,0.441947


[I 2025-03-27 22:46:41,552] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00010672432719553498, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.798,3.427795,0.3183,0.382568,0.3183,0.303274
2,3.1382,2.941041,0.422,0.451055,0.422,0.414609
3,2.7979,2.714549,0.4502,0.46182,0.4502,0.434296
4,2.6074,2.612776,0.452,0.478858,0.452,0.438996
5,2.4948,2.510681,0.4733,0.490144,0.4733,0.458717
6,2.4202,2.445526,0.4892,0.490167,0.4892,0.476415


[I 2025-03-27 22:51:30,569] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0014635488675576419, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6276,2.242089,0.5025,0.543581,0.5025,0.493626
2,2.1052,2.152285,0.5194,0.547465,0.5194,0.518362
3,2.0189,2.101297,0.5341,0.546976,0.5341,0.524766
4,1.9693,2.145253,0.5221,0.549018,0.5221,0.51735
5,1.9401,2.084879,0.5347,0.549619,0.5347,0.528408
6,1.907,2.035559,0.549,0.554819,0.549,0.543909
7,1.8852,2.022124,0.5508,0.553995,0.5508,0.54611
8,1.8647,2.066261,0.5415,0.546508,0.5415,0.534727
9,1.844,2.066496,0.5432,0.550469,0.5432,0.537963
10,1.8316,2.055373,0.5418,0.548753,0.5418,0.538347


[I 2025-03-27 22:59:30,047] Trial 128 finished with value: 0.5383469666060084 and parameters: {'learning_rate': 0.0014635488675576419, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 129 with params: {'learning_rate': 0.0009017054474960991, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8109,2.314841,0.4876,0.527284,0.4876,0.473539
2,2.152,2.176422,0.5179,0.53642,0.5179,0.51536
3,2.0427,2.112574,0.5297,0.537345,0.5297,0.520292


[I 2025-03-27 23:01:53,337] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0020724864353961945, 'weight_decay': 0.01, 'warmup_steps': 18, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5622,2.216565,0.504,0.547915,0.504,0.498153
2,2.1075,2.165862,0.5143,0.551607,0.5143,0.513879
3,2.0313,2.117652,0.5324,0.547953,0.5324,0.523328
4,1.9838,2.14949,0.5211,0.544087,0.5211,0.515182
5,1.9545,2.097919,0.5314,0.55021,0.5314,0.525452
6,1.9187,2.045135,0.5474,0.557374,0.5474,0.542459


[I 2025-03-27 23:06:39,719] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.002930916461821087, 'weight_decay': 0.007, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5355,2.213061,0.5099,0.546955,0.5099,0.502953
2,2.142,2.184268,0.5108,0.551013,0.5108,0.511191
3,2.0723,2.152382,0.5334,0.558813,0.5334,0.526699
4,2.0235,2.181187,0.5165,0.541484,0.5165,0.511031
5,1.9919,2.117944,0.5307,0.554953,0.5307,0.524872
6,1.951,2.064375,0.5415,0.552671,0.5415,0.535306


[I 2025-03-27 23:11:24,305] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0016401117577095096, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5644,2.227873,0.5039,0.544696,0.5039,0.496442
2,2.1002,2.151548,0.5194,0.54946,0.5194,0.518541
3,2.0189,2.104106,0.5332,0.546805,0.5332,0.52366
4,1.9706,2.145975,0.5226,0.550146,0.5226,0.517772
5,1.9419,2.08796,0.5349,0.549945,0.5349,0.528455
6,1.9082,2.036884,0.5498,0.557475,0.5498,0.545121
7,1.8855,2.022871,0.5508,0.554326,0.5508,0.546189
8,1.8639,2.066491,0.541,0.546592,0.541,0.534199
9,1.8418,2.06702,0.542,0.550044,0.542,0.536974
10,1.828,2.054479,0.5419,0.548991,0.5419,0.538531


[I 2025-03-27 23:19:19,432] Trial 132 finished with value: 0.5385313602162862 and parameters: {'learning_rate': 0.0016401117577095096, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 133 with params: {'learning_rate': 0.0008640418159104696, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7471,2.312907,0.4873,0.523649,0.4873,0.472614
2,2.1515,2.177914,0.5198,0.537839,0.5198,0.51713
3,2.0439,2.113838,0.5314,0.538425,0.5314,0.522167
4,1.9884,2.145165,0.5231,0.542405,0.5231,0.517433
5,1.956,2.08526,0.5372,0.548732,0.5372,0.5301
6,1.9243,2.045423,0.547,0.549217,0.547,0.541032


[I 2025-03-27 23:24:07,307] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9604,3.693865,0.2377,0.293618,0.2377,0.22653
2,3.4637,3.283117,0.3592,0.396963,0.3592,0.354945
3,3.1468,3.047145,0.4049,0.422063,0.4049,0.389453


[I 2025-03-27 23:26:31,093] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0027885534428530906, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5071,2.209404,0.5096,0.547667,0.5096,0.503161
2,2.1327,2.181602,0.5098,0.549828,0.5098,0.510196
3,2.0631,2.144857,0.5342,0.558484,0.5342,0.527009
4,2.0151,2.172686,0.5175,0.540555,0.5175,0.511892
5,1.9842,2.113615,0.5324,0.556111,0.5324,0.526947
6,1.9444,2.061096,0.5441,0.555437,0.5441,0.538337


[I 2025-03-27 23:31:15,361] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0018423910737186322, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5421,2.219057,0.5037,0.543775,0.5037,0.497254
2,2.1009,2.156082,0.5191,0.552577,0.5191,0.518546
3,2.023,2.109612,0.5324,0.546634,0.5324,0.522982
4,1.9753,2.146937,0.5227,0.54877,0.5227,0.517515
5,1.9467,2.092395,0.5323,0.548917,0.5323,0.526189
6,1.912,2.04006,0.5492,0.558382,0.5492,0.544636
7,1.8881,2.024958,0.5498,0.553855,0.5498,0.545229
8,1.8649,2.067271,0.5404,0.545934,0.5404,0.533581
9,1.8411,2.068469,0.5422,0.550669,0.5422,0.537078
10,1.8256,2.054091,0.542,0.548756,0.542,0.538615


[I 2025-03-27 23:39:11,460] Trial 136 finished with value: 0.5386154307822597 and parameters: {'learning_rate': 0.0018423910737186322, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 137 with params: {'learning_rate': 0.0027106697739136274, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5079,2.209296,0.509,0.547274,0.509,0.502674
2,2.129,2.180463,0.5097,0.550327,0.5097,0.510235
3,2.059,2.141474,0.533,0.557319,0.533,0.525686
4,2.0113,2.169002,0.5182,0.540269,0.5182,0.512434
5,1.9806,2.111739,0.5324,0.556039,0.5324,0.526935
6,1.9413,2.059526,0.5446,0.556513,0.5446,0.539193


[I 2025-03-27 23:43:59,978] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.002107408063218619, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.542,2.21463,0.5048,0.548708,0.5048,0.499097
2,2.1072,2.166589,0.5141,0.551558,0.5141,0.513736
3,2.032,2.118471,0.5323,0.548753,0.5323,0.523388
4,1.9847,2.149766,0.5221,0.544781,0.5221,0.516168
5,1.9555,2.098488,0.5313,0.550143,0.5313,0.525302
6,1.9195,2.045741,0.5476,0.558172,0.5476,0.542698
7,1.8938,2.02911,0.549,0.554191,0.549,0.54447
8,1.8685,2.068851,0.5391,0.545314,0.5391,0.532435
9,1.8422,2.071347,0.541,0.550536,0.541,0.536077
10,1.8244,2.054397,0.5416,0.548491,0.5416,0.538271


[I 2025-03-27 23:51:55,015] Trial 138 finished with value: 0.538270613746785 and parameters: {'learning_rate': 0.002107408063218619, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 139 with params: {'learning_rate': 0.0013355920421758436, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6393,2.250741,0.5,0.54378,0.5,0.490113
2,2.1086,2.153698,0.5183,0.544326,0.5183,0.517019
3,2.0194,2.099998,0.5346,0.546804,0.5346,0.525733
4,1.9692,2.14396,0.5216,0.547537,0.5216,0.516625
5,1.9396,2.082593,0.5367,0.55004,0.5367,0.530313
6,1.9071,2.035272,0.549,0.553448,0.549,0.54342
7,1.8859,2.021938,0.5519,0.55423,0.5519,0.546979
8,1.8663,2.066237,0.5408,0.545472,0.5408,0.533954
9,1.8467,2.066486,0.5428,0.549872,0.5428,0.537621
10,1.8353,2.056376,0.5415,0.548668,0.5415,0.538147


[I 2025-03-27 23:59:52,958] Trial 139 finished with value: 0.5381466299104214 and parameters: {'learning_rate': 0.0013355920421758436, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 100 with value: 0.5388203850987133.


Trial 140 with params: {'learning_rate': 0.0008598053115603373, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.756,2.314774,0.4868,0.52336,0.4868,0.472148
2,2.1528,2.178583,0.5199,0.537998,0.5199,0.517287
3,2.0446,2.114263,0.5314,0.538339,0.5314,0.522112
4,1.9889,2.145379,0.5231,0.542553,0.5231,0.517456
5,1.9564,2.085469,0.537,0.548392,0.537,0.529835
6,1.9247,2.045681,0.5473,0.549536,0.5473,0.541369
7,1.9051,2.030627,0.5504,0.550864,0.5504,0.54479
8,1.8883,2.072165,0.5389,0.541572,0.5389,0.531865
9,1.8724,2.074003,0.5401,0.547658,0.5401,0.535016
10,1.8647,2.066834,0.5389,0.546062,0.5389,0.535313


[I 2025-03-28 00:07:53,362] Trial 140 finished with value: 0.5353131705692633 and parameters: {'learning_rate': 0.0008598053115603373, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 141 with params: {'learning_rate': 0.0021443565193407996, 'weight_decay': 0.006, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5846,2.217606,0.5036,0.548385,0.5036,0.497932
2,2.1113,2.169198,0.513,0.551692,0.513,0.513029
3,2.035,2.120585,0.5309,0.547703,0.5309,0.522105


[I 2025-03-28 00:10:17,875] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0007977145402186113, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7765,2.330569,0.4839,0.521765,0.4839,0.468974
2,2.1642,2.185596,0.5187,0.53607,0.5187,0.515944
3,2.0523,2.119114,0.5307,0.537056,0.5307,0.521322
4,1.9952,2.147846,0.5234,0.541795,0.5234,0.517453
5,1.9621,2.088099,0.5356,0.547375,0.5356,0.528459
6,1.9304,2.049137,0.5466,0.548406,0.5466,0.540518


[I 2025-03-28 00:15:02,304] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0009020640813782102, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7429,2.305509,0.4894,0.529177,0.4894,0.475627
2,2.1461,2.174421,0.5194,0.537677,0.5194,0.516918
3,2.0402,2.111525,0.5309,0.538297,0.5309,0.521495


[I 2025-03-28 00:17:25,574] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9889,3.745712,0.2137,0.268553,0.2137,0.203086
2,3.5327,3.361158,0.3388,0.380808,0.3388,0.335747
3,3.2305,3.131279,0.3931,0.411897,0.3931,0.378033
4,3.0245,2.987506,0.4013,0.444521,0.4013,0.388876
5,2.8852,2.871882,0.4295,0.45842,0.4295,0.4146
6,2.7871,2.790487,0.4423,0.456124,0.4423,0.426481


[I 2025-03-28 00:22:11,310] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0010729299558257036, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7499,2.283767,0.493,0.535083,0.493,0.480446
2,2.1301,2.163703,0.5182,0.539265,0.5182,0.516183
3,2.0296,2.104547,0.5326,0.54175,0.5326,0.523545
4,1.9764,2.14288,0.5219,0.544953,0.5219,0.51648
5,1.945,2.081653,0.5369,0.549288,0.5369,0.530256
6,1.913,2.038533,0.5481,0.55125,0.5481,0.542286


[I 2025-03-28 00:26:55,985] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0018317831994225852, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.543,2.219411,0.5037,0.543968,0.5037,0.497246
2,2.1008,2.155733,0.5191,0.552281,0.5191,0.518471
3,2.0227,2.109288,0.5319,0.546057,0.5319,0.52251
4,1.975,2.146874,0.5224,0.548697,0.5224,0.517257
5,1.9464,2.092139,0.5324,0.548855,0.5324,0.526202
6,1.9118,2.039883,0.5489,0.557991,0.5489,0.544277
7,1.8879,2.024824,0.5502,0.554239,0.5502,0.545646
8,1.8648,2.06721,0.5403,0.545793,0.5403,0.53352
9,1.8411,2.068393,0.542,0.550438,0.542,0.536866
10,1.8257,2.054099,0.542,0.548763,0.542,0.538597


[I 2025-03-28 00:34:53,161] Trial 146 finished with value: 0.5385965298134107 and parameters: {'learning_rate': 0.0018317831994225852, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 147 with params: {'learning_rate': 0.0028453261842180558, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5714,2.214429,0.5083,0.547132,0.5083,0.501338
2,2.1402,2.183621,0.5097,0.549953,0.5097,0.510009
3,2.069,2.149038,0.5344,0.559124,0.5344,0.527373
4,2.0201,2.17751,0.5163,0.540444,0.5163,0.510715
5,1.9884,2.116134,0.5309,0.555611,0.5309,0.525433
6,1.9481,2.063057,0.5421,0.553184,0.5421,0.535995


[I 2025-03-28 00:39:37,293] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0019413182155920852, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5898,2.22066,0.5028,0.545143,0.5028,0.496953
2,2.1061,2.161384,0.5159,0.551013,0.5159,0.515369
3,2.0277,2.113755,0.5336,0.548658,0.5336,0.524268
4,1.9797,2.148149,0.5213,0.545588,0.5213,0.515823
5,1.9504,2.095211,0.5317,0.550096,0.5317,0.52589
6,1.9152,2.042478,0.5485,0.558222,0.5485,0.543707
7,1.8905,2.026643,0.5494,0.553854,0.5494,0.544785
8,1.8664,2.068042,0.5401,0.545653,0.5401,0.53329
9,1.8415,2.069819,0.5424,0.551241,0.5424,0.537369
10,1.825,2.054297,0.5416,0.548408,0.5416,0.538247


[I 2025-03-28 00:47:32,252] Trial 148 finished with value: 0.5382470949316869 and parameters: {'learning_rate': 0.0019413182155920852, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 100 with value: 0.5388203850987133.


Trial 149 with params: {'learning_rate': 0.0011364238732391222, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6745,2.268925,0.4957,0.537291,0.4957,0.483894
2,2.1198,2.15922,0.5189,0.540617,0.5189,0.516849
3,2.0244,2.102021,0.533,0.542633,0.533,0.523896
4,1.9728,2.142345,0.5227,0.546116,0.5227,0.517547
5,1.9423,2.080982,0.5363,0.548682,0.5363,0.529658
6,1.9103,2.036973,0.5489,0.552201,0.5489,0.543148
7,1.89,2.023242,0.5519,0.553307,0.5519,0.546608
8,1.8717,2.067098,0.5408,0.544753,0.5408,0.533915
9,1.8537,2.067748,0.5428,0.550425,0.5428,0.53772
10,1.8439,2.059108,0.5405,0.547194,0.5405,0.536829


[I 2025-03-28 00:55:31,763] Trial 149 finished with value: 0.5368289292477528 and parameters: {'learning_rate': 0.0011364238732391222, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 100 with value: 0.5388203850987133.


In [None]:
print(best_distill_head)

BestRun(run_id='100', objective=0.5388203850987133, hyperparameters={'learning_rate': 0.0018015420304320652, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}, run_summary=None)
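
The printed result bundles the winning trial's id, its objective (F1) and the sampled hyperparameters. The sketch below shows one way to separate the optimizer settings from the distillation-specific ones before reusing them. `BestRunStandIn` and `split_hyperparameters` are hypothetical helpers introduced only for this illustration; the stand-in mirrors the fields visible in the output above and is not the library class itself.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class BestRunStandIn(NamedTuple):
    """Hypothetical local stand-in with the same fields as the printed result."""

    run_id: str
    objective: float
    hyperparameters: Dict[str, Any]
    run_summary: Optional[Any] = None


# Values copied from the printed output above.
best = BestRunStandIn(
    run_id="100",
    objective=0.5388203850987133,
    hyperparameters={
        "learning_rate": 0.0018015420304320652,
        "weight_decay": 0.008,
        "warmup_steps": 31,
        "lambda_param": 0.6000000000000001,
        "temperature": 5.5,
    },
)


def split_hyperparameters(
    hparams: Dict[str, Any],
    distill_keys: Tuple[str, ...] = ("lambda_param", "temperature"),
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate optimizer-style settings from distillation-specific ones."""
    optimizer = {k: v for k, v in hparams.items() if k not in distill_keys}
    distill = {k: v for k, v in hparams.items() if k in distill_keys}
    return optimizer, distill


optimizer_hp, distill_hp = split_hyperparameters(best.hyperparameters)
print(f"trial {best.run_id}: F1 = {best.objective:.4f}")
print("optimizer settings:", optimizer_hp)
print("distillation settings:", distill_hp)
```

Keeping the two groups apart makes it easier to pass the first to the training arguments and the second to whatever distillation loss is in use.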


In [None]:
base.reset_seed()

## Search with standard training, fine-tuning the whole pretrained model
Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [None]:
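def hp_space(trial):
    # Search space: log-uniform learning rate, stepped weight decay, integer warmup.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    # Log the sampled configuration so it appears next to the per-epoch metrics.
    print(f"Trial {trial.number} with params: {params}")
    return params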
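
To make the ranges above more concrete, the following sketch enumerates the stepped weight-decay grid and draws log-uniform learning rates using only the standard library. It is an illustration of what the ranges mean, not of how Optuna samples: the TPE sampler used in this notebook is adaptive rather than uniform. The helper names `weight_decay_grid` and `sample_log_uniform` and the seed are assumptions made for the example. Building the grid as `low + k * step` is also one plausible reason why values such as `0.009000000000000001` appear in the trial logs.

```python
import math
import random


def weight_decay_grid(low=0.0, high=1e-2, step=1e-3):
    """Enumerate the discrete values covered by a stepped float range."""
    n_steps = round((high - low) / step)
    return [low + k * step for k in range(n_steps + 1)]


def sample_log_uniform(low, high, rng):
    """Draw one value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


grid = weight_decay_grid()
rng = random.Random(42)  # arbitrary seed, used for this illustration only
samples = [sample_log_uniform(5e-5, 5e-3, rng) for _ in range(1000)]

# In log space the midpoint of [5e-5, 5e-3] is 5e-4, so roughly half of the
# samples should fall below it.
below_midpoint = sum(s < 5e-4 for s in samples) / len(samples)

print(len(grid), grid)
print(f"share of learning rates below 5e-4: {below_midpoint:.2f}")
```

With a log scale, each decade of the learning-rate range receives about the same share of samples, which is why it is the usual choice for parameters spanning several orders of magnitude.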

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
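
The sketch below illustrates how Hyperband splits trials into brackets and at which steps a trial in each bracket can be stopped. It is a simplified model, not Optuna's implementation. The values of `min_r` and `max_r` are defined earlier in the notebook and are not visible here; the example assumes 3 and 10 because the logs show trials pruned after epoch 3 or 6 and complete runs lasting 10 epochs. The function names are hypothetical.

```python
from typing import List


def count_brackets(min_resource: int, max_resource: int, reduction_factor: int) -> int:
    """Number of brackets, i.e. floor(log_eta(max_resource / min_resource)) + 1.

    Computed with integer arithmetic to avoid floating-point log rounding.
    Assumes 1 <= min_resource <= max_resource and reduction_factor >= 2.
    """
    n = 0
    while min_resource * reduction_factor ** n <= max_resource:
        n += 1
    return n


def rung_steps(min_resource: int, max_resource: int, reduction_factor: int) -> List[List[int]]:
    """For each bracket, list the steps at which a trial may be pruned."""
    brackets = []
    for rate in range(count_brackets(min_resource, max_resource, reduction_factor)):
        steps = []
        rung = 0
        while True:
            # Each bracket starts pruning later than the previous one.
            step = min_resource * reduction_factor ** (rate + rung)
            if step > max_resource:
                break
            steps.append(step)
            rung += 1
        brackets.append(steps)
    return brackets


# Assumed values (not taken from the notebook): min_r = 3, max_r = 10.
for index, steps in enumerate(rung_steps(3, 10, 2)):
    print(f"bracket {index}: pruning checks after epochs {steps}")
```

Under these assumed values, one bracket checks after epochs 3 and 6 and the other only after epoch 6, which would match the two pruning points seen in the trial logs.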



Trainer configuration for the individual training runs.

In [None]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_mobilenet(100),
)
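
The implementation of `base.compute_metrics` is not shown in this notebook. In the trial logs, Recall equals Accuracy in every row, which is what support-weighted averaging over classes would produce; that reading is an inference, not something the source confirms. The following minimal sketch, with hypothetical function names and toy labels, shows why the two coincide and how a weighted F1 could be computed.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class recall averaged with class-support weights."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1 averaged with class-support weights."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for c, n in support.items():
        precision = hits[c] / predicted[c] if predicted[c] else 0.0
        recall = hits[c] / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels chosen for illustration only.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
print(acc, rec, f1)
```

The support weights cancel the per-class denominators in the recall sum, leaving the total number of correct predictions divided by the number of samples, which is accuracy.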
  

Search setup.

In [None]:
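best_base_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    # The objective is the validation F1 reported by compute_metrics.
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    # NOTE: despite its name, this study runs standard (non-distillation) training.
    study_name="Distill",
    n_trials=150,
)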

[I 2025-03-28 00:55:32,395] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8533,1.021426,0.6997,0.717809,0.6997,0.697692
2,0.7225,0.966209,0.7174,0.743706,0.7174,0.714577
3,0.4153,0.89885,0.7415,0.758191,0.7415,0.739484


[I 2025-03-28 00:59:16,343] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6427,1.224765,0.6452,0.683868,0.6452,0.64065
2,0.8792,1.083289,0.6928,0.722222,0.6928,0.692123
3,0.5711,1.033941,0.7141,0.741506,0.7141,0.714203
4,0.3671,1.069407,0.7215,0.752894,0.7215,0.723647
5,0.2306,1.009602,0.7352,0.754293,0.7352,0.733277
6,0.1322,0.981335,0.7589,0.774265,0.7589,0.759192


[I 2025-03-28 01:06:38,916] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0229,1.727884,0.5741,0.589221,0.5741,0.564595
2,1.3335,1.253056,0.6522,0.671802,0.6522,0.651078
3,0.9263,1.065865,0.6957,0.704398,0.6957,0.692611


[I 2025-03-28 01:10:22,607] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8992,1.477218,0.5885,0.636944,0.5885,0.578725
2,1.1207,1.281718,0.6419,0.687051,0.6419,0.641247
3,0.799,1.139335,0.6818,0.719103,0.6818,0.680864
4,0.5563,1.101262,0.7074,0.738259,0.7074,0.710486
5,0.3708,1.078662,0.7205,0.739786,0.7205,0.719055
6,0.2273,1.049136,0.7379,0.751545,0.7379,0.738978


[I 2025-03-28 01:17:51,144] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1707,1.836794,0.4938,0.561498,0.4938,0.482284
2,1.4117,1.487009,0.5833,0.633803,0.5833,0.58262
3,1.0665,1.268802,0.6427,0.670674,0.6427,0.638941


[I 2025-03-28 01:21:33,560] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.428,1.290313,0.6459,0.656342,0.6459,0.641039
2,0.9719,1.046135,0.699,0.715582,0.699,0.697818
3,0.6294,0.924796,0.7263,0.739085,0.7263,0.724822
4,0.4242,0.962274,0.7205,0.742938,0.7205,0.722294
5,0.289,0.888739,0.7426,0.754898,0.7426,0.741239
6,0.194,0.860826,0.7522,0.761782,0.7522,0.752834


[I 2025-03-28 01:28:59,514] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7386,1.032296,0.698,0.719389,0.698,0.695703
2,0.7139,0.95837,0.7197,0.746323,0.7197,0.719848
3,0.4064,0.901695,0.7452,0.763693,0.7452,0.743833
4,0.2208,0.977586,0.7347,0.763374,0.7347,0.73834
5,0.1245,0.927816,0.7553,0.770906,0.7553,0.755283
6,0.0672,0.933812,0.7636,0.775336,0.7636,0.764439


[I 2025-03-28 01:36:23,929] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5927,1.401976,0.6215,0.633966,0.6215,0.61541
2,1.0724,1.103337,0.6851,0.701331,0.6851,0.683914
3,0.7182,0.963752,0.7199,0.730719,0.7199,0.717667
4,0.5112,0.988161,0.7084,0.731165,0.7084,0.710008
5,0.3739,0.895877,0.7399,0.751761,0.7399,0.73814
6,0.2723,0.867636,0.7467,0.756779,0.7467,0.747582


[I 2025-03-28 01:43:47,963] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6407,1.048354,0.6901,0.711688,0.6901,0.687519
2,0.7149,0.967011,0.7142,0.741099,0.7142,0.71371
3,0.4141,0.935329,0.7401,0.758495,0.7401,0.738747
4,0.2345,0.99645,0.732,0.756884,0.732,0.733847
5,0.132,0.969231,0.7478,0.76659,0.7478,0.747366
6,0.0721,0.930367,0.7673,0.778826,0.7673,0.768485
7,0.0365,1.002549,0.7678,0.78079,0.7678,0.768544
8,0.0184,1.017734,0.7652,0.777021,0.7652,0.766582
9,0.0069,1.045901,0.7732,0.782607,0.7732,0.77302
10,0.0035,0.979116,0.7773,0.783569,0.7773,0.776314


[I 2025-03-28 01:56:08,266] Trial 8 finished with value: 0.7763144942807644 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.7763144942807644.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.584,1.105955,0.6806,0.708274,0.6806,0.676369
2,0.7569,1.041054,0.7001,0.731478,0.7001,0.698826
3,0.4631,0.958058,0.7352,0.753724,0.7352,0.733364


[I 2025-03-28 01:59:50,694] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8991,1.653115,0.5853,0.59767,0.5853,0.576922
2,1.2793,1.221787,0.6592,0.680184,0.6592,0.658827
3,0.8872,1.042453,0.701,0.710524,0.701,0.698325
4,0.6734,1.042899,0.6921,0.715258,0.6921,0.693504
5,0.5322,0.931388,0.7292,0.738676,0.7292,0.727023
6,0.4276,0.891838,0.7374,0.74676,0.7374,0.738183


[I 2025-03-28 02:07:15,081] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020781267255701565, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0901,1.758773,0.5196,0.589251,0.5196,0.509606
2,1.3371,1.379353,0.6071,0.649642,0.6071,0.60272
3,0.9976,1.254892,0.648,0.680545,0.648,0.642819


[I 2025-03-28 02:10:58,559] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6561,1.058729,0.6851,0.708083,0.6851,0.680279
2,0.7156,0.987554,0.7126,0.740529,0.7126,0.711513
3,0.4108,0.933958,0.7396,0.757387,0.7396,0.737804
4,0.2379,1.033556,0.7286,0.760602,0.7286,0.731156
5,0.1315,0.984189,0.7499,0.767263,0.7499,0.748764
6,0.0713,0.929696,0.7652,0.77562,0.7652,0.766132
7,0.0384,1.020524,0.7588,0.770699,0.7588,0.758552
8,0.017,1.00698,0.7715,0.781309,0.7715,0.772443
9,0.0069,1.03853,0.7725,0.782467,0.7725,0.772865
10,0.0035,0.972907,0.7802,0.786666,0.7802,0.779284


[I 2025-03-28 02:23:18,837] Trial 12 finished with value: 0.7792835847818538 and parameters: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}. Best is trial 12 with value: 0.7792835847818538.


Trial 13 with params: {'learning_rate': 0.0002893591596161301, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7322,1.025028,0.696,0.71731,0.696,0.693334
2,0.7067,0.961588,0.7174,0.74161,0.7174,0.71651
3,0.4015,0.896348,0.7411,0.755638,0.7411,0.739196
4,0.22,0.987608,0.7328,0.758251,0.7328,0.735079
5,0.1154,0.918368,0.7528,0.764624,0.7528,0.750689
6,0.0609,0.910078,0.764,0.773549,0.764,0.764745


[I 2025-03-28 02:30:44,900] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00036841844828218917, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7141,1.034441,0.6926,0.717962,0.6926,0.689706
2,0.7057,0.973092,0.7175,0.742461,0.7175,0.716042
3,0.4071,0.915094,0.7412,0.758505,0.7412,0.741
4,0.2311,0.966177,0.7407,0.765836,0.7407,0.74321
5,0.1202,0.957859,0.7534,0.769855,0.7534,0.75257
6,0.0679,0.935076,0.7644,0.777343,0.7644,0.7656
7,0.0351,0.990325,0.7627,0.776364,0.7627,0.763419
8,0.0163,1.008308,0.7687,0.778197,0.7687,0.769321
9,0.0066,1.046331,0.7676,0.777772,0.7676,0.767625
10,0.0036,0.988784,0.7768,0.784273,0.7768,0.776619


[I 2025-03-28 02:43:06,235] Trial 14 finished with value: 0.7766185474794804 and parameters: {'learning_rate': 0.00036841844828218917, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 12 with value: 0.7792835847818538.


Trial 15 with params: {'learning_rate': 0.0010578942221340086, 'weight_decay': 0.007, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8174,1.363834,0.6094,0.656621,0.6094,0.60435
2,1.0098,1.139455,0.6717,0.702872,0.6717,0.670415
3,0.7016,1.056939,0.7039,0.729169,0.7039,0.700956


[I 2025-03-28 02:46:48,226] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0004147562434716862, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6814,1.055771,0.6876,0.710651,0.6876,0.684416
2,0.722,0.996634,0.7156,0.743451,0.7156,0.713673
3,0.4176,0.906057,0.7461,0.76316,0.7461,0.744912
4,0.2383,1.038916,0.7267,0.756661,0.7267,0.729341
5,0.1314,0.964267,0.7526,0.767941,0.7526,0.750729
6,0.0725,0.935273,0.7703,0.77956,0.7703,0.771146
7,0.0412,0.998231,0.7664,0.778086,0.7664,0.766376
8,0.018,1.005968,0.7664,0.774335,0.7664,0.766806
9,0.0071,1.046981,0.7702,0.781139,0.7702,0.770659
10,0.0037,0.993678,0.7766,0.783359,0.7766,0.77607


[I 2025-03-28 02:59:08,286] Trial 16 finished with value: 0.7760698908547341 and parameters: {'learning_rate': 0.0004147562434716862, 'weight_decay': 0.01, 'warmup_steps': 14}. Best is trial 12 with value: 0.7792835847818538.


Trial 17 with params: {'learning_rate': 0.0003392099549100544, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7462,1.026488,0.6959,0.714682,0.6959,0.693089
2,0.7152,0.965207,0.7204,0.74576,0.7204,0.718792
3,0.4102,0.894211,0.7456,0.763169,0.7456,0.744142
4,0.2219,0.995052,0.7322,0.760551,0.7322,0.735327
5,0.1207,0.953002,0.7506,0.76748,0.7506,0.750368
6,0.0665,0.924011,0.7623,0.772992,0.7623,0.763026


[I 2025-03-28 03:06:32,885] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.001983530571135324, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0678,1.705153,0.5295,0.585988,0.5295,0.519574
2,1.3288,1.396707,0.6037,0.647629,0.6037,0.601452
3,0.9851,1.256818,0.6532,0.680239,0.6532,0.650186


[I 2025-03-28 03:10:15,994] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00017098269191031398, 'weight_decay': 0.005, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1866,1.132495,0.6762,0.689263,0.6762,0.67287
2,0.8245,0.984687,0.7099,0.730242,0.7099,0.709185
3,0.4989,0.891949,0.7369,0.7499,0.7369,0.735511
4,0.296,0.963622,0.7239,0.749931,0.7239,0.726631
5,0.1727,0.903263,0.7485,0.762413,0.7485,0.747493
6,0.0974,0.875316,0.7618,0.771224,0.7618,0.762745
7,0.0554,0.942412,0.7541,0.763852,0.7541,0.75384
8,0.0315,0.983986,0.7514,0.763001,0.7514,0.752475
9,0.0173,1.021979,0.7546,0.766863,0.7546,0.754547
10,0.0118,0.974421,0.7582,0.766278,0.7582,0.75738


[I 2025-03-28 03:22:39,034] Trial 19 finished with value: 0.7573799628175659 and parameters: {'learning_rate': 0.00017098269191031398, 'weight_decay': 0.005, 'warmup_steps': 32}. Best is trial 12 with value: 0.7792835847818538.


Trial 20 with params: {'learning_rate': 0.0013326498867251948, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8817,1.510601,0.5783,0.6251,0.5783,0.568958
2,1.1209,1.245811,0.6424,0.679532,0.6424,0.63913
3,0.7957,1.109344,0.6919,0.713863,0.6919,0.688739


[I 2025-03-28 03:26:23,535] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0005845724196901293, 'weight_decay': 0.008, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6214,1.146956,0.6693,0.704853,0.6693,0.666012
2,0.7854,1.068625,0.6945,0.727858,0.6945,0.693716
3,0.4824,0.970719,0.7318,0.752542,0.7318,0.730739
4,0.2913,1.039389,0.7264,0.755238,0.7264,0.729133
5,0.171,1.010012,0.739,0.756506,0.739,0.737502
6,0.0971,0.968539,0.7603,0.77106,0.7603,0.760932


[I 2025-03-28 03:33:50,070] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.00045122618593124235, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6499,1.04554,0.6907,0.714829,0.6907,0.687859
2,0.7247,0.991945,0.7106,0.738228,0.7106,0.709618
3,0.4218,0.93146,0.7374,0.758223,0.7374,0.736225
4,0.2445,1.0036,0.7336,0.761302,0.7336,0.73683
5,0.1385,0.962623,0.7532,0.766363,0.7532,0.751482
6,0.0778,0.930738,0.7693,0.781879,0.7693,0.770894
7,0.0408,1.019353,0.76,0.771574,0.76,0.759761
8,0.0186,0.980622,0.7712,0.780033,0.7712,0.772026
9,0.0071,1.038339,0.7702,0.781379,0.7702,0.770324
10,0.0038,0.96929,0.78,0.787756,0.78,0.779227


[I 2025-03-28 03:46:13,586] Trial 22 finished with value: 0.7792269123777986 and parameters: {'learning_rate': 0.00045122618593124235, 'weight_decay': 0.006, 'warmup_steps': 11}. Best is trial 12 with value: 0.7792835847818538.


Trial 23 with params: {'learning_rate': 0.0005291260344151434, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6387,1.10604,0.6781,0.708481,0.6781,0.67262
2,0.7518,1.027042,0.7055,0.738446,0.7055,0.70515
3,0.463,0.95137,0.736,0.753543,0.736,0.733961
4,0.2676,1.028404,0.7302,0.757164,0.7302,0.731847
5,0.158,0.997692,0.7456,0.759931,0.7456,0.743341
6,0.0887,0.952724,0.7616,0.771051,0.7616,0.761722
7,0.047,1.008884,0.7605,0.770299,0.7605,0.76049
8,0.0221,1.021792,0.7657,0.776847,0.7657,0.767246
9,0.008,1.060637,0.7709,0.783206,0.7709,0.771874
10,0.0037,1.003859,0.7767,0.783506,0.7767,0.776206


[I 2025-03-28 03:58:35,778] Trial 23 finished with value: 0.7762060724710546 and parameters: {'learning_rate': 0.0005291260344151434, 'weight_decay': 0.005, 'warmup_steps': 12}. Best is trial 12 with value: 0.7792835847818538.


Trial 24 with params: {'learning_rate': 0.0010023089553602236, 'weight_decay': 0.005, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7485,1.372946,0.6137,0.661672,0.6137,0.606882
2,0.9806,1.192242,0.6638,0.699442,0.6638,0.660472
3,0.6787,1.077396,0.7039,0.737822,0.7039,0.702379


[I 2025-03-28 04:02:20,045] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.00015787971267243754, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1319,1.139173,0.6756,0.688936,0.6756,0.672073
2,0.8396,0.986686,0.7111,0.729382,0.7111,0.710065
3,0.5108,0.899209,0.7349,0.747547,0.7349,0.733419
4,0.3134,0.962117,0.7218,0.745263,0.7218,0.723743
5,0.1883,0.902485,0.7495,0.765035,0.7495,0.74855
6,0.1102,0.879849,0.7546,0.764293,0.7546,0.756007


[I 2025-03-28 04:09:46,979] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.00023362220022309262, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8462,1.050703,0.6945,0.714022,0.6945,0.693135
2,0.7375,0.962971,0.7188,0.743705,0.7188,0.718592
3,0.4267,0.885578,0.7451,0.758911,0.7451,0.744231
4,0.2351,0.971481,0.7311,0.756557,0.7311,0.733215
5,0.1251,0.926231,0.7556,0.769769,0.7556,0.75378
6,0.0669,0.908224,0.7609,0.77172,0.7609,0.761435


[I 2025-03-28 04:17:12,726] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00037782273556835117, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6996,1.04545,0.6898,0.713265,0.6898,0.686488
2,0.713,0.972804,0.7171,0.741672,0.7171,0.714886
3,0.4101,0.921468,0.7371,0.753913,0.7371,0.735038
4,0.2263,1.02678,0.7288,0.757239,0.7288,0.731748
5,0.1211,0.966826,0.7487,0.767293,0.7487,0.746794
6,0.0683,0.952138,0.7659,0.77567,0.7659,0.766868
7,0.0357,0.996803,0.7637,0.77377,0.7637,0.762748
8,0.0161,0.989621,0.7707,0.779718,0.7707,0.771795
9,0.0069,1.034404,0.7679,0.779194,0.7679,0.768404
10,0.0036,0.974323,0.7772,0.783904,0.7772,0.776516


[I 2025-03-28 04:29:35,110] Trial 27 finished with value: 0.7765157522159047 and parameters: {'learning_rate': 0.00037782273556835117, 'weight_decay': 0.007, 'warmup_steps': 15}. Best is trial 12 with value: 0.7792835847818538.


Trial 28 with params: {'learning_rate': 0.001964823749470024, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0721,1.703494,0.5309,0.600069,0.5309,0.518316
2,1.3272,1.412313,0.6017,0.653499,0.6017,0.597694
3,0.9897,1.242552,0.654,0.681823,0.654,0.650244
4,0.7391,1.236768,0.6664,0.699388,0.6664,0.666167
5,0.5315,1.18215,0.6903,0.7131,0.6903,0.688477
6,0.3471,1.111015,0.7152,0.728339,0.7152,0.715646


[I 2025-03-28 04:37:02,164] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.00020820015360918057, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9665,1.056791,0.6933,0.706808,0.6933,0.690411
2,0.7676,0.960135,0.7151,0.736548,0.7151,0.714048
3,0.4478,0.885244,0.741,0.754275,0.741,0.739533
4,0.254,0.958708,0.7303,0.756607,0.7303,0.733105
5,0.1383,0.912624,0.7561,0.77206,0.7561,0.755369
6,0.0747,0.893718,0.7655,0.775901,0.7655,0.766401
7,0.0409,0.949838,0.7599,0.769729,0.7599,0.759442
8,0.021,0.982726,0.7588,0.770435,0.7588,0.7603
9,0.0108,1.037629,0.7584,0.771524,0.7584,0.758822
10,0.007,0.981848,0.7627,0.770058,0.7627,0.76207


[I 2025-03-28 04:49:32,375] Trial 29 finished with value: 0.7620697193678102 and parameters: {'learning_rate': 0.00020820015360918057, 'weight_decay': 0.008, 'warmup_steps': 14}. Best is trial 12 with value: 0.7792835847818538.


Trial 30 with params: {'learning_rate': 0.00023315362853564245, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8792,1.0441,0.6957,0.713694,0.6957,0.693374
2,0.7399,0.962792,0.7171,0.740761,0.7171,0.716343
3,0.4237,0.914129,0.7346,0.753782,0.7346,0.733704
4,0.2377,0.980273,0.7275,0.753165,0.7275,0.729445
5,0.1268,0.912372,0.7541,0.772483,0.7541,0.7541
6,0.0686,0.899489,0.7691,0.779623,0.7691,0.770527
7,0.036,0.955442,0.7651,0.774566,0.7651,0.765152
8,0.0182,0.994151,0.7629,0.771642,0.7629,0.763347
9,0.0087,1.029742,0.765,0.774801,0.765,0.76472
10,0.0055,0.991996,0.7718,0.779572,0.7718,0.771521


[I 2025-03-28 05:01:54,646] Trial 30 finished with value: 0.7715213974815489 and parameters: {'learning_rate': 0.00023315362853564245, 'weight_decay': 0.01, 'warmup_steps': 9}. Best is trial 12 with value: 0.7792835847818538.


Trial 31 with params: {'learning_rate': 0.00035006588858719275, 'weight_decay': 0.008, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7641,1.034025,0.6954,0.714841,0.6954,0.691308
2,0.7178,0.951293,0.7257,0.750316,0.7257,0.724657
3,0.4065,0.886757,0.748,0.762412,0.748,0.747589
4,0.2247,0.990189,0.7321,0.760381,0.7321,0.735001
5,0.1232,0.956593,0.7539,0.770333,0.7539,0.752131
6,0.0658,0.938175,0.7627,0.772277,0.7627,0.76329
7,0.0354,0.995338,0.7641,0.775494,0.7641,0.763795
8,0.0152,1.005072,0.769,0.780126,0.769,0.770611
9,0.0064,1.038395,0.7686,0.780839,0.7686,0.769561
10,0.0038,0.983483,0.7753,0.782362,0.7753,0.774766


[I 2025-03-28 05:14:22,794] Trial 31 finished with value: 0.7747656074425793 and parameters: {'learning_rate': 0.00035006588858719275, 'weight_decay': 0.008, 'warmup_steps': 21}. Best is trial 12 with value: 0.7792835847818538.


Trial 32 with params: {'learning_rate': 0.0006476545280571131, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6248,1.160676,0.6615,0.693699,0.6615,0.655385
2,0.8126,1.066715,0.6956,0.724648,0.6956,0.694882
3,0.5111,1.036732,0.7195,0.746106,0.7195,0.719


[I 2025-03-28 05:18:06,753] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0007658221352781618, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6735,1.229411,0.6436,0.683311,0.6436,0.638128
2,0.8671,1.097176,0.6864,0.719699,0.6864,0.685917
3,0.5697,1.028092,0.7234,0.746284,0.7234,0.721009
4,0.358,1.106634,0.7131,0.748706,0.7131,0.716891
5,0.2229,1.002816,0.7437,0.759553,0.7437,0.742191
6,0.129,1.006856,0.7497,0.763362,0.7497,0.751281


[I 2025-03-28 05:25:33,044] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.000293800344343292, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7883,1.02885,0.7005,0.717557,0.7005,0.697069
2,0.7162,0.95073,0.7214,0.744538,0.7214,0.719802
3,0.4042,0.901629,0.7416,0.758129,0.7416,0.740379
4,0.221,0.996394,0.7259,0.754762,0.7259,0.728334
5,0.1157,0.93407,0.7579,0.774574,0.7579,0.756897
6,0.0628,0.920409,0.7697,0.780439,0.7697,0.770309
7,0.0331,0.976823,0.7672,0.777969,0.7672,0.767523
8,0.0156,0.992415,0.7708,0.780433,0.7708,0.771892
9,0.007,1.017363,0.7704,0.780521,0.7704,0.770881
10,0.0042,0.974683,0.7737,0.781928,0.7737,0.773154


[I 2025-03-28 05:37:51,589] Trial 34 finished with value: 0.7731535416678712 and parameters: {'learning_rate': 0.000293800344343292, 'weight_decay': 0.007, 'warmup_steps': 14}. Best is trial 12 with value: 0.7792835847818538.


Trial 35 with params: {'learning_rate': 0.00023400061805738516, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9338,1.048582,0.6916,0.707644,0.6916,0.689394
2,0.7475,0.958717,0.7147,0.737219,0.7147,0.713811
3,0.4269,0.904073,0.7383,0.753401,0.7383,0.736523
4,0.2391,0.96046,0.7313,0.756465,0.7313,0.733495
5,0.1288,0.914599,0.758,0.772139,0.758,0.757021
6,0.0685,0.898316,0.7657,0.776535,0.7657,0.766353
7,0.0365,0.977398,0.7559,0.767963,0.7559,0.755895
8,0.0186,0.998838,0.7616,0.77272,0.7616,0.762789
9,0.0089,1.028132,0.7635,0.775121,0.7635,0.763299
10,0.0057,0.986396,0.7645,0.771473,0.7645,0.763853


[I 2025-03-28 05:50:11,689] Trial 35 finished with value: 0.7638525790227962 and parameters: {'learning_rate': 0.00023400061805738516, 'weight_decay': 0.006, 'warmup_steps': 21}. Best is trial 12 with value: 0.7792835847818538.


Trial 36 with params: {'learning_rate': 0.0004728703352705145, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6497,1.065822,0.6842,0.707113,0.6842,0.680516
2,0.741,1.00009,0.7108,0.734749,0.7108,0.708558
3,0.4346,0.950078,0.7358,0.756781,0.7358,0.734189
4,0.2598,1.047113,0.7255,0.753573,0.7255,0.727348
5,0.1439,0.967356,0.7514,0.765815,0.7514,0.749568
6,0.082,0.957079,0.7635,0.776897,0.7635,0.765374


[I 2025-03-28 05:57:34,558] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 6.059650408878812e-05, 'weight_decay': 0.006, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0997,1.788844,0.5675,0.58382,0.5675,0.557251
2,1.3874,1.286237,0.6457,0.666533,0.6457,0.645001
3,0.9684,1.090914,0.6912,0.699937,0.6912,0.687739
4,0.7477,1.078919,0.6848,0.708293,0.6848,0.685924
5,0.6056,0.95242,0.7276,0.73638,0.7276,0.725252
6,0.4995,0.908327,0.7366,0.744255,0.7366,0.736841


[I 2025-03-28 06:04:59,672] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.000466016775972213, 'weight_decay': 0.001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6551,1.078996,0.6817,0.707078,0.6817,0.678603
2,0.7472,1.014194,0.7061,0.736392,0.7061,0.705635
3,0.4418,0.939152,0.7403,0.761611,0.7403,0.74063
4,0.2499,1.016759,0.7313,0.75786,0.7313,0.732889
5,0.1417,0.973529,0.7497,0.764886,0.7497,0.7482
6,0.0789,0.934414,0.7663,0.778884,0.7663,0.768228
7,0.0421,1.007492,0.7658,0.777947,0.7658,0.766504
8,0.0196,1.01026,0.7661,0.774063,0.7661,0.766574
9,0.0074,1.065869,0.7695,0.780918,0.7695,0.769636
10,0.0034,0.993846,0.7757,0.782562,0.7757,0.775545


[I 2025-03-28 06:17:19,305] Trial 38 finished with value: 0.7755451163929102 and parameters: {'learning_rate': 0.000466016775972213, 'weight_decay': 0.001, 'warmup_steps': 11}. Best is trial 12 with value: 0.7792835847818538.


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1981,1.881091,0.5551,0.571297,0.5551,0.544434
2,1.4455,1.32224,0.6381,0.658774,0.6381,0.636753
3,1.0073,1.114062,0.687,0.695511,0.687,0.683349


[I 2025-03-28 06:21:01,498] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.004241076779716196, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5592,2.099207,0.4306,0.505245,0.4306,0.417344
2,1.7208,1.645666,0.5341,0.582168,0.5341,0.528221
3,1.3471,1.452675,0.5846,0.616888,0.5846,0.580678
4,1.0672,1.459786,0.6049,0.645935,0.6049,0.601742
5,0.8487,1.299375,0.6403,0.657439,0.6403,0.636703
6,0.622,1.280358,0.6684,0.68607,0.6684,0.667693


[I 2025-03-28 06:28:24,793] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9737,1.717097,0.5741,0.587216,0.5741,0.564567
2,1.3286,1.253234,0.652,0.673758,0.652,0.651457
3,0.9252,1.064598,0.6958,0.704205,0.6958,0.69266


[I 2025-03-28 06:32:11,700] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.000245576881381508, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8408,1.030515,0.699,0.71545,0.699,0.696878
2,0.7311,0.963055,0.7153,0.741345,0.7153,0.714264
3,0.418,0.900189,0.7414,0.758888,0.7414,0.740402
4,0.2324,0.949458,0.7393,0.763417,0.7393,0.741503
5,0.1254,0.929916,0.7522,0.77275,0.7522,0.751995
6,0.0661,0.895915,0.7691,0.778443,0.7691,0.770459
7,0.0343,0.947341,0.7661,0.774808,0.7661,0.765862
8,0.0166,0.988109,0.7645,0.774566,0.7645,0.765686
9,0.0084,1.033508,0.763,0.774814,0.763,0.762947
10,0.0053,0.9834,0.7658,0.773525,0.7658,0.765446


[I 2025-03-28 06:44:34,192] Trial 42 finished with value: 0.7654456621275672 and parameters: {'learning_rate': 0.000245576881381508, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 12 with value: 0.7792835847818538.


Trial 43 with params: {'learning_rate': 0.0005720722787138559, 'weight_decay': 0.008, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5933,1.125585,0.6718,0.70441,0.6718,0.668864
2,0.7704,1.000305,0.7083,0.73544,0.7083,0.707627
3,0.4725,0.963173,0.7282,0.755643,0.7282,0.728617


[I 2025-03-28 06:48:16,142] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.00036781439687623603, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6962,1.03431,0.6974,0.72206,0.6974,0.693274
2,0.7065,0.950133,0.725,0.747173,0.725,0.723028
3,0.4046,0.887847,0.7503,0.768622,0.7503,0.749306
4,0.2243,0.988734,0.7356,0.760314,0.7356,0.737544
5,0.1228,0.936949,0.7538,0.771348,0.7538,0.752598
6,0.068,0.910341,0.7719,0.784508,0.7719,0.774169
7,0.0353,0.96299,0.7671,0.774738,0.7671,0.766472
8,0.0167,0.996292,0.7745,0.78346,0.7745,0.775119
9,0.0068,1.012014,0.7716,0.782311,0.7716,0.771985
10,0.0036,0.973264,0.7782,0.785247,0.7782,0.777577


[I 2025-03-28 07:00:38,353] Trial 44 finished with value: 0.7775766356993649 and parameters: {'learning_rate': 0.00036781439687623603, 'weight_decay': 0.008, 'warmup_steps': 11}. Best is trial 12 with value: 0.7792835847818538.


Trial 45 with params: {'learning_rate': 0.0002643765567058854, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8659,1.01751,0.7037,0.719221,0.7037,0.701549
2,0.7262,0.949145,0.7184,0.742772,0.7184,0.717441
3,0.412,0.897563,0.7425,0.760144,0.7425,0.741349
4,0.2252,0.972024,0.7317,0.75747,0.7317,0.734536
5,0.1198,0.933904,0.7541,0.772218,0.7541,0.75394
6,0.0618,0.903733,0.7666,0.776258,0.7666,0.767322
7,0.0335,0.972546,0.7612,0.773951,0.7612,0.762088
8,0.0166,0.98149,0.764,0.775863,0.764,0.765598
9,0.0075,1.026544,0.7665,0.777064,0.7665,0.766574
10,0.0046,0.975147,0.7712,0.778419,0.7712,0.770728


[I 2025-03-28 07:13:00,556] Trial 45 finished with value: 0.7707283814349946 and parameters: {'learning_rate': 0.0002643765567058854, 'weight_decay': 0.01, 'warmup_steps': 20}. Best is trial 12 with value: 0.7792835847818538.


Trial 46 with params: {'learning_rate': 0.0002887647202545722, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7913,1.025822,0.7001,0.720037,0.7001,0.697635
2,0.7196,0.952139,0.7208,0.745613,0.7208,0.720303
3,0.4076,0.910177,0.742,0.762971,0.742,0.741821
4,0.2231,0.971705,0.7335,0.759632,0.7335,0.736055
5,0.1171,0.920556,0.7541,0.76942,0.7541,0.754011
6,0.0633,0.918054,0.7658,0.777253,0.7658,0.767549
7,0.0328,0.984764,0.7619,0.773803,0.7619,0.762024
8,0.0151,0.988932,0.7687,0.77871,0.7687,0.769979
9,0.0071,1.023383,0.7645,0.775089,0.7645,0.764877
10,0.0042,0.980181,0.7711,0.778216,0.7711,0.770225


[I 2025-03-28 07:25:19,895] Trial 46 finished with value: 0.7702250442855336 and parameters: {'learning_rate': 0.0002887647202545722, 'weight_decay': 0.006, 'warmup_steps': 13}. Best is trial 12 with value: 0.7792835847818538.


Trial 47 with params: {'learning_rate': 0.0009933713403660405, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7115,1.340715,0.615,0.656575,0.615,0.607797
2,0.9762,1.174637,0.6653,0.700175,0.6653,0.663239
3,0.6608,1.113006,0.6992,0.731794,0.6992,0.698483


[I 2025-03-28 07:29:01,349] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0007358771072324186, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6322,1.192338,0.6517,0.680809,0.6517,0.646788
2,0.8436,1.082902,0.6933,0.723565,0.6933,0.691957
3,0.5453,1.000741,0.7243,0.751776,0.7243,0.723281


[I 2025-03-28 07:32:42,127] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.00030685314448999424, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7564,1.031398,0.698,0.71711,0.698,0.695585
2,0.7091,0.969672,0.7187,0.743426,0.7187,0.717444
3,0.4028,0.889417,0.7431,0.761204,0.7431,0.742074
4,0.2205,1.003062,0.7328,0.764085,0.7328,0.735951
5,0.1167,0.939213,0.7502,0.765252,0.7502,0.749377
6,0.0639,0.913583,0.7667,0.778205,0.7667,0.768068
7,0.0342,0.977039,0.7649,0.777059,0.7649,0.764786
8,0.0168,0.989537,0.7667,0.776283,0.7667,0.767386
9,0.0076,1.024206,0.7685,0.779202,0.7685,0.76892
10,0.004,0.974024,0.7732,0.780137,0.7732,0.772646


[I 2025-03-28 07:45:05,650] Trial 49 finished with value: 0.7726462156984788 and parameters: {'learning_rate': 0.00030685314448999424, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}. Best is trial 12 with value: 0.7792835847818538.


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2858,1.918674,0.4831,0.528812,0.4831,0.469736
2,1.4982,1.544212,0.567,0.619624,0.567,0.563251
3,1.1398,1.353265,0.6287,0.659079,0.6287,0.626528
4,0.8766,1.364213,0.6329,0.672142,0.6329,0.632631
5,0.664,1.252362,0.6621,0.682694,0.6621,0.657588
6,0.4585,1.187517,0.6967,0.715664,0.6967,0.697813


[I 2025-03-28 07:52:34,831] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00027688509342389675, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7823,1.02743,0.6977,0.717063,0.6977,0.695714
2,0.7161,0.962481,0.7182,0.746282,0.7182,0.717254
3,0.4075,0.885102,0.7448,0.760847,0.7448,0.74393
4,0.2208,0.967224,0.7362,0.763451,0.7362,0.739098
5,0.1171,0.918281,0.7525,0.768932,0.7525,0.75163
6,0.0621,0.900559,0.7686,0.780592,0.7686,0.769886
7,0.0337,0.959217,0.7624,0.772883,0.7624,0.762397
8,0.016,0.989689,0.7647,0.775038,0.7647,0.765809
9,0.0077,1.030463,0.764,0.776218,0.764,0.764158
10,0.0045,0.972187,0.7692,0.776983,0.7692,0.768476


[I 2025-03-28 08:04:54,798] Trial 51 finished with value: 0.7684761256854764 and parameters: {'learning_rate': 0.00027688509342389675, 'weight_decay': 0.008, 'warmup_steps': 7}. Best is trial 12 with value: 0.7792835847818538.


Trial 52 with params: {'learning_rate': 0.0006335925891742951, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6169,1.161731,0.6602,0.691131,0.6602,0.656186
2,0.8063,1.033377,0.7055,0.732246,0.7055,0.704259
3,0.5116,0.983874,0.7258,0.742846,0.7258,0.724479
4,0.3114,1.039769,0.7296,0.756367,0.7296,0.732054
5,0.1814,1.002104,0.7447,0.763575,0.7447,0.743851
6,0.1093,0.951195,0.7622,0.773534,0.7622,0.763648
7,0.0573,1.026468,0.7587,0.770108,0.7587,0.759105
8,0.0286,1.041224,0.763,0.775302,0.763,0.764679
9,0.0103,1.069847,0.7681,0.778101,0.7681,0.768428
10,0.0042,0.991835,0.7789,0.786006,0.7789,0.779132


[I 2025-03-28 08:17:17,771] Trial 52 finished with value: 0.7791315432522842 and parameters: {'learning_rate': 0.0006335925891742951, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 12 with value: 0.7792835847818538.


Trial 53 with params: {'learning_rate': 0.0005479243821711004, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6444,1.114581,0.6739,0.703337,0.6739,0.670612
2,0.7678,1.043434,0.7024,0.729308,0.7024,0.701047
3,0.4662,0.917289,0.7448,0.764779,0.7448,0.744318
4,0.2777,1.002423,0.7333,0.763101,0.7333,0.736039
5,0.1619,0.993709,0.7452,0.763954,0.7452,0.745182
6,0.0928,0.966989,0.7653,0.77871,0.7653,0.766313


[I 2025-03-28 08:24:42,185] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7355,1.048047,0.6933,0.716116,0.6933,0.690159
2,0.7263,0.95199,0.7194,0.745071,0.7194,0.718475
3,0.4241,0.908751,0.7415,0.761023,0.7415,0.740111
4,0.234,0.998718,0.7339,0.763722,0.7339,0.738063
5,0.1294,0.963589,0.7536,0.770871,0.7536,0.752511
6,0.0722,0.935056,0.765,0.777837,0.765,0.765703
7,0.0387,0.99608,0.7639,0.77434,0.7639,0.763801
8,0.0176,1.017313,0.7693,0.781435,0.7693,0.770666
9,0.0067,1.046993,0.7744,0.78554,0.7744,0.774585
10,0.0033,0.971793,0.778,0.784623,0.778,0.777818


[I 2025-03-28 08:37:03,102] Trial 54 finished with value: 0.7778181991203731 and parameters: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 12 with value: 0.7792835847818538.


Trial 55 with params: {'learning_rate': 0.0003644598615777491, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7481,1.024637,0.6989,0.718127,0.6989,0.696712
2,0.7132,0.974744,0.7178,0.74426,0.7178,0.716329
3,0.407,0.904225,0.7431,0.759116,0.7431,0.742019
4,0.2273,0.99832,0.7316,0.760317,0.7316,0.733989
5,0.1226,0.960747,0.7478,0.768092,0.7478,0.747766
6,0.067,0.936156,0.7637,0.777644,0.7637,0.765007


[I 2025-03-28 08:44:34,590] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0007442764061691288, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.695,1.202422,0.6464,0.678931,0.6464,0.640148
2,0.8623,1.102176,0.6878,0.721586,0.6878,0.686699
3,0.5605,1.05182,0.7115,0.742024,0.7115,0.711189
4,0.3524,1.037144,0.7272,0.75335,0.7272,0.729231
5,0.2114,1.004837,0.7444,0.75968,0.7444,0.742813
6,0.1293,0.972024,0.755,0.767058,0.755,0.755847


[I 2025-03-28 08:52:00,399] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0011736831652930928, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8251,1.451192,0.5893,0.640942,0.5893,0.58462
2,1.0732,1.2061,0.6616,0.696313,0.6616,0.659916
3,0.7403,1.087162,0.698,0.724684,0.698,0.697243


[I 2025-03-28 08:55:43,723] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0013880578878306888, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8625,1.507112,0.5733,0.622893,0.5733,0.567194
2,1.1363,1.267512,0.6377,0.678217,0.6377,0.635026
3,0.8131,1.11278,0.6931,0.715405,0.6931,0.689564


[I 2025-03-28 08:59:26,760] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0003067998000470195, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8616,1.034316,0.6937,0.714564,0.6937,0.690194
2,0.7263,0.953069,0.7216,0.745862,0.7216,0.721056
3,0.4095,0.88819,0.7449,0.760332,0.7449,0.743102
4,0.2246,0.980875,0.7299,0.756179,0.7299,0.732547
5,0.1192,0.939012,0.7566,0.773073,0.7566,0.755528
6,0.0634,0.912417,0.7664,0.77865,0.7664,0.767562
7,0.0329,0.96368,0.7704,0.777233,0.7704,0.769772
8,0.0161,1.032269,0.7611,0.773685,0.7611,0.762191
9,0.0071,1.050799,0.7648,0.777554,0.7648,0.765433
10,0.0041,0.99997,0.7693,0.776866,0.7693,0.76893


[I 2025-03-28 09:11:50,178] Trial 59 finished with value: 0.7689303526938385 and parameters: {'learning_rate': 0.0003067998000470195, 'weight_decay': 0.001, 'warmup_steps': 32}. Best is trial 12 with value: 0.7792835847818538.


Trial 60 with params: {'learning_rate': 0.00011728115260668028, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4049,1.283438,0.6496,0.660989,0.6496,0.64441
2,0.9665,1.040356,0.6995,0.715938,0.6995,0.6986
3,0.6245,0.920444,0.7289,0.739547,0.7289,0.727132


[I 2025-03-28 09:15:37,722] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0004763715558786438, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6213,1.078155,0.6883,0.719031,0.6883,0.685062
2,0.7302,1.017893,0.7114,0.74185,0.7114,0.711131
3,0.4362,0.933053,0.7363,0.75415,0.7363,0.735018
4,0.2553,0.999561,0.7362,0.762762,0.7362,0.739012
5,0.1472,0.957142,0.7525,0.767514,0.7525,0.751507
6,0.0797,0.937149,0.7682,0.779816,0.7682,0.769873
7,0.0445,1.005867,0.7623,0.773419,0.7623,0.761375
8,0.0196,1.036435,0.7682,0.781494,0.7682,0.769678
9,0.0075,1.05635,0.7668,0.78019,0.7668,0.76794
10,0.0034,1.00636,0.773,0.780084,0.773,0.772


[I 2025-03-28 09:28:01,141] Trial 61 finished with value: 0.7719999954669062 and parameters: {'learning_rate': 0.0004763715558786438, 'weight_decay': 0.007, 'warmup_steps': 6}. Best is trial 12 with value: 0.7792835847818538.


Trial 62 with params: {'learning_rate': 0.00028225352558308145, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8426,1.013988,0.6996,0.718086,0.6996,0.697666
2,0.7165,0.961015,0.7218,0.74863,0.7218,0.721545
3,0.4065,0.903587,0.7437,0.76084,0.7437,0.742307
4,0.2234,0.982011,0.7357,0.762723,0.7357,0.738149
5,0.1187,0.951328,0.7499,0.770043,0.7499,0.749936
6,0.0622,0.912736,0.7662,0.779366,0.7662,0.768229
7,0.0345,0.981771,0.7623,0.774432,0.7623,0.762865
8,0.0162,1.000991,0.7656,0.776668,0.7656,0.766319
9,0.0074,1.03976,0.765,0.777561,0.765,0.765267
10,0.0045,0.983607,0.7712,0.779473,0.7712,0.770538


[I 2025-03-28 09:40:23,738] Trial 62 finished with value: 0.7705383785512206 and parameters: {'learning_rate': 0.00028225352558308145, 'weight_decay': 0.001, 'warmup_steps': 22}. Best is trial 12 with value: 0.7792835847818538.


Trial 63 with params: {'learning_rate': 0.00037416385209000473, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7595,1.052157,0.6908,0.713644,0.6908,0.687445
2,0.7243,0.991203,0.7124,0.740615,0.7124,0.71053
3,0.413,0.924169,0.739,0.756746,0.739,0.737428
4,0.2281,1.011945,0.7343,0.76046,0.7343,0.736438
5,0.1233,0.94421,0.7541,0.7676,0.7541,0.752563
6,0.0696,0.960141,0.7633,0.777309,0.7633,0.764732
7,0.0355,0.995819,0.7634,0.773348,0.7634,0.763035
8,0.0176,1.004117,0.7693,0.777961,0.7693,0.769536
9,0.0067,1.030019,0.7695,0.779505,0.7695,0.769581
10,0.0039,0.994866,0.7748,0.782861,0.7748,0.774777


[I 2025-03-28 09:52:46,720] Trial 63 finished with value: 0.7747772476151102 and parameters: {'learning_rate': 0.00037416385209000473, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 12 with value: 0.7792835847818538.


Trial 64 with params: {'learning_rate': 0.0007326300809046857, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6338,1.208458,0.6539,0.691646,0.6539,0.648998
2,0.8524,1.086994,0.6869,0.719356,0.6869,0.686059
3,0.5443,1.018324,0.7155,0.740276,0.7155,0.714351


[I 2025-03-28 09:56:29,783] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.000241251747353242, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9715,1.054879,0.6871,0.705299,0.6871,0.684901
2,0.7498,0.955751,0.7175,0.740444,0.7175,0.716097
3,0.4284,0.895304,0.7399,0.756652,0.7399,0.738963
4,0.2395,0.98477,0.7296,0.758257,0.7296,0.732955
5,0.1273,0.92728,0.7517,0.765523,0.7517,0.751013
6,0.0664,0.915082,0.7613,0.773272,0.7613,0.762571


[I 2025-03-28 10:03:56,202] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00030135557948449837, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.784,1.018103,0.7007,0.721165,0.7007,0.698899
2,0.7085,0.961723,0.7169,0.743792,0.7169,0.715284
3,0.3993,0.891554,0.7421,0.760415,0.7421,0.741312
4,0.2211,0.990986,0.7317,0.762271,0.7317,0.734812
5,0.1173,0.951365,0.7542,0.770467,0.7542,0.753477
6,0.0624,0.910948,0.7658,0.775669,0.7658,0.765984


[I 2025-03-28 10:11:19,628] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0004866596741893669, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6574,1.088868,0.6789,0.705318,0.6789,0.674644
2,0.7412,0.988635,0.7159,0.743759,0.7159,0.714767
3,0.4438,0.909375,0.7405,0.756718,0.7405,0.738787
4,0.2606,1.044716,0.7287,0.760646,0.7287,0.731429
5,0.1462,0.947492,0.7523,0.771094,0.7523,0.751438
6,0.0808,0.960037,0.7641,0.779255,0.7641,0.76627
7,0.044,0.99076,0.7683,0.776785,0.7683,0.767829
8,0.0201,1.014059,0.7713,0.780838,0.7713,0.772132
9,0.0074,1.027939,0.7739,0.7821,0.7739,0.773657
10,0.0037,0.990789,0.78,0.786505,0.78,0.779399


[I 2025-03-28 10:23:37,298] Trial 67 finished with value: 0.7793985767219671 and parameters: {'learning_rate': 0.0004866596741893669, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}. Best is trial 67 with value: 0.7793985767219671.


Trial 68 with params: {'learning_rate': 0.0004868773864754982, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6334,1.084841,0.681,0.70735,0.681,0.676319
2,0.7436,0.992378,0.7109,0.739996,0.7109,0.710111
3,0.4419,0.949747,0.7328,0.755847,0.7328,0.732114
4,0.2544,1.000277,0.7309,0.757657,0.7309,0.73348
5,0.1447,0.983009,0.7511,0.766601,0.7511,0.750386
6,0.0808,0.948102,0.7637,0.774934,0.7637,0.764837
7,0.0425,1.009961,0.7654,0.774258,0.7654,0.763854
8,0.0198,1.028435,0.7674,0.778086,0.7674,0.76876
9,0.0075,1.074132,0.7659,0.779898,0.7659,0.7669
10,0.0038,0.993428,0.7751,0.781429,0.7751,0.77348


[I 2025-03-28 10:35:59,602] Trial 68 finished with value: 0.7734796195810132 and parameters: {'learning_rate': 0.0004868773864754982, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}. Best is trial 67 with value: 0.7793985767219671.


Trial 69 with params: {'learning_rate': 0.004618563219406311, 'weight_decay': 0.007, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6209,2.098387,0.4337,0.492399,0.4337,0.419455
2,1.7577,1.739679,0.5117,0.571545,0.5117,0.507831
3,1.3772,1.554835,0.571,0.60766,0.571,0.563188


[I 2025-03-28 10:39:42,254] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00011173677241810954, 'weight_decay': 0.01, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.46,1.300762,0.6438,0.656668,0.6438,0.639044
2,0.9888,1.05106,0.6966,0.713841,0.6966,0.695903
3,0.644,0.933657,0.7258,0.735886,0.7258,0.72341


[I 2025-03-28 10:43:26,525] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0009740810529971122, 'weight_decay': 0.008, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7408,1.316839,0.6271,0.672406,0.6271,0.622634
2,0.9643,1.142796,0.6734,0.7056,0.6734,0.669858
3,0.6596,1.067604,0.7026,0.736547,0.7026,0.703284
4,0.4381,1.068122,0.7183,0.746005,0.7183,0.720098
5,0.2841,1.031557,0.7357,0.753181,0.7357,0.734634
6,0.1702,1.004831,0.7442,0.756813,0.7442,0.744487


[I 2025-03-28 10:50:51,665] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0004514593719282817, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6782,1.057109,0.6898,0.714977,0.6898,0.686052
2,0.7328,0.986268,0.7141,0.73661,0.7141,0.712781
3,0.4335,0.927684,0.7373,0.755832,0.7373,0.736203
4,0.2498,0.98781,0.7321,0.757461,0.7321,0.734079
5,0.1394,0.954338,0.7516,0.768042,0.7516,0.750879
6,0.0781,0.953583,0.7645,0.778145,0.7645,0.766202


[I 2025-03-28 10:58:16,719] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1604,1.84442,0.5591,0.575689,0.5591,0.548413
2,1.4204,1.304265,0.6425,0.664572,0.6425,0.64182
3,0.9873,1.100895,0.6874,0.696538,0.6874,0.683912
4,0.7627,1.084105,0.6841,0.706909,0.6841,0.68504
5,0.6182,0.957606,0.7251,0.733898,0.7251,0.722641
6,0.5112,0.914052,0.7321,0.740561,0.7321,0.732624


[I 2025-03-28 11:05:41,300] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.00047057940812218703, 'weight_decay': 0.007, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.654,1.086126,0.6848,0.712196,0.6848,0.679602
2,0.7344,1.01499,0.7059,0.736703,0.7059,0.703975
3,0.4329,0.93742,0.7416,0.758935,0.7416,0.740511
4,0.2498,1.022608,0.7348,0.764604,0.7348,0.738643
5,0.1413,0.976984,0.7456,0.763688,0.7456,0.744803
6,0.079,0.947401,0.7624,0.772806,0.7624,0.762741


[I 2025-03-28 11:13:06,195] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0007543774748223616, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6727,1.221446,0.6445,0.684429,0.6445,0.639229
2,0.8747,1.065305,0.6933,0.722228,0.6933,0.69225
3,0.5653,0.972508,0.7287,0.751756,0.7287,0.728
4,0.3584,1.050071,0.7237,0.752022,0.7237,0.726312
5,0.2169,0.982474,0.7471,0.762148,0.7471,0.745966
6,0.1274,0.987462,0.7535,0.769521,0.7535,0.755281


[I 2025-03-28 11:20:31,356] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0003878225608200182, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7108,1.045588,0.6944,0.715728,0.6944,0.691071
2,0.7196,0.961719,0.7172,0.742361,0.7172,0.716324
3,0.4136,0.89617,0.7454,0.759555,0.7454,0.744398
4,0.234,1.014408,0.7301,0.758488,0.7301,0.732678
5,0.1257,0.966287,0.7524,0.768435,0.7524,0.751566
6,0.0693,0.944498,0.7672,0.780812,0.7672,0.769231
7,0.0356,0.993034,0.7645,0.775712,0.7645,0.764552
8,0.0166,1.004882,0.7682,0.777627,0.7682,0.768533
9,0.0071,1.043041,0.7704,0.781851,0.7704,0.770865
10,0.0036,1.005251,0.774,0.780098,0.774,0.773226


[I 2025-03-28 11:32:57,287] Trial 76 finished with value: 0.7732262327991535 and parameters: {'learning_rate': 0.0003878225608200182, 'weight_decay': 0.008, 'warmup_steps': 17}. Best is trial 67 with value: 0.7793985767219671.


Trial 77 with params: {'learning_rate': 0.00038957051474042167, 'weight_decay': 0.005, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6754,1.023575,0.6974,0.719266,0.6974,0.694967
2,0.7104,0.985911,0.7151,0.741121,0.7151,0.712744
3,0.4065,0.917894,0.7422,0.75916,0.7422,0.740524
4,0.2295,1.031403,0.73,0.759315,0.73,0.733134
5,0.1235,0.947223,0.7574,0.773666,0.7574,0.757316
6,0.0689,0.948858,0.7621,0.776089,0.7621,0.763204


[I 2025-03-28 11:40:21,698] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.000534235910277785, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6393,1.112455,0.6755,0.707132,0.6755,0.672307
2,0.7615,1.009467,0.7052,0.73333,0.7052,0.704727
3,0.4631,0.935416,0.7385,0.757113,0.7385,0.737331
4,0.2711,1.006747,0.7317,0.75749,0.7317,0.733668
5,0.1549,0.997689,0.7481,0.763532,0.7481,0.747347
6,0.089,0.974187,0.7638,0.776553,0.7638,0.765193
7,0.049,0.983867,0.7635,0.771856,0.7635,0.763543
8,0.0212,1.037151,0.7672,0.778677,0.7672,0.767857
9,0.0086,1.050192,0.7707,0.781486,0.7707,0.771008
10,0.0039,0.986107,0.7793,0.784703,0.7793,0.778425


[I 2025-03-28 11:52:46,305] Trial 78 finished with value: 0.778425239863479 and parameters: {'learning_rate': 0.000534235910277785, 'weight_decay': 0.008, 'warmup_steps': 12}. Best is trial 67 with value: 0.7793985767219671.


Trial 79 with params: {'learning_rate': 0.0007434607905551791, 'weight_decay': 0.008, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6366,1.177675,0.6602,0.69791,0.6602,0.655482
2,0.8488,1.084756,0.686,0.717529,0.686,0.685314
3,0.5481,0.983699,0.7234,0.746879,0.7234,0.722014


[I 2025-03-28 11:56:30,070] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0004171998229558951, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6671,1.037134,0.6955,0.718696,0.6955,0.692863
2,0.7213,0.978568,0.7152,0.742552,0.7152,0.71473
3,0.4166,0.924723,0.7389,0.757471,0.7389,0.737829
4,0.2394,0.998824,0.7365,0.763336,0.7365,0.738812
5,0.1325,0.992881,0.7466,0.762792,0.7466,0.743648
6,0.075,0.951919,0.7661,0.777943,0.7661,0.767004
7,0.0411,0.993027,0.7701,0.779341,0.7701,0.769854
8,0.0187,1.029236,0.7651,0.776515,0.7651,0.766059
9,0.0075,1.035674,0.7723,0.78219,0.7723,0.772613
10,0.004,0.981702,0.7793,0.785234,0.7793,0.778612


[I 2025-03-28 12:09:13,544] Trial 80 finished with value: 0.7786116277401254 and parameters: {'learning_rate': 0.0004171998229558951, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}. Best is trial 67 with value: 0.7793985767219671.


Trial 81 with params: {'learning_rate': 0.00019039598177024198, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0081,1.086298,0.6853,0.700687,0.6853,0.682044
2,0.7875,0.97426,0.7103,0.733383,0.7103,0.709497
3,0.4636,0.900431,0.7357,0.750817,0.7357,0.733791
4,0.2693,0.97442,0.7261,0.75396,0.7261,0.728883
5,0.1509,0.908404,0.7487,0.764083,0.7487,0.747994
6,0.0829,0.905073,0.7581,0.770836,0.7581,0.759794


[I 2025-03-28 12:16:41,092] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.00046563477382348276, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6582,1.086258,0.6798,0.705831,0.6798,0.674908
2,0.7379,0.989585,0.712,0.7375,0.712,0.710213
3,0.4394,0.938409,0.7349,0.753401,0.7349,0.73324
4,0.2529,1.00751,0.7299,0.760388,0.7299,0.733223
5,0.1441,0.963663,0.7502,0.770366,0.7502,0.749532
6,0.0819,0.96643,0.7682,0.77854,0.7682,0.768812
7,0.0444,1.002975,0.7591,0.768514,0.7591,0.758862
8,0.0199,1.022443,0.7667,0.776994,0.7667,0.76779
9,0.0074,1.04423,0.7683,0.777938,0.7683,0.76804
10,0.0037,1.002329,0.7733,0.780492,0.7733,0.772544


[I 2025-03-28 12:29:20,075] Trial 82 finished with value: 0.7725436446808428 and parameters: {'learning_rate': 0.00046563477382348276, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}. Best is trial 67 with value: 0.7793985767219671.


Trial 83 with params: {'learning_rate': 0.0005640241621813505, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.611,1.137553,0.6669,0.700318,0.6669,0.66293
2,0.7705,1.042684,0.7002,0.731615,0.7002,0.699895
3,0.4697,0.941851,0.7361,0.753096,0.7361,0.734817
4,0.2785,1.042543,0.7275,0.756375,0.7275,0.730129
5,0.1649,1.014807,0.7415,0.763595,0.7415,0.741342
6,0.0901,0.979276,0.7599,0.771939,0.7599,0.760934


[I 2025-03-28 12:36:58,852] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0013348564566056686, 'weight_decay': 0.006, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8432,1.46953,0.5878,0.6451,0.5878,0.58051
2,1.111,1.238052,0.643,0.682516,0.643,0.642592
3,0.7865,1.127573,0.692,0.718318,0.692,0.689672
4,0.5566,1.146071,0.695,0.722479,0.695,0.696159
5,0.3776,1.113703,0.7149,0.735802,0.7149,0.712578
6,0.232,1.045622,0.7425,0.754912,0.7425,0.742895


[I 2025-03-28 12:44:23,824] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0005743098097216993, 'weight_decay': 0.007, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6202,1.139474,0.669,0.702797,0.669,0.664998
2,0.7734,1.036573,0.7031,0.73187,0.7031,0.702162
3,0.4761,0.939425,0.7347,0.753207,0.7347,0.732875


[I 2025-03-28 12:48:07,264] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.00035748227256751105, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7046,1.023452,0.6964,0.714522,0.6964,0.693334
2,0.7067,0.963299,0.7205,0.746278,0.7205,0.719788
3,0.4059,0.923486,0.7429,0.765165,0.7429,0.742034
4,0.2243,0.99804,0.7319,0.760311,0.7319,0.73355
5,0.1205,0.949965,0.7521,0.767789,0.7521,0.75013
6,0.0663,0.924767,0.7666,0.780717,0.7666,0.768945
7,0.0351,1.003851,0.7658,0.77979,0.7658,0.766475
8,0.017,1.003074,0.7676,0.777735,0.7676,0.768614
9,0.0066,1.023861,0.7703,0.782399,0.7703,0.770801
10,0.0036,0.963704,0.7768,0.784849,0.7768,0.776837


[I 2025-03-28 13:00:39,351] Trial 86 finished with value: 0.7768365856134832 and parameters: {'learning_rate': 0.00035748227256751105, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11}. Best is trial 67 with value: 0.7793985767219671.


Trial 87 with params: {'learning_rate': 0.0004017729612602353, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6692,1.04294,0.6917,0.713971,0.6917,0.68982
2,0.711,0.983184,0.7147,0.740152,0.7147,0.713245
3,0.4126,0.883219,0.7472,0.762967,0.7472,0.74687
4,0.2317,0.989243,0.7364,0.759236,0.7364,0.738729
5,0.1274,0.970303,0.7472,0.765231,0.7472,0.746436
6,0.07,0.926207,0.768,0.778473,0.768,0.768999
7,0.0385,0.991119,0.7645,0.777209,0.7645,0.764995
8,0.0166,1.014453,0.7665,0.776699,0.7665,0.767707
9,0.0069,1.038184,0.7672,0.779824,0.7672,0.76749
10,0.0034,0.991443,0.7729,0.778825,0.7729,0.772209


[I 2025-03-28 13:13:05,641] Trial 87 finished with value: 0.7722087345229339 and parameters: {'learning_rate': 0.0004017729612602353, 'weight_decay': 0.01, 'warmup_steps': 11}. Best is trial 67 with value: 0.7793985767219671.


Trial 88 with params: {'learning_rate': 0.00032109171012285095, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7221,1.035463,0.6985,0.719041,0.6985,0.69635
2,0.7069,0.961481,0.7169,0.74055,0.7169,0.715179
3,0.3978,0.896964,0.7474,0.764279,0.7474,0.746433
4,0.215,1.01778,0.728,0.760145,0.728,0.731065
5,0.1173,0.936009,0.7526,0.768107,0.7526,0.752605
6,0.0631,0.912732,0.7685,0.778344,0.7685,0.769628
7,0.0322,0.998303,0.7625,0.771709,0.7625,0.761912
8,0.0145,1.01947,0.7633,0.773929,0.7633,0.764371
9,0.0068,1.042865,0.7719,0.781971,0.7719,0.772224
10,0.0039,1.01026,0.7723,0.779226,0.7723,0.771706


[I 2025-03-28 13:25:40,666] Trial 88 finished with value: 0.7717064886068797 and parameters: {'learning_rate': 0.00032109171012285095, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}. Best is trial 67 with value: 0.7793985767219671.


Trial 89 with params: {'learning_rate': 0.0003051554208886587, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7494,1.028967,0.6984,0.71559,0.6984,0.695265
2,0.7112,0.960105,0.7189,0.743265,0.7189,0.718265
3,0.4076,0.902575,0.7425,0.75965,0.7425,0.741756
4,0.2232,0.994377,0.7331,0.760316,0.7331,0.734478
5,0.1182,0.926836,0.7552,0.77078,0.7552,0.754382
6,0.0624,0.903192,0.771,0.78146,0.771,0.771664
7,0.0328,0.978442,0.7615,0.771698,0.7615,0.760798
8,0.0156,0.996125,0.7663,0.77796,0.7663,0.767601
9,0.0076,1.028225,0.7667,0.777279,0.7667,0.766928
10,0.0041,0.98131,0.7736,0.780927,0.7736,0.772987


[I 2025-03-28 13:38:06,799] Trial 89 finished with value: 0.7729867907788435 and parameters: {'learning_rate': 0.0003051554208886587, 'weight_decay': 0.008, 'warmup_steps': 10}. Best is trial 67 with value: 0.7793985767219671.


Trial 90 with params: {'learning_rate': 0.0008164899730030919, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6712,1.284879,0.6305,0.680329,0.6305,0.627037
2,0.9005,1.076393,0.685,0.714274,0.685,0.684226
3,0.5903,0.998771,0.7238,0.745083,0.7238,0.721925
4,0.3793,1.051718,0.7199,0.744993,0.7199,0.720991
5,0.2319,1.032889,0.7391,0.756378,0.7391,0.739018
6,0.1381,1.005855,0.7497,0.762016,0.7497,0.749825


[I 2025-03-28 13:45:39,304] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0003894162217910998, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6886,1.029177,0.6957,0.714425,0.6957,0.691388
2,0.7071,0.96534,0.7181,0.743321,0.7181,0.716354
3,0.4063,0.896644,0.7447,0.759773,0.7447,0.743184
4,0.2305,0.998951,0.7315,0.759757,0.7315,0.734141
5,0.1249,0.952207,0.7552,0.769054,0.7552,0.753433
6,0.0681,0.928916,0.7701,0.782449,0.7701,0.772123
7,0.0356,0.988137,0.7661,0.775797,0.7661,0.766343
8,0.017,1.026971,0.7661,0.777835,0.7661,0.767573
9,0.0068,1.043564,0.7728,0.784417,0.7728,0.773429
10,0.0035,0.990117,0.7771,0.784652,0.7771,0.777122


[I 2025-03-28 13:58:05,505] Trial 91 finished with value: 0.7771216461789271 and parameters: {'learning_rate': 0.0003894162217910998, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 67 with value: 0.7793985767219671.


Trial 92 with params: {'learning_rate': 0.00034935771321383316, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7375,1.019954,0.7001,0.717829,0.7001,0.696815
2,0.7078,0.955388,0.7227,0.749534,0.7227,0.721666
3,0.404,0.906461,0.7441,0.762067,0.7441,0.742279
4,0.2246,0.975892,0.7411,0.763083,0.7411,0.742409
5,0.1203,0.940748,0.7584,0.77271,0.7584,0.757516
6,0.0645,0.927888,0.7698,0.782269,0.7698,0.770538
7,0.0345,0.988072,0.7646,0.772208,0.7646,0.764286
8,0.0157,1.014169,0.765,0.77699,0.765,0.766316
9,0.0066,1.029331,0.771,0.781437,0.771,0.771429
10,0.0038,0.985398,0.777,0.784298,0.777,0.7768


[I 2025-03-28 14:10:35,744] Trial 92 finished with value: 0.776800159515452 and parameters: {'learning_rate': 0.00034935771321383316, 'weight_decay': 0.0, 'warmup_steps': 17}. Best is trial 67 with value: 0.7793985767219671.


Trial 93 with params: {'learning_rate': 0.00029236386785266237, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7917,1.024999,0.7011,0.71945,0.7011,0.699084
2,0.7116,0.95653,0.7211,0.744827,0.7211,0.720306
3,0.4039,0.91009,0.7376,0.754912,0.7376,0.736184
4,0.219,0.973532,0.7345,0.761492,0.7345,0.737119
5,0.1163,0.938686,0.7503,0.767083,0.7503,0.749091
6,0.0638,0.911435,0.7628,0.776095,0.7628,0.764788


[I 2025-03-28 14:18:01,894] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0006815749986470376, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6276,1.148331,0.666,0.697379,0.666,0.660744
2,0.8181,1.038573,0.7015,0.726511,0.7015,0.69906
3,0.5201,1.005189,0.7228,0.744279,0.7228,0.721638
4,0.3227,1.020915,0.7306,0.757479,0.7306,0.733197
5,0.1952,1.052149,0.7362,0.758732,0.7362,0.735512
6,0.1119,0.965684,0.7601,0.770258,0.7601,0.760942


[I 2025-03-28 14:25:32,962] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0002267650021393895, 'weight_decay': 0.006, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8758,1.051854,0.6926,0.708109,0.6926,0.689477
2,0.7428,0.967458,0.7156,0.739648,0.7156,0.71517
3,0.4282,0.906183,0.7368,0.753883,0.7368,0.73646
4,0.2398,0.958391,0.7314,0.756658,0.7314,0.734211
5,0.1309,0.929061,0.7514,0.765967,0.7514,0.750888
6,0.0689,0.915228,0.7639,0.775808,0.7639,0.76569


[I 2025-03-28 14:33:02,894] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.00043054586762287123, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6525,1.051765,0.6954,0.718173,0.6954,0.691821
2,0.7199,1.001057,0.7119,0.741971,0.7119,0.711255
3,0.4268,0.919155,0.7408,0.757029,0.7408,0.739248
4,0.2421,1.027291,0.7312,0.759741,0.7312,0.733952
5,0.1335,0.956219,0.7539,0.769835,0.7539,0.752782
6,0.0736,0.928076,0.7702,0.780433,0.7702,0.771372
7,0.0387,1.022804,0.7619,0.774211,0.7619,0.761873
8,0.0176,1.010623,0.768,0.777841,0.768,0.768957
9,0.0062,1.057803,0.7732,0.785116,0.7732,0.774056
10,0.0033,0.980765,0.7798,0.786162,0.7798,0.779132


[I 2025-03-28 14:45:36,089] Trial 96 finished with value: 0.7791320903992918 and parameters: {'learning_rate': 0.00043054586762287123, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 67 with value: 0.7793985767219671.


Trial 97 with params: {'learning_rate': 0.000766818047088654, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6603,1.254248,0.6432,0.683538,0.6432,0.637446
2,0.8685,1.057422,0.6927,0.720458,0.6927,0.691763



In [None]:
print(best_base_pretrained)

In [None]:
base.reset_seed()

## Search with distillation and fine-tuning of the pretrained model

Configuration of the individual training runs.

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [None]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation: loss-mixing weight and softmax temperature.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
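
The two extra entries, `lambda_param` and `temperature`, are the usual knobs of knowledge distillation. The implementation of `base.DistilTrainer` is not shown in this notebook, so the following is only a sketch of the common formulation (hard-label cross-entropy mixed with a temperature-softened KL term), written in plain Python for a single example. The function names, the direction in which `lambda_param` weights the two terms, and the `temperature ** 2` scaling are assumptions, not confirmed behaviour of the trainer used here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Hypothetical distillation loss for one example.

    ASSUMPTION: loss = (1 - lambda) * CE(student, label)
                     + lambda * T^2 * KL(teacher_T || student_T)
    The real DistilTrainer may weight or scale the terms differently.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    cross_entropy = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )

    return (1 - lambda_param) * cross_entropy + lambda_param * temperature**2 * kl
```

Under this assumed convention, `lambda_param = 0` reduces to plain cross-entropy, while `lambda_param = 1` trains on the teacher's softened outputs only. A higher `temperature` flattens both distributions, so the student sees more of the teacher's relative preferences among incorrect classes.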

Optuna configuration.

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
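
To make the pruner settings easier to read, the sketch below models how Hyperband is usually described: several successive-halving brackets, each checking trials at resource levels `min_resource * reduction_factor ** (bracket + rung)`. It is a simplified illustration, not Optuna's code. The values `3` and `10` are hypothetical stand-ins for `min_r` and `max_r`, which are defined outside this section; they were chosen only because the logs show trials being pruned after epoch 3 or 6 and completing after epoch 10. The actual values may be expressed in different units.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return {bracket_id: [resource levels at which a trial may be pruned]}.

    Simplified model: the number of brackets is
    floor(log_eta(max_resource / min_resource)) + 1, and bracket b starts
    checking at min_resource * eta**b. Levels at or above max_resource are
    omitted because a trial has already finished by then.
    """
    n_brackets = (
        math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    )
    schedule = {}
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            level = min_resource * reduction_factor ** (bracket + rung)
            if level >= max_resource:
                break
            rungs.append(level)
            rung += 1
        schedule[bracket] = rungs
    return schedule


# Hypothetical values standing in for min_r, max_r and reduction_factor=2.
print(hyperband_rungs(3, 10, 2))  # {0: [3, 6], 1: [6]}
```

With these assumed values, a trial assigned to the first bracket can be stopped after 3 or 6 units of resource, while a trial in the second bracket is first checked after 6. This matches the pattern in the logs, where pruned trials end after either 3 or 6 epochs.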



Configuration of the distillation trainer for the individual training runs.

In [None]:
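trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # model_init is called at the start of every trial, so each trial starts from a freshly built model.
    model_init=lambda: base.get_mobilenet(100),
)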

Search setup.

In [None]:
best_distil_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)
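
The search prints one `[I ...] Trial N finished with value: ...` or `Trial N pruned.` line per trial. With 150 trials this output is long, so a small helper for summarising it can be handy. The sketch below is not part of the notebook; `summarize_optuna_log` is a hypothetical name, and the regular expressions assume the log line format seen in this notebook's output.

```python
import re

# Patterns assume the log format "Trial <n> finished with value: <float> ..."
# and "Trial <n> pruned." as printed by the search in this notebook.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize_optuna_log(lines):
    """Return (finished, pruned, best) parsed from Optuna log lines.

    finished: dict mapping trial number to its objective value
    pruned:   list of pruned trial numbers, in order of appearance
    best:     trial number with the highest value, or None if none finished
    """
    finished = {}
    pruned = []
    for line in lines:
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best
```

The helper takes the maximum because the search is run with `direction="maximize"` on `eval_f1`; for a minimised objective, `min` would be used instead.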

[I 2025-03-28 19:35:28,117] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.004346495755569775, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4645,2.034819,0.4735,0.516155,0.4735,0.459134
2,1.7442,1.678343,0.5655,0.60968,0.5655,0.563037
3,1.4052,1.454783,0.6206,0.643512,0.6206,0.612996


[I 2025-03-28 19:39:15,438] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.003489140632563016, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3778,1.977903,0.4905,0.542747,0.4905,0.477161
2,1.6654,1.628992,0.5722,0.616906,0.5722,0.568948
3,1.34,1.414789,0.6316,0.656816,0.6316,0.627771
4,1.0938,1.368851,0.6386,0.671296,0.6386,0.635903
5,0.8902,1.213373,0.6748,0.692982,0.6748,0.670579
6,0.6981,1.148816,0.6919,0.711204,0.6919,0.691012


[I 2025-03-28 19:46:49,950] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 0.00029945018361271497, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8851,1.137421,0.7017,0.716924,0.7017,0.69914
2,0.8646,1.019316,0.7308,0.754516,0.7308,0.730331
3,0.5806,0.919312,0.7506,0.765314,0.7506,0.748397


[I 2025-03-28 19:50:36,066] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0006086423540561215, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7337,1.168994,0.6891,0.712,0.6891,0.68477
2,0.8961,1.032792,0.7252,0.748423,0.7252,0.723038
3,0.6184,0.9391,0.7535,0.774178,0.7535,0.752645
4,0.4481,0.968532,0.7467,0.768854,0.7467,0.747914
5,0.3333,0.898339,0.7611,0.779978,0.7611,0.760375
6,0.2578,0.818676,0.7852,0.79764,0.7852,0.786997


[I 2025-03-28 19:58:08,568] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0017516992455793442, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9586,1.600045,0.5823,0.629448,0.5823,0.571833
2,1.3038,1.356752,0.6435,0.680105,0.6435,0.639694
3,0.9948,1.191992,0.6874,0.713881,0.6874,0.682045


[I 2025-03-28 20:01:55,184] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.001435437674097734, 'weight_decay': 0.008, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8516,1.487792,0.6102,0.657872,0.6102,0.606699
2,1.1953,1.272344,0.6597,0.695845,0.6597,0.655567
3,0.9074,1.102439,0.7098,0.733667,0.7098,0.708855
4,0.6931,1.093103,0.7157,0.741457,0.7157,0.715788
5,0.5323,0.997498,0.742,0.756517,0.742,0.740095
6,0.3958,0.909525,0.7581,0.769933,0.7581,0.757716


[I 2025-03-28 20:09:27,365] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.002661808797375751, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1892,1.832249,0.5206,0.571755,0.5206,0.508158
2,1.5118,1.530475,0.5935,0.635539,0.5935,0.590172
3,1.1887,1.306962,0.6575,0.684766,0.6575,0.649971


[I 2025-03-28 20:13:13,479] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.00022353042733892474, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9975,1.172907,0.6968,0.712652,0.6968,0.693375
2,0.9073,1.041864,0.7274,0.749463,0.7274,0.726382
3,0.616,0.93175,0.751,0.766685,0.751,0.749529
4,0.4434,0.963534,0.7432,0.765942,0.7432,0.745122
5,0.3352,0.882602,0.767,0.780125,0.767,0.76561
6,0.2644,0.84614,0.7729,0.785022,0.7729,0.773357
7,0.2192,0.857072,0.7726,0.782725,0.7726,0.772033
8,0.1897,0.877668,0.7669,0.781672,0.7669,0.768908
9,0.1689,0.87583,0.77,0.782431,0.77,0.770931
10,0.1564,0.844547,0.7707,0.782536,0.7707,0.771509


[I 2025-03-28 20:25:47,906] Trial 7 finished with value: 0.771509151963985 and parameters: {'learning_rate': 0.00022353042733892474, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 7 with value: 0.771509151963985.


Trial 8 with params: {'learning_rate': 8.672783321180477e-05, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6857,1.624464,0.6066,0.624063,0.6066,0.598826
2,1.3082,1.262057,0.6801,0.696819,0.6801,0.677885
3,0.9502,1.092261,0.7128,0.723507,0.7128,0.709883
4,0.7524,1.088237,0.712,0.733024,0.712,0.713013
5,0.6248,0.976764,0.7439,0.753432,0.7439,0.74117
6,0.5319,0.937561,0.7504,0.760974,0.7504,0.750774


[I 2025-03-28 20:33:19,750] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.00048591599724086986, 'weight_decay': 0.005, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.695,1.143902,0.6944,0.717634,0.6944,0.690474
2,0.8563,1.026256,0.7288,0.755154,0.7288,0.729033
3,0.5901,0.91199,0.7584,0.77371,0.7584,0.756897
4,0.4207,0.970473,0.7444,0.772896,0.7444,0.747738
5,0.3141,0.869952,0.77,0.78218,0.77,0.768377
6,0.2433,0.814453,0.7798,0.793333,0.7798,0.781152


[I 2025-03-28 20:40:54,536] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0005677490617498829, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6996,1.146881,0.6945,0.716958,0.6945,0.68913
2,0.8792,1.033588,0.7245,0.750957,0.7245,0.724408
3,0.6072,0.960765,0.7453,0.768,0.7453,0.7444
4,0.4378,0.951328,0.7496,0.772095,0.7496,0.750889
5,0.3259,0.891269,0.7629,0.77771,0.7629,0.761712
6,0.2516,0.824671,0.7795,0.791149,0.7795,0.779783
7,0.203,0.824468,0.7802,0.790389,0.7802,0.780701
8,0.1687,0.832422,0.7767,0.794423,0.7767,0.779399
9,0.1448,0.820082,0.7785,0.790992,0.7785,0.779365
10,0.1302,0.778133,0.7844,0.79635,0.7844,0.785184


[I 2025-03-28 20:53:29,300] Trial 10 finished with value: 0.7851838411346583 and parameters: {'learning_rate': 0.0005677490617498829, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 10 with value: 0.7851838411346583.


Trial 11 with params: {'learning_rate': 0.0007721411337822958, 'weight_decay': 0.007, 'warmup_steps': 16, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.704,1.232429,0.6728,0.705644,0.6728,0.668446
2,0.9476,1.116753,0.7038,0.735848,0.7038,0.701535
3,0.6736,1.008204,0.7367,0.76604,0.7367,0.735973
4,0.4938,0.985543,0.7403,0.764889,0.7403,0.741481
5,0.372,0.907876,0.7634,0.777184,0.7634,0.762615
6,0.279,0.830983,0.7767,0.788462,0.7767,0.777925


[I 2025-03-28 21:00:59,184] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.00010293025141970873, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5337,1.510113,0.626,0.640549,0.626,0.618793
2,1.2082,1.204233,0.6926,0.710287,0.6926,0.690162
3,0.8663,1.050044,0.7227,0.733897,0.7227,0.720058
4,0.6734,1.046542,0.7218,0.742376,0.7218,0.722944
5,0.5496,0.952181,0.7502,0.760473,0.7502,0.747659
6,0.4594,0.91678,0.7524,0.763742,0.7524,0.75302


[I 2025-03-28 21:08:48,916] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.00014947913488867464, 'weight_decay': 0.005, 'warmup_steps': 24, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2544,1.314297,0.6659,0.679773,0.6659,0.66162
2,1.0318,1.100073,0.7156,0.733412,0.7156,0.713888
3,0.7156,0.973048,0.7409,0.753078,0.7409,0.738263
4,0.5328,0.996669,0.7348,0.757008,0.7348,0.736477
5,0.4188,0.919866,0.7583,0.76989,0.7583,0.756338
6,0.3376,0.874131,0.7621,0.773966,0.7621,0.762753


[I 2025-03-28 21:16:20,356] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0007299426115981158, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6834,1.208714,0.6786,0.708198,0.6786,0.674074
2,0.9325,1.116732,0.7025,0.736203,0.7025,0.701221
3,0.6608,0.992152,0.7392,0.762871,0.7392,0.737449
4,0.4836,0.985607,0.7443,0.767555,0.7443,0.745606
5,0.3626,0.921582,0.7568,0.775119,0.7568,0.755576
6,0.2748,0.836165,0.7784,0.789949,0.7784,0.778507
7,0.2173,0.83071,0.7778,0.790418,0.7778,0.778293
8,0.179,0.807653,0.7873,0.799483,0.7873,0.789285
9,0.1498,0.817218,0.7803,0.792776,0.7803,0.781635
10,0.1327,0.773382,0.7861,0.796886,0.7861,0.786958


[I 2025-03-28 21:28:51,797] Trial 14 finished with value: 0.7869584501147755 and parameters: {'learning_rate': 0.0007299426115981158, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 14 with value: 0.7869584501147755.


Trial 15 with params: {'learning_rate': 0.0006522486282715796, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6569,1.187039,0.684,0.711745,0.684,0.680295
2,0.9049,1.071392,0.7132,0.743266,0.7132,0.710671
3,0.634,0.959972,0.7478,0.768928,0.7478,0.745078
4,0.4616,0.972444,0.7446,0.767094,0.7446,0.746001
5,0.3443,0.897633,0.7614,0.777112,0.7614,0.759381
6,0.2617,0.831501,0.7795,0.791819,0.7795,0.779872
7,0.2105,0.807074,0.7828,0.791044,0.7828,0.782634
8,0.1733,0.835304,0.7796,0.797539,0.7796,0.782344
9,0.1472,0.817,0.7825,0.79429,0.7825,0.783602
10,0.1311,0.774037,0.781,0.792072,0.781,0.781207


[I 2025-03-28 21:41:23,511] Trial 15 finished with value: 0.7812069201990057 and parameters: {'learning_rate': 0.0006522486282715796, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 14 with value: 0.7869584501147755.


Trial 16 with params: {'learning_rate': 0.00012435687347047645, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3353,1.392513,0.6541,0.664647,0.6541,0.647275
2,1.1072,1.141515,0.7073,0.723087,0.7073,0.705342
3,0.7832,1.006392,0.7311,0.743141,0.7311,0.728784
4,0.5951,1.014059,0.7297,0.750505,0.7297,0.730805
5,0.4767,0.936873,0.752,0.76399,0.752,0.749741
6,0.3911,0.896831,0.7574,0.768134,0.7574,0.757508


[I 2025-03-28 21:48:59,168] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00226881207698981, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0964,1.739455,0.5472,0.594633,0.5472,0.538409
2,1.4311,1.444068,0.6233,0.663375,0.6233,0.619636
3,1.1112,1.252646,0.6696,0.697587,0.6696,0.665866


[I 2025-03-28 21:52:45,062] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0006413693067056748, 'weight_decay': 0.007, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7352,1.194541,0.6844,0.711361,0.6844,0.67981
2,0.9105,1.076193,0.7118,0.743706,0.7118,0.710916
3,0.6362,0.972522,0.7423,0.762666,0.7423,0.73914
4,0.4629,0.981109,0.7453,0.769121,0.7453,0.747235
5,0.3442,0.872604,0.7671,0.779507,0.7671,0.766402
6,0.2631,0.820435,0.7822,0.792974,0.7822,0.783339
7,0.2092,0.830252,0.7769,0.789591,0.7769,0.777533
8,0.1728,0.818572,0.7812,0.796632,0.7812,0.783486
9,0.1466,0.822775,0.7799,0.791705,0.7799,0.780635
10,0.1306,0.77135,0.788,0.799008,0.788,0.788658


[I 2025-03-28 22:05:17,329] Trial 18 finished with value: 0.7886579941376499 and parameters: {'learning_rate': 0.0006413693067056748, 'weight_decay': 0.007, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 18 with value: 0.7886579941376499.


Trial 19 with params: {'learning_rate': 0.0006387880868896327, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7326,1.19995,0.6824,0.70666,0.6824,0.675708
2,0.916,1.062563,0.7152,0.742833,0.7152,0.712246
3,0.6316,0.947581,0.7445,0.764182,0.7445,0.741403
4,0.4558,0.939515,0.7553,0.77624,0.7553,0.756841
5,0.3463,0.86976,0.7678,0.781843,0.7678,0.766615
6,0.2644,0.82726,0.7803,0.792794,0.7803,0.780792
7,0.2087,0.819295,0.7801,0.791269,0.7801,0.780604
8,0.173,0.826253,0.7784,0.794219,0.7784,0.780636
9,0.1466,0.806729,0.7838,0.795678,0.7838,0.784642
10,0.1308,0.76732,0.7889,0.798743,0.7889,0.788784


[I 2025-03-28 22:17:49,644] Trial 19 finished with value: 0.788784321745608 and parameters: {'learning_rate': 0.0006387880868896327, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 19 with value: 0.788784321745608.


Trial 20 with params: {'learning_rate': 0.0005267268280578099, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7472,1.149747,0.6974,0.71911,0.6974,0.693839
2,0.882,1.04112,0.7245,0.751062,0.7245,0.724691
3,0.6062,0.950502,0.7463,0.766934,0.7463,0.744199
4,0.4314,0.975525,0.743,0.768165,0.743,0.745301
5,0.3195,0.882468,0.7706,0.785179,0.7706,0.769587
6,0.2473,0.81195,0.7799,0.79238,0.7799,0.780904
7,0.2003,0.818884,0.7826,0.793876,0.7826,0.78316
8,0.1664,0.831486,0.7801,0.796169,0.7801,0.783283
9,0.1438,0.816725,0.7792,0.791177,0.7792,0.780472
10,0.1294,0.782991,0.7838,0.795625,0.7838,0.784646


[I 2025-03-28 22:30:21,793] Trial 20 finished with value: 0.7846462656550759 and parameters: {'learning_rate': 0.0005267268280578099, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 19 with value: 0.788784321745608.


Trial 21 with params: {'learning_rate': 0.0024806070652836684, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1603,1.770551,0.54,0.601301,0.54,0.531163
2,1.4602,1.480443,0.6132,0.65702,0.6132,0.611384
3,1.1397,1.251078,0.664,0.691197,0.664,0.659495
4,0.9034,1.201725,0.6866,0.714163,0.6866,0.686155
5,0.7125,1.097602,0.7098,0.724373,0.7098,0.70585
6,0.5444,1.013295,0.7245,0.738096,0.7245,0.7242


[I 2025-03-28 22:37:50,363] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0005480138815057326, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7559,1.174235,0.6864,0.711479,0.6864,0.681432
2,0.8802,1.033504,0.7235,0.74976,0.7235,0.72168
3,0.6073,0.937078,0.7541,0.771953,0.7541,0.752649
4,0.4354,0.96097,0.7449,0.770694,0.7449,0.747776
5,0.3258,0.883743,0.7629,0.781031,0.7629,0.763475
6,0.2491,0.825984,0.7778,0.789466,0.7778,0.77875


[I 2025-03-28 22:45:20,802] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0009232408310142194, 'weight_decay': 0.009000000000000001, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7763,1.311954,0.6575,0.690354,0.6575,0.652511
2,1.015,1.130632,0.7006,0.731992,0.7006,0.698737
3,0.7342,1.039886,0.7279,0.753638,0.7279,0.727098
4,0.5441,1.031991,0.7331,0.760622,0.7331,0.734888
5,0.4109,0.932307,0.7532,0.7683,0.7532,0.750691
6,0.3079,0.858907,0.7723,0.783495,0.7723,0.77204
7,0.2376,0.851058,0.7748,0.785946,0.7748,0.774316
8,0.1907,0.836447,0.7777,0.790256,0.7777,0.779195
9,0.156,0.822072,0.7818,0.793487,0.7818,0.783066
10,0.1366,0.788434,0.7851,0.794946,0.7851,0.785347


[I 2025-03-28 22:57:53,507] Trial 23 finished with value: 0.7853470449252089 and parameters: {'learning_rate': 0.0009232408310142194, 'weight_decay': 0.009000000000000001, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 19 with value: 0.788784321745608.


Trial 24 with params: {'learning_rate': 0.0007549338138864273, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7374,1.23716,0.6785,0.707633,0.6785,0.67448
2,0.9462,1.094196,0.7041,0.735549,0.7041,0.702692
3,0.6776,0.998016,0.7372,0.760465,0.7372,0.735429


[I 2025-03-28 23:01:38,124] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0016919842706983037, 'weight_decay': 0.005, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9449,1.60841,0.5784,0.629255,0.5784,0.573137
2,1.2731,1.328739,0.6476,0.683334,0.6476,0.643034
3,0.9707,1.143895,0.6921,0.713959,0.6921,0.687949
4,0.7424,1.152713,0.7018,0.729163,0.7018,0.702689
5,0.5719,1.032996,0.7332,0.747378,0.7332,0.729782
6,0.433,0.951653,0.7481,0.761858,0.7481,0.748271


[I 2025-03-28 23:09:09,233] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0023542201210474166, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1404,1.730814,0.5434,0.603666,0.5434,0.534941
2,1.4421,1.467373,0.6145,0.651072,0.6145,0.610704
3,1.1285,1.29656,0.6617,0.691271,0.6617,0.657046


[I 2025-03-28 23:12:55,865] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0011710805588155784, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7678,1.386903,0.6351,0.680139,0.6351,0.629566
2,1.1031,1.230015,0.6793,0.713745,0.6793,0.677585
3,0.8139,1.102313,0.7057,0.738472,0.7057,0.703469
4,0.6107,1.059237,0.7266,0.753621,0.7266,0.72905
5,0.4661,0.954058,0.7535,0.768106,0.7535,0.751599
6,0.3499,0.873103,0.7667,0.775518,0.7667,0.76636
7,0.2665,0.870692,0.7646,0.777809,0.7646,0.764331
8,0.2069,0.859154,0.7674,0.784375,0.7674,0.770005
9,0.1658,0.850222,0.7698,0.784157,0.7698,0.771905
10,0.1432,0.793005,0.781,0.792921,0.781,0.782499


[I 2025-03-28 23:25:33,524] Trial 27 finished with value: 0.7824986668256797 and parameters: {'learning_rate': 0.0011710805588155784, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 19 with value: 0.788784321745608.


Trial 28 with params: {'learning_rate': 0.00013710899623213207, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3367,1.349317,0.6585,0.670045,0.6585,0.652615
2,1.0692,1.117713,0.7125,0.72943,0.7125,0.710528
3,0.749,0.989836,0.7373,0.749998,0.7373,0.735188
4,0.5616,1.001918,0.7323,0.75324,0.7323,0.73305
5,0.4451,0.920573,0.7578,0.769486,0.7578,0.755613
6,0.3613,0.88238,0.7616,0.772328,0.7616,0.761903


[I 2025-03-28 23:33:03,762] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 7.474347094761894e-05, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7208,1.710879,0.592,0.611715,0.592,0.582777
2,1.3861,1.318481,0.6683,0.686645,0.6683,0.665873
3,1.0194,1.134567,0.7037,0.713188,0.7037,0.700144


[I 2025-03-28 23:36:50,656] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0033727683282408225, 'weight_decay': 0.007, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3493,1.908766,0.5101,0.567336,0.5101,0.50101
2,1.6292,1.606947,0.5822,0.630457,0.5822,0.579521
3,1.2874,1.363004,0.6402,0.663176,0.6402,0.633595


[I 2025-03-28 23:40:38,206] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0016056553477408909, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9343,1.540947,0.5968,0.650143,0.5968,0.589269
2,1.249,1.302449,0.6558,0.688836,0.6558,0.651134
3,0.944,1.142995,0.6942,0.721751,0.6942,0.690116
4,0.7292,1.14,0.7049,0.732313,0.7049,0.705042
5,0.5622,1.016284,0.7318,0.747124,0.7318,0.729526
6,0.4173,0.92944,0.7552,0.766349,0.7552,0.754787


[I 2025-03-28 23:48:10,088] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0010108130286685692, 'weight_decay': 0.008, 'warmup_steps': 25, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7816,1.336661,0.6512,0.69022,0.6512,0.644588
2,1.0489,1.197368,0.681,0.715235,0.681,0.678636
3,0.764,1.058404,0.7228,0.750887,0.7228,0.720501


[I 2025-03-28 23:51:55,381] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.001185299123481456, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8326,1.419632,0.6268,0.670936,0.6268,0.620539
2,1.1143,1.206674,0.6809,0.713665,0.6809,0.679367
3,0.8209,1.062657,0.7225,0.746924,0.7225,0.722213
4,0.6211,1.078589,0.7198,0.748023,0.7198,0.721244
5,0.469,0.969639,0.7449,0.758667,0.7449,0.741912
6,0.3551,0.883457,0.7612,0.772507,0.7612,0.761495


[I 2025-03-28 23:59:27,475] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0028465718802679397, 'weight_decay': 0.008, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2427,1.872427,0.5091,0.567489,0.5091,0.498588
2,1.5461,1.551835,0.5955,0.644405,0.5955,0.592253
3,1.2121,1.348134,0.6455,0.67237,0.6455,0.639607


[I 2025-03-29 00:03:12,992] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0006975812382762342, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7546,1.22736,0.6701,0.696417,0.6701,0.663452
2,0.9278,1.065387,0.716,0.742928,0.716,0.715184
3,0.6531,0.969478,0.7463,0.767006,0.7463,0.746148
4,0.4799,0.989805,0.743,0.770523,0.743,0.745569
5,0.3554,0.904711,0.7589,0.777716,0.7589,0.758762
6,0.2713,0.833529,0.776,0.792228,0.776,0.77803
7,0.214,0.826855,0.7788,0.790389,0.7788,0.778516
8,0.1765,0.819225,0.7798,0.793645,0.7798,0.782061
9,0.1487,0.815105,0.7819,0.793146,0.7819,0.783081
10,0.1319,0.773042,0.7893,0.801654,0.7893,0.790463


[I 2025-03-29 00:15:47,887] Trial 35 finished with value: 0.7904627692423465 and parameters: {'learning_rate': 0.0006975812382762342, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 36 with params: {'learning_rate': 0.00043688050515517106, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7751,1.124779,0.7046,0.722054,0.7046,0.700453
2,0.8575,1.019529,0.727,0.754028,0.727,0.725615
3,0.5827,0.916502,0.7596,0.777125,0.7596,0.758026
4,0.4124,0.958073,0.7488,0.77492,0.7488,0.751759
5,0.3084,0.867917,0.7709,0.784444,0.7709,0.770405
6,0.2386,0.816965,0.7812,0.789895,0.7812,0.781516
7,0.1952,0.832943,0.7779,0.789221,0.7779,0.778098
8,0.1655,0.848326,0.7703,0.786824,0.7703,0.77261
9,0.1456,0.830034,0.78,0.793976,0.78,0.78168
10,0.1314,0.784228,0.7848,0.795349,0.7848,0.78559


[I 2025-03-29 00:28:22,520] Trial 36 finished with value: 0.785590454447738 and parameters: {'learning_rate': 0.00043688050515517106, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 37 with params: {'learning_rate': 0.00017559280388301614, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0618,1.232319,0.6826,0.694929,0.6826,0.677572
2,0.963,1.0806,0.7161,0.738187,0.7161,0.714309
3,0.6625,0.956113,0.7411,0.75522,0.7411,0.738941
4,0.4861,0.973258,0.7411,0.762493,0.7411,0.742904
5,0.3753,0.903706,0.7606,0.772759,0.7606,0.758491
6,0.2996,0.865559,0.7665,0.778346,0.7665,0.767135
7,0.2506,0.879916,0.764,0.774218,0.764,0.763701
8,0.2168,0.902293,0.7593,0.775232,0.7593,0.761612
9,0.1938,0.909581,0.7623,0.774329,0.7623,0.763138
10,0.1804,0.870151,0.7662,0.7779,0.7662,0.766901


[I 2025-03-29 00:40:56,579] Trial 37 finished with value: 0.7669010269336838 and parameters: {'learning_rate': 0.00017559280388301614, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 38 with params: {'learning_rate': 0.00018952641174433627, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1148,1.219364,0.688,0.700646,0.688,0.684107
2,0.9488,1.05745,0.7209,0.741508,0.7209,0.719486
3,0.6487,0.948231,0.7448,0.759484,0.7448,0.7423
4,0.4741,0.967896,0.7415,0.763969,0.7415,0.74365
5,0.3631,0.901289,0.7616,0.774365,0.7616,0.759487
6,0.2885,0.851056,0.7669,0.777911,0.7669,0.767332
7,0.2406,0.878483,0.7691,0.780353,0.7691,0.768412
8,0.2074,0.894917,0.76,0.775306,0.76,0.762201
9,0.1851,0.907088,0.7591,0.771407,0.7591,0.759544
10,0.1718,0.864143,0.7641,0.775074,0.7641,0.764823


[I 2025-03-29 00:53:27,875] Trial 38 finished with value: 0.7648232784468117 and parameters: {'learning_rate': 0.00018952641174433627, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 39 with params: {'learning_rate': 0.0005783010583859606, 'weight_decay': 0.007, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7629,1.179142,0.6872,0.712384,0.6872,0.68321
2,0.8941,1.051204,0.7165,0.74589,0.7165,0.714602
3,0.6167,0.952574,0.7461,0.768349,0.7461,0.745684
4,0.4396,0.987043,0.7428,0.771399,0.7428,0.745683
5,0.329,0.884172,0.7678,0.784383,0.7678,0.767526
6,0.2518,0.816143,0.7838,0.79554,0.7838,0.784144
7,0.2036,0.803057,0.7882,0.799828,0.7882,0.789071
8,0.1705,0.811044,0.7829,0.79807,0.7829,0.785204
9,0.1461,0.817744,0.7816,0.795049,0.7816,0.782716
10,0.13,0.771409,0.7894,0.800987,0.7894,0.790044


[I 2025-03-29 01:05:58,727] Trial 39 finished with value: 0.7900443704456083 and parameters: {'learning_rate': 0.0005783010583859606, 'weight_decay': 0.007, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 40 with params: {'learning_rate': 0.0004659087727357161, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7846,1.147804,0.7006,0.724688,0.7006,0.697072
2,0.8563,1.013717,0.7253,0.749029,0.7253,0.723488
3,0.5847,0.922194,0.7517,0.770799,0.7517,0.749649
4,0.417,0.975925,0.7457,0.773588,0.7457,0.748765
5,0.3127,0.862307,0.7709,0.786969,0.7709,0.771324
6,0.2414,0.818815,0.7794,0.791535,0.7794,0.779828
7,0.1967,0.817662,0.7806,0.794683,0.7806,0.782007
8,0.1664,0.818721,0.7812,0.795651,0.7812,0.783577
9,0.1445,0.820759,0.783,0.794385,0.783,0.784131
10,0.1304,0.779882,0.7871,0.797705,0.7871,0.787684


[I 2025-03-29 01:18:31,226] Trial 40 finished with value: 0.7876838416206046 and parameters: {'learning_rate': 0.0004659087727357161, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 41 with params: {'learning_rate': 0.0006325229652805972, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7479,1.179903,0.6878,0.712244,0.6878,0.684047
2,0.9124,1.0538,0.7189,0.747676,0.7189,0.717296
3,0.6345,0.984834,0.7376,0.756742,0.7376,0.734365


[I 2025-03-29 01:22:17,159] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.000361119602496036, 'weight_decay': 0.005, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8476,1.11856,0.7044,0.721619,0.7044,0.701053
2,0.8547,1.00737,0.7305,0.754894,0.7305,0.728339
3,0.5759,0.907174,0.7615,0.776278,0.7615,0.759747
4,0.4107,0.948079,0.7526,0.775115,0.7526,0.754889
5,0.3049,0.862633,0.7719,0.785948,0.7719,0.770975
6,0.2368,0.825234,0.7785,0.793185,0.7785,0.780351
7,0.1959,0.822035,0.7784,0.789253,0.7784,0.778254
8,0.1668,0.839353,0.7752,0.792085,0.7752,0.777754
9,0.1473,0.847015,0.7777,0.791769,0.7777,0.778795
10,0.1342,0.796866,0.7843,0.795817,0.7843,0.784981


[I 2025-03-29 01:34:49,781] Trial 42 finished with value: 0.7849806209734471 and parameters: {'learning_rate': 0.000361119602496036, 'weight_decay': 0.005, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 43 with params: {'learning_rate': 0.00028926146194969087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9166,1.138146,0.706,0.721992,0.706,0.702969
2,0.8712,1.018095,0.7306,0.753663,0.7306,0.729938
3,0.5841,0.90931,0.7604,0.775102,0.7604,0.758243
4,0.4132,0.95539,0.7513,0.775207,0.7513,0.753843
5,0.3083,0.870851,0.7693,0.782491,0.7693,0.767808
6,0.242,0.827695,0.7739,0.784882,0.7739,0.774352


[I 2025-03-29 01:42:21,731] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0006859986671601721, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.746,1.200844,0.6831,0.714004,0.6831,0.677873
2,0.9293,1.080265,0.7092,0.739288,0.7092,0.708929
3,0.6527,0.976148,0.7408,0.760125,0.7408,0.738823
4,0.4733,0.938153,0.7546,0.774414,0.7546,0.755826
5,0.353,0.897287,0.7636,0.780524,0.7636,0.762724
6,0.2701,0.84058,0.7793,0.792764,0.7793,0.779951
7,0.2128,0.844563,0.7747,0.788184,0.7747,0.775072
8,0.1747,0.83317,0.7785,0.796479,0.7785,0.781989
9,0.1484,0.817626,0.7814,0.793512,0.7814,0.783258
10,0.1318,0.775082,0.7871,0.799485,0.7871,0.788305


[I 2025-03-29 01:54:55,308] Trial 44 finished with value: 0.7883047264774512 and parameters: {'learning_rate': 0.0006859986671601721, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 45 with params: {'learning_rate': 0.0017941045496355028, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9963,1.64086,0.5696,0.622898,0.5696,0.562155
2,1.3049,1.339345,0.6467,0.688339,0.6467,0.643823
3,0.9968,1.204462,0.683,0.709911,0.683,0.677863


[I 2025-03-29 01:58:41,293] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0005322483773918045, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7618,1.14786,0.6956,0.720513,0.6956,0.692892
2,0.8772,1.024745,0.7287,0.757534,0.7287,0.728796
3,0.601,0.942711,0.7518,0.770395,0.7518,0.749701
4,0.4315,0.960488,0.7481,0.775178,0.7481,0.750417
5,0.323,0.87388,0.7681,0.781498,0.7681,0.767307
6,0.2491,0.81262,0.7831,0.794441,0.7831,0.783782
7,0.2002,0.813401,0.7817,0.791273,0.7817,0.782063
8,0.1683,0.819458,0.7826,0.797293,0.7826,0.785365
9,0.1446,0.817821,0.782,0.794185,0.782,0.783559
10,0.1298,0.772894,0.7874,0.798438,0.7874,0.788176


[I 2025-03-29 02:11:17,569] Trial 46 finished with value: 0.7881761138885807 and parameters: {'learning_rate': 0.0005322483773918045, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 47 with params: {'learning_rate': 0.0007573001517173893, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7391,1.254089,0.6728,0.71065,0.6728,0.667186
2,0.9501,1.085981,0.7115,0.737634,0.7115,0.711108
3,0.6752,0.996215,0.7393,0.760399,0.7393,0.737612
4,0.4932,1.00695,0.7405,0.767625,0.7405,0.74219
5,0.3666,0.886913,0.7655,0.779378,0.7655,0.764349
6,0.2791,0.82984,0.7763,0.787587,0.7763,0.776837


[I 2025-03-29 02:18:55,613] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.001671417393382959, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9569,1.580449,0.586,0.637836,0.586,0.578834
2,1.2721,1.352008,0.6445,0.684856,0.6445,0.641023
3,0.9675,1.175131,0.6861,0.712256,0.6861,0.680739


[I 2025-03-29 02:22:44,868] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0004507997556548818, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6907,1.125084,0.701,0.723109,0.701,0.69761
2,0.8487,1.017493,0.7291,0.749775,0.7291,0.726885
3,0.5761,0.924467,0.752,0.767631,0.752,0.748938
4,0.4131,0.945442,0.7526,0.777044,0.7526,0.75498
5,0.3064,0.878564,0.773,0.787422,0.773,0.772223
6,0.2402,0.816068,0.783,0.793576,0.783,0.783207
7,0.1959,0.822424,0.7821,0.793264,0.7821,0.782262
8,0.1663,0.830934,0.7804,0.796493,0.7804,0.783118
9,0.1443,0.822152,0.7801,0.792887,0.7801,0.781817
10,0.1309,0.782402,0.7865,0.795727,0.7865,0.786874


[I 2025-03-29 02:35:18,612] Trial 49 finished with value: 0.7868743687086777 and parameters: {'learning_rate': 0.0004507997556548818, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 50 with params: {'learning_rate': 0.0005773111008367347, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7622,1.169918,0.6904,0.707943,0.6904,0.685146
2,0.8956,1.058776,0.7133,0.742428,0.7133,0.711031
3,0.6186,0.947987,0.7475,0.767475,0.7475,0.744936
4,0.4441,0.949104,0.7522,0.773533,0.7522,0.753654
5,0.3308,0.872527,0.7707,0.786285,0.7707,0.770526
6,0.2543,0.802897,0.7843,0.795021,0.7843,0.78505
7,0.2053,0.823457,0.7812,0.792944,0.7812,0.780902
8,0.1697,0.811709,0.7836,0.796451,0.7836,0.785303
9,0.1455,0.818901,0.7838,0.796561,0.7838,0.785267
10,0.1302,0.770069,0.7895,0.799664,0.7895,0.789745


[I 2025-03-29 02:47:56,433] Trial 50 finished with value: 0.7897453927576475 and parameters: {'learning_rate': 0.0005773111008367347, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 51 with params: {'learning_rate': 0.0003854365453092872, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8127,1.108748,0.7095,0.725642,0.7095,0.706163
2,0.8508,1.010072,0.7294,0.751658,0.7294,0.728382
3,0.5741,0.915844,0.7574,0.774042,0.7574,0.755425
4,0.405,0.953977,0.7504,0.773883,0.7504,0.752571
5,0.3038,0.870869,0.7686,0.782768,0.7686,0.767715
6,0.238,0.809455,0.7799,0.791911,0.7799,0.780942
7,0.1955,0.827755,0.7777,0.789066,0.7777,0.778509
8,0.1665,0.834538,0.7755,0.790587,0.7755,0.777916
9,0.1462,0.839465,0.7748,0.788417,0.7748,0.77617
10,0.1329,0.795569,0.778,0.790174,0.778,0.778736


[I 2025-03-29 03:00:30,387] Trial 51 finished with value: 0.7787364185271476 and parameters: {'learning_rate': 0.0003854365453092872, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 52 with params: {'learning_rate': 0.0006211979211525547, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.68,1.185923,0.6856,0.712318,0.6856,0.680375
2,0.9035,1.062693,0.7156,0.742275,0.7156,0.713644
3,0.6294,0.970908,0.7436,0.765312,0.7436,0.742402
4,0.4545,0.947704,0.7518,0.771974,0.7518,0.753264
5,0.3395,0.897776,0.765,0.781646,0.765,0.763942
6,0.2574,0.831686,0.7787,0.791319,0.7787,0.779019
7,0.2064,0.828422,0.7821,0.792925,0.7821,0.782078
8,0.1714,0.829266,0.7778,0.792183,0.7778,0.780308
9,0.1464,0.817642,0.7814,0.792024,0.7814,0.782489
10,0.1308,0.781212,0.7885,0.798847,0.7885,0.788779


[I 2025-03-29 03:13:13,615] Trial 52 finished with value: 0.7887794555894493 and parameters: {'learning_rate': 0.0006211979211525547, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 53 with params: {'learning_rate': 0.00045462270345961094, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6715,1.139217,0.7026,0.724403,0.7026,0.698015
2,0.8498,1.036716,0.7304,0.757583,0.7304,0.730975
3,0.5757,0.929208,0.7547,0.769867,0.7547,0.753355
4,0.4098,0.960191,0.751,0.773817,0.751,0.752621
5,0.3067,0.882988,0.7646,0.776964,0.7646,0.763126
6,0.2386,0.818844,0.7832,0.795116,0.7832,0.784251
7,0.1964,0.812563,0.7843,0.795337,0.7843,0.785338
8,0.1659,0.833951,0.7799,0.795,0.7799,0.78242
9,0.1441,0.825636,0.7807,0.793553,0.7807,0.782028
10,0.13,0.788462,0.7844,0.795897,0.7844,0.78494


[I 2025-03-29 03:25:55,831] Trial 53 finished with value: 0.7849403565936699 and parameters: {'learning_rate': 0.00045462270345961094, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 54 with params: {'learning_rate': 0.000352224533791071, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7338,1.104891,0.7072,0.725132,0.7072,0.704865
2,0.8416,1.005208,0.7376,0.760541,0.7376,0.736114
3,0.5684,0.92139,0.7526,0.767349,0.7526,0.749671
4,0.4026,0.969335,0.7441,0.772386,0.7441,0.747316
5,0.3008,0.889908,0.7652,0.781213,0.7652,0.76418
6,0.236,0.83016,0.7774,0.791129,0.7774,0.778521
7,0.1946,0.83791,0.7754,0.787573,0.7754,0.775846
8,0.167,0.85257,0.7703,0.786714,0.7703,0.772726
9,0.1478,0.855086,0.7733,0.787037,0.7733,0.774699
10,0.135,0.813352,0.7749,0.787312,0.7749,0.7758


[I 2025-03-29 03:38:30,733] Trial 54 finished with value: 0.775799724182971 and parameters: {'learning_rate': 0.000352224533791071, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 55 with params: {'learning_rate': 0.0007369870844525628, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6944,1.221151,0.679,0.711096,0.679,0.676417
2,0.9334,1.084041,0.7128,0.742247,0.7128,0.710874
3,0.6612,0.992614,0.74,0.765267,0.74,0.738034
4,0.4838,0.979699,0.7435,0.765581,0.7435,0.744642
5,0.361,0.908522,0.7612,0.774763,0.7612,0.76007
6,0.2774,0.852691,0.771,0.782696,0.771,0.77061


[I 2025-03-29 03:46:01,320] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 5.902380787515226e-05, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0302,1.963785,0.5489,0.578849,0.5489,0.536047
2,1.589,1.444321,0.6422,0.661512,0.6422,0.638476
3,1.1752,1.227933,0.6881,0.696845,0.6881,0.684021
4,0.9579,1.198037,0.6872,0.708331,0.6872,0.687522
5,0.8218,1.064031,0.7246,0.731735,0.7246,0.720368
6,0.7228,1.012731,0.731,0.740214,0.731,0.730886


[I 2025-03-29 03:53:32,773] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0018124741331072727, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9703,1.61486,0.5811,0.63165,0.5811,0.575731
2,1.2935,1.349385,0.6468,0.680937,0.6468,0.642807
3,0.9978,1.203934,0.6846,0.713415,0.6846,0.680986


[I 2025-03-29 03:57:20,851] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0018210074133797683, 'weight_decay': 0.002, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9774,1.625703,0.5727,0.624673,0.5727,0.565984
2,1.3163,1.345754,0.6474,0.684704,0.6474,0.642497
3,1.0076,1.196103,0.6841,0.712608,0.6841,0.680485


[I 2025-03-29 04:01:06,516] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0006499402069571891, 'weight_decay': 0.0, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6758,1.201592,0.6786,0.702742,0.6786,0.672897
2,0.9032,1.058096,0.716,0.742034,0.716,0.714864
3,0.6383,0.965106,0.7471,0.767925,0.7471,0.745559
4,0.4621,0.963952,0.7498,0.774592,0.7498,0.751426
5,0.3423,0.895022,0.7618,0.775335,0.7618,0.760244
6,0.2646,0.831016,0.7758,0.787319,0.7758,0.776472
7,0.209,0.833109,0.7785,0.789396,0.7785,0.77888
8,0.1732,0.819756,0.7831,0.799044,0.7831,0.785996
9,0.147,0.820791,0.7788,0.790571,0.7788,0.779649
10,0.1306,0.776065,0.7899,0.800582,0.7899,0.79039


[I 2025-03-29 04:13:47,629] Trial 59 finished with value: 0.7903902459098492 and parameters: {'learning_rate': 0.0006499402069571891, 'weight_decay': 0.0, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 60 with params: {'learning_rate': 0.0004891655372785404, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.677,1.1293,0.7005,0.721676,0.7005,0.697388
2,0.8506,1.012252,0.7302,0.752616,0.7302,0.729337
3,0.5885,0.926182,0.751,0.768325,0.751,0.748486
4,0.4208,0.95569,0.7486,0.770242,0.7486,0.750045
5,0.3149,0.872479,0.767,0.78259,0.767,0.765231
6,0.2429,0.825916,0.7785,0.79136,0.7785,0.780322


[I 2025-03-29 04:21:23,029] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0015848806725402348, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9151,1.582892,0.591,0.636904,0.591,0.582833
2,1.2497,1.308058,0.6523,0.684983,0.6523,0.647963
3,0.9439,1.171189,0.6885,0.715847,0.6885,0.685599
4,0.7237,1.115066,0.7184,0.742759,0.7184,0.717843
5,0.5574,1.04697,0.7291,0.745103,0.7291,0.726447
6,0.4152,0.92861,0.756,0.768865,0.756,0.755478


[I 2025-03-29 04:28:55,631] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0004773323020302073, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6753,1.123542,0.7051,0.722571,0.7051,0.701707
2,0.8551,1.032117,0.7262,0.754319,0.7262,0.725064
3,0.585,0.924118,0.7536,0.772094,0.7536,0.751612
4,0.4149,0.948101,0.7497,0.774636,0.7497,0.752619
5,0.3107,0.882217,0.7672,0.784225,0.7672,0.76767
6,0.2422,0.821736,0.7821,0.793477,0.7821,0.783014
7,0.1966,0.818913,0.7794,0.791451,0.7794,0.779782
8,0.1672,0.842224,0.7727,0.790697,0.7727,0.775591
9,0.1446,0.831778,0.7782,0.791642,0.7782,0.779319
10,0.1304,0.7798,0.7836,0.795513,0.7836,0.784291


[I 2025-03-29 04:41:41,711] Trial 62 finished with value: 0.7842910138835667 and parameters: {'learning_rate': 0.0004773323020302073, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 63 with params: {'learning_rate': 0.0002681159956916346, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9178,1.134514,0.7067,0.71865,0.7067,0.702657
2,0.8789,1.010991,0.7311,0.752986,0.7311,0.729788
3,0.5924,0.924063,0.7545,0.769751,0.7545,0.752411
4,0.4199,0.944831,0.7494,0.771092,0.7494,0.751633
5,0.314,0.86961,0.7709,0.783463,0.7709,0.7701
6,0.2483,0.822503,0.7754,0.787429,0.7754,0.776288
7,0.2067,0.837568,0.7771,0.787844,0.7771,0.777324
8,0.1776,0.853565,0.7704,0.784709,0.7704,0.77241
9,0.1571,0.863439,0.774,0.787073,0.774,0.774917
10,0.1451,0.82437,0.7764,0.788271,0.7764,0.776854


[I 2025-03-29 04:54:21,766] Trial 63 finished with value: 0.7768535094859479 and parameters: {'learning_rate': 0.0002681159956916346, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 64 with params: {'learning_rate': 0.0003145032499765757, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8416,1.123876,0.7031,0.719227,0.7031,0.699625
2,0.8563,1.014045,0.728,0.751351,0.728,0.726384
3,0.5755,0.892358,0.7608,0.775392,0.7608,0.75966
4,0.4056,0.954636,0.7488,0.771453,0.7488,0.751394
5,0.3045,0.876377,0.7676,0.783364,0.7676,0.766945
6,0.2388,0.824065,0.7726,0.785245,0.7726,0.773596


[I 2025-03-29 05:01:54,691] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0013712508563756259, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8896,1.491544,0.6122,0.662231,0.6122,0.607638
2,1.1715,1.283521,0.665,0.701958,0.665,0.663881
3,0.8905,1.125581,0.7072,0.732456,0.7072,0.704032
4,0.6691,1.063681,0.7302,0.754823,0.7302,0.731416
5,0.5119,0.980969,0.7437,0.755829,0.7437,0.741711
6,0.3837,0.900765,0.7613,0.772428,0.7613,0.761468


[I 2025-03-29 05:09:33,132] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0003033154846890641, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8878,1.126979,0.7024,0.716739,0.7024,0.698668
2,0.87,1.009208,0.731,0.752597,0.731,0.728949
3,0.5827,0.911294,0.7609,0.775927,0.7609,0.758902
4,0.413,0.958631,0.7448,0.771449,0.7448,0.748075
5,0.3082,0.863816,0.7729,0.78436,0.7729,0.771817
6,0.2418,0.831795,0.7722,0.786048,0.7722,0.772795


[I 2025-03-29 05:17:05,034] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0009864349181481417, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.748,1.357863,0.6407,0.680646,0.6407,0.635099
2,1.035,1.1619,0.689,0.720472,0.689,0.685926
3,0.7529,1.091905,0.7148,0.745702,0.7148,0.714066
4,0.5608,1.054044,0.7261,0.75606,0.7261,0.727052
5,0.4207,0.965375,0.7475,0.766019,0.7475,0.745455
6,0.3162,0.876113,0.7645,0.780105,0.7645,0.76589


[I 2025-03-29 05:24:34,466] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.002593866966886471, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2598,1.83568,0.5166,0.5673,0.5166,0.509068
2,1.5438,1.559459,0.5921,0.641633,0.5921,0.590656
3,1.2145,1.328263,0.6444,0.66913,0.6444,0.639296
4,0.9811,1.322079,0.6596,0.691658,0.6596,0.659369
5,0.7888,1.161417,0.6956,0.710905,0.6956,0.69102
6,0.6132,1.092471,0.7108,0.728158,0.7108,0.709425


[I 2025-03-29 05:32:03,250] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00012787851127128892, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3325,1.377548,0.6544,0.665858,0.6544,0.64852
2,1.0966,1.135475,0.7096,0.726632,0.7096,0.707602
3,0.773,1.000301,0.7344,0.746539,0.7344,0.731858


[I 2025-03-29 05:35:47,629] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0012326232267415068, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.855,1.461766,0.6212,0.668471,0.6212,0.618366
2,1.1277,1.229353,0.6744,0.710684,0.6744,0.670439
3,0.8381,1.106131,0.7066,0.735821,0.7066,0.703497


[I 2025-03-29 05:39:32,319] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0006076581047829204, 'weight_decay': 0.008, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.753,1.177302,0.6883,0.713796,0.6883,0.684282
2,0.9029,1.034327,0.7175,0.744457,0.7175,0.715314
3,0.6285,0.943823,0.7481,0.768095,0.7481,0.747056
4,0.4518,0.971881,0.7469,0.773068,0.7469,0.748955
5,0.3397,0.861649,0.7728,0.785174,0.7728,0.771661
6,0.2593,0.81398,0.7803,0.794499,0.7803,0.781419
7,0.2064,0.819875,0.7793,0.790902,0.7793,0.779665
8,0.1706,0.827134,0.779,0.796298,0.779,0.781993
9,0.1457,0.808539,0.7847,0.796511,0.7847,0.78562
10,0.1308,0.768779,0.7884,0.799266,0.7884,0.789101


[I 2025-03-29 05:52:08,428] Trial 71 finished with value: 0.7891010363795702 and parameters: {'learning_rate': 0.0006076581047829204, 'weight_decay': 0.008, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 35 with value: 0.7904627692423465.


Trial 72 with params: {'learning_rate': 0.0005309673044490899, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7613,1.157429,0.6919,0.719742,0.6919,0.686223
2,0.8776,1.016581,0.7274,0.751973,0.7274,0.725312
3,0.6016,0.929012,0.7562,0.774417,0.7562,0.754109
4,0.428,0.950691,0.7514,0.777502,0.7514,0.753326
5,0.3216,0.876104,0.7676,0.781112,0.7676,0.766868
6,0.2471,0.810171,0.7811,0.793675,0.7811,0.782932
7,0.2006,0.807653,0.7832,0.794083,0.7832,0.783578
8,0.1678,0.81664,0.7807,0.795626,0.7807,0.783423
9,0.1438,0.807392,0.7815,0.791431,0.7815,0.781426
10,0.13,0.771661,0.7866,0.799731,0.7866,0.787838


[I 2025-03-29 06:04:52,609] Trial 72 finished with value: 0.7878383506232299 and parameters: {'learning_rate': 0.0005309673044490899, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 35 with value: 0.7904627692423465.


Trial 73 with params: {'learning_rate': 0.00015972356535382792, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1057,1.259991,0.6795,0.691969,0.6795,0.674568
2,0.9916,1.080157,0.7205,0.73883,0.7205,0.718935
3,0.6889,0.968549,0.7412,0.753398,0.7412,0.739046
4,0.5121,0.982014,0.7353,0.75511,0.7353,0.736424
5,0.3986,0.909502,0.7568,0.767659,0.7568,0.75432


[W 2025-03-29 06:12:23,882] Trial 73 failed with parameters: {'learning_rate': 0.00015972356535382792, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 6.0} because of the following error: FileNotFoundError(2, 'No such file or directory').
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 250, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2639, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval, start_time)
  File "/usr/local/lib/python3.1

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/.cache/huggingface/metrics/accuracy/default/default_experiment-1-0.arrow'
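
The traceback above shows trial 73 failing during evaluation because an `.arrow` file under `~/.cache/huggingface/metrics` could not be found. One possible way to avoid depending on that cache file is to compute the metrics in memory. The following is only a sketch under stated assumptions: the function name `compute_metrics` and its wiring into `Trainer` are hypothetical, since the notebook's own metric code lives in the `base` module and is not shown here. The sketch also assumes that precision, recall and F1 are support-weighted averages, which is consistent with Recall matching Accuracy in every logged row but is not confirmed by the source.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy and support-weighted precision/recall/F1, computed in memory.

    Hypothetical replacement for a cache-backed metric; `eval_pred` is
    assumed to unpack into (logits, labels).
    """
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)

    classes = np.unique(np.concatenate([labels, preds]))
    precisions, recalls, f1s, supports = [], [], [], []
    for c in classes:
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        # Undefined ratios (empty denominators) are treated as 0.
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
        supports.append(np.sum(labels == c))

    # Weight each class by its number of true samples.
    weights = np.array(supports, dtype=float)
    weights = weights / weights.sum()
    return {
        "accuracy": float(np.mean(preds == labels)),
        "precision": float(np.dot(weights, precisions)),
        "recall": float(np.dot(weights, recalls)),
        "f1": float(np.dot(weights, f1s)),
    }
```

With support weighting, recall reduces to plain accuracy, so the two columns coincide by construction.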

In [None]:
print(best_distil_pretrained)

In [None]:
print("Best random init training score: ", best_base_random)
print("Best random init distillation training score: ", best_distill_random)
print("Best pretrained (head only) training score: ", best_base_head)
print("Best pretrained distillation (head only) training score: ", best_distill_head)
print("Best pretrained training score: ", best_base_pretrained)
print("Best pretrained distillation training score: ", best_distil_pretrained)
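
The summary cells above have no recorded output (`In [None]`), so the scores of the completed trials are only available in the log text. A minimal sketch of recovering them follows. It assumes the log lines follow the `Trial N finished with value: V and parameters: {...}.` pattern visible in the output. The helper names `parse_finished_trials` and `best_trial`, and the `sample_log` string, are hypothetical and not part of the notebook.

```python
import ast
import re

# Matches Optuna "finished" lines; pruned and failed trials are skipped.
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)


def parse_finished_trials(log_text):
    """Return a list of dicts with the number, value and params of each finished trial."""
    trials = []
    for match in FINISHED.finditer(log_text):
        trials.append(
            {
                "number": int(match.group(1)),
                "value": float(match.group(2)),
                # The parameter dict is printed as a Python literal.
                "params": ast.literal_eval(match.group(3)),
            }
        )
    return trials


def best_trial(trials):
    """Pick the trial with the highest objective value (F1 in this notebook)."""
    return max(trials, key=lambda t: t["value"])


# Three lines copied from the log above, used as sample input.
sample_log = """
[I 2025-03-29 00:15:47,887] Trial 35 finished with value: 0.7904627692423465 and parameters: {'learning_rate': 0.0006975812382762342, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 35 with value: 0.7904627692423465.
[I 2025-03-29 00:28:22,520] Trial 36 finished with value: 0.785590454447738 and parameters: {'learning_rate': 0.00043688050515517106, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 35 with value: 0.7904627692423465.
[I 2025-03-29 01:22:17,159] Trial 41 pruned.
"""

trials = parse_finished_trials(sample_log)
best = best_trial(trials)
print(best["number"], best["value"], best["params"])
```

Applied to the full log, the same helpers would list every finished trial, so the best configuration can be read off even when the study object itself is no longer available.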